filling existing dataframe with rows from loop

Question

I went through "How to build and fill pandas dataframe from for loop?" but can't seem to write my values to my columns.

Ultimately I am getting data from a webpage and want to put it into a dataframe.

My headers are predefined as:

d1 = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17'])

Now I get my values in a for loop. How can I write them into columns 1 to 17 of one row, then start again at column 1 on the next row?

row = soup.find_all('td', attrs = {'class': 'Table__TD'})
for data in row:
    print(data.get_text())

sample output row 1

Sample output row 2

Wed 11/13
@CHA
W119-117
32
1-5
20.0
1-5
20.0
0-0
0.0
3
1
0
1
3
3
3

Expected output

Any help would be appreciated.

I’ll second what @Vishnudev said. We need more information, about the code, where the data comes from, etc. See: minimal reproducible example. — AMC
– AMC, Commented Nov 13, 2019 at 4:15
d1.loc[len(d1), col_name] = value but using a loop to put values in a dataframe sounds really bad. I suggest you post a new question with your bigger issue, of entering data into dataframe, so people can see if it can be done without a loop at all. — Aryerez
– Aryerez, Commented Nov 13, 2019 at 8:29
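
For reference, the `.loc` pattern from the comment above can be sketched roughly as follows. This is only a minimal illustration under assumptions: `sample_rows` is a hypothetical stand-in for the scraped cell texts (the values are the two sample rows quoted in this thread), and assigning a whole list to `d1.loc[len(d1)]` relies on the frame keeping its default 0..n-1 index.

```python
import pandas as pd

cols = [f"col{i}" for i in range(1, 18)]
d1 = pd.DataFrame(columns=cols)

# Hypothetical stand-in for two scraped rows of 17 cell texts each.
sample_rows = [
    ["Wed 11/13", "@CHA", "W119-117", "32", "1-5", "20.0", "1-5", "20.0",
     "0-0", "0.0", "3", "1", "0", "1", "3", "3", "3"],
    ["Mon 11/11", "SA", "100", "31", "3-5", "60.0", "1-3", "33.3",
     "1-2", "50.0", "10", "4", "0", "1", "1", "2", "8"],
]

for values in sample_rows:
    # len(d1) is the next unused integer label, so this adds one new row.
    d1.loc[len(d1)] = values

print(d1.shape)
```

As the comment warns, growing a frame row by row like this gets slow for large tables; it is mainly useful for a handful of rows.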

E. Zeytinci · Accepted Answer · 2019-11-17 18:05:27Z

1

+50

You can try this,

import pandas as pd

columns = [
    'col1',
    'col2',
    'col3',
    'col4',
    'col5',
    'col6',
    'col7',
    'col8',
    'col9',
    'col10',
    'col11',
    'col12',
    'col13',
    'col14',
    'col15',
    'col16',
    'col17',
]

# create dataframe
d1 = pd.DataFrame(columns=columns)

full = []

for data in soup.find_all('td', attrs={'class': 'Table__TD'}):
    full.append(data.get_text())

# separate full list into sub-lists with 17 elements
rows = [full[i: i+17] for i in range(0, len(full), 17)]

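# add the list-of-lists structure to the dataframe
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used here)
d1 = pd.concat([d1, pd.DataFrame(rows, columns=d1.columns)], ignore_index=True)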
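
The chunking step can be tried without a live page. The sketch below is an illustration under assumptions, not the asker's real data: `full` is a made-up flat list standing in for the `get_text()` results, and the partial-row filter is an extra guard that is not part of the answer. It uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

columns = [f"col{i}" for i in range(1, 18)]
d1 = pd.DataFrame(columns=columns)

# Hypothetical stand-in for the scraped cell texts:
# 34 strings, i.e. two table rows of 17 cells each.
full = [f"r{r}c{c}" for r in range(2) for c in range(1, 18)]

# Slice the flat list into consecutive chunks of 17 cells.
n = len(columns)
rows = [full[i:i + n] for i in range(0, len(full), n)]

# Extra guard (an assumption): drop a trailing chunk with fewer than 17 cells.
rows = [r for r in rows if len(r) == n]

# Add all chunks to the existing frame in a single step.
d1 = pd.concat([d1, pd.DataFrame(rows, columns=columns)], ignore_index=True)

print(d1.shape)
```

If the page contains cells that do not belong to the 17-column table, the chunks will be misaligned, so it is worth checking `len(full) % 17` before trusting the result.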

edited Nov 17, 2019 at 18:05

answered Nov 15, 2019 at 23:04

E. Zeytinci


9 Comments

excelguy Over a year ago

I'm getting undefined name 'soups' in my IDE, and when I run the code I'm getting ValueError: Length mismatch: Expected axis has 0 elements, new values have 17 elements

E. Zeytinci Over a year ago

I edited my answer. Can you still share soup with me please?

E. Zeytinci Over a year ago

I wrote soups for the possibility that you could have more than one soup. That's what I mean by iteration.

excelguy Over a year ago

hey, i dont have a soups, but i have this, row = soup.find_all('td', attrs = {'class': 'Table__TD'}) for data in row: print(data.get_text()) this gets my data I want to append to each column. Does this help?

E. Zeytinci Over a year ago

I updated my answer again. Can you check this again please?


Hongpei · Accepted Answer · 2019-11-15 14:13:30Z

1

First we have a list of column names:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17']

Then a list of values:

row = [x.get_text() for x in soup.find_all('td', attrs = {'class': 'Table__TD'})]
print(row)
# ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']

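Then we can zip the columns and the values together and add the result to the dataframe as a new row (DataFrame.append was removed in pandas 2.0, so pd.concat is used here):

d1 = pd.concat([d1, pd.DataFrame([dict(zip(cols, row))])], ignore_index=True)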
print(d1)
#         col1 col2 col3 col4 col5  col6 col7  col8 col9 col10 col11 col12  \
# 0  Mon 11/11   SA  100   31  3-5  60.0  1-3  33.3  1-2  50.0    10     4   
# 
#   col13 col14 col15 col16 col17  
# 0     0     1     1     2     8
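
The same zip idea extends to several rows, and rebuilding the frame from a list of dicts avoids piling up the same line when a cell is re-run. The following is a rough sketch under assumptions: `scraped_rows` is a hypothetical stand-in for the per-row cell texts (the third entry deliberately repeats the first), and using `drop_duplicates` to discard repeats is one possible choice, not something the answer prescribes.

```python
import pandas as pd

cols = [f"col{i}" for i in range(1, 18)]

# Hypothetical stand-in for scraped rows; the last one repeats the first,
# as if the same table row had been collected twice.
scraped_rows = [
    ["Mon 11/11", "SA", "100", "31", "3-5", "60.0", "1-3", "33.3",
     "1-2", "50.0", "10", "4", "0", "1", "1", "2", "8"],
    ["Wed 11/13", "@CHA", "W119-117", "32", "1-5", "20.0", "1-5", "20.0",
     "0-0", "0.0", "3", "1", "0", "1", "3", "3", "3"],
    ["Mon 11/11", "SA", "100", "31", "3-5", "60.0", "1-3", "33.3",
     "1-2", "50.0", "10", "4", "0", "1", "1", "2", "8"],
]

# One dict per row, pairing each header with its value.
records = [dict(zip(cols, values)) for values in scraped_rows]

# Build the frame once, then drop exact repeats and renumber the index.
d1 = pd.DataFrame(records, columns=cols)
d1 = d1.drop_duplicates().reset_index(drop=True)

print(d1.shape)
```

Note that `drop_duplicates` would also remove two genuinely identical rows, so it only fits data where each row is expected to be unique, for example because it carries a date.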

answered Nov 15, 2019 at 14:13

Hongpei


1 Comment

excelguy Over a year ago

This may be a good option as my headers will remain static. Issue: how can I iterate through my beautiful soup, as if I keep running my code it appends the same line again and again?

hunzter · Accepted Answer · 2019-11-17 10:27:55Z

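Appending rows to an existing DataFrame one at a time is really slow, because each append builds a whole new DataFrame.

You are better off collecting the data from soup into a list, creating a new dataframe from it, and then concatenating that new dataframe to your old one.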

This is a quick benchmark, using an empty df for each case. In your real code, df should be your existing dataframe:

import pandas as pd

# set up some sample data
headers = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
           'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14',
           'col15', 'col16', 'col17']
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")
row_dict_data = dict(zip(headers, raw_data))

%%time
# append (needs pandas < 2.0, where DataFrame.append still exists)
df = pd.DataFrame(columns=headers)
for i in range(100):
    df = df.append([row_dict_data])

# CPU times: user 258 ms, sys: 4.82 ms, total: 263 ms
# Wall time: 261 ms


%%time
# new dataframe
df = pd.DataFrame(columns=headers)
df2 = pd.DataFrame([raw_data for i in range(100)], columns=headers)
df3 = pd.concat([df, df2], sort=False)

# CPU times: user 7.03 ms, sys: 1.16 ms, total: 8.2 ms
# Wall time: 7.19 ms
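
The comparison above relies on IPython's `%%time` and on `DataFrame.append`, which was removed in pandas 2.0. A rough equivalent in plain Python might look like the sketch below. It is an illustration under assumptions: the helper names `grow_row_by_row`, `build_once` and `timed` are made up here, and row-by-row `pd.concat` is used as a stand-in for the removed `append`. Timings depend on the machine, so none are quoted.

```python
import time

import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]
raw_data = "Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8".split(",")


def grow_row_by_row(n):
    """Add n rows one at a time; every step builds a new DataFrame."""
    df = pd.DataFrame(columns=headers)
    for _ in range(n):
        one_row = pd.DataFrame([raw_data], columns=headers)
        df = pd.concat([df, one_row], ignore_index=True)
    return df


def build_once(n):
    """Collect n rows in a list, build one frame, concatenate once."""
    df = pd.DataFrame(columns=headers)
    new = pd.DataFrame([raw_data for _ in range(n)], columns=headers)
    return pd.concat([df, new], ignore_index=True)


def timed(func, n):
    """Return the function's result and its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


slow_df, slow_s = timed(grow_row_by_row, 100)
fast_df, fast_s = timed(build_once, 100)
print(f"row by row: {slow_s:.4f}s, build once: {fast_s:.4f}s")
```

Both functions produce the same 100 rows; only the way the frame is assembled differs.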
