I tried going over "How to build and fill pandas dataframe from for loop?" but can't seem to write my values to my columns.

Ultimately I am getting data from a webpage and want to put it into a dataframe.

My headers are predefined as:

d1 = pd.DataFrame(columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17'])

Now I have values that I get in a for loop. How can I write them across columns 1 to 17 of one row, then move to the next row and repeat?

row = soup.find_all('td', attrs = {'class': 'Table__TD'})
for data in row:
    print(data.get_text())

Sample output row 1

Mon 11/11
SA
100
31
3-5
60.0
1-3
33.3
1-2
50.0
10
4
0
1
1
2
8

Sample output row 2

Wed 11/13
@CHA
W119-117
32
1-5
20.0
1-5
20.0
0-0
0.0
3
1
0
1
3
3
3

Expected output

[image: expected output]

Any help would be appreciated.

  • What does row have? @excelguy Commented Nov 13, 2019 at 3:55
  • check this stackoverflow.com/questions/51499385/… Commented Nov 13, 2019 at 4:09
  • I’ll second what @Vishnudev said. We need more information, about the code, where the data comes from, etc. See: minimal reproducible example. Commented Nov 13, 2019 at 4:15
  • d1.loc[len(d1), col_name] = value but using a loop to put values in a dataframe sounds really bad. I suggest you post a new question with your bigger issue, of entering data into dataframe, so people can see if it can be done without a loop at all. Commented Nov 13, 2019 at 8:29
  • Added details, hopefully this helps. Commented Nov 14, 2019 at 0:22

3 Answers

You can try this,

import pandas as pd

columns = [
    'col1',
    'col2',
    'col3',
    'col4',
    'col5',
    'col6',
    'col7',
    'col8',
    'col9',
    'col10',
    'col11',
    'col12',
    'col13',
    'col14',
    'col15',
    'col16',
    'col17',
]

# create dataframe
d1 = pd.DataFrame(columns=columns)

full = []

for data in soup.find_all('td', attrs={'class': 'Table__TD'}):
    full.append(data.get_text())

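# separate the full list into sub-lists of 17 elements each
rows = [full[i: i+17] for i in range(0, len(full), 17)]

# add the list-of-lists structure to the dataframe
# (DataFrame.append was removed in pandas 2.0, so use pd.concat instead)
d1 = pd.concat([d1, pd.DataFrame(rows, columns=d1.columns)], ignore_index=True)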
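
As a rough, self-contained sketch of the same chunking idea, the snippet below swaps the live `soup` for a hard-coded list called `cells` (a hypothetical stand-in holding the two sample rows from the question) and builds the dataframe in one step. The length check is an added assumption, in case the page yields a trailing partial row.

```python
import pandas as pd

columns = [f"col{i}" for i in range(1, 18)]  # col1 .. col17

# Hypothetical stand-in for [td.get_text() for td in soup.find_all(...)]:
# the two sample rows from the question, flattened into one list.
cells = (
    "Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8,"
    "Wed 11/13,@CHA,W119-117,32,1-5,20.0,1-5,20.0,0-0,0.0,3,1,0,1,3,3,3"
).split(",")

n = len(columns)

# Fail early if the cells do not divide evenly into rows of 17.
if len(cells) % n != 0:
    raise ValueError(f"expected a multiple of {n} cells, got {len(cells)}")

# Slice the flat list into consecutive rows of 17 cells each.
rows = [cells[i:i + n] for i in range(0, len(cells), n)]

# Build the dataframe once from the list of rows.
d1 = pd.DataFrame(rows, columns=columns)
print(d1.shape)
```

Building the frame directly from `rows` avoids the empty-dataframe-then-append step entirely, which may be simpler when the headers are fixed.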

9 Comments

I'm getting undefined name `soups` in my IDE, and when I run the code I'm getting ValueError: Length mismatch: Expected axis has 0 elements, new values have 17 elements
I edited my answer. Can you still share soup with me please?
I wrote soups for the possibility that you could have more than one soup. That's what I mean by iteration.
Hey, I don't have a soups, but I have this: row = soup.find_all('td', attrs = {'class': 'Table__TD'}) for data in row: print(data.get_text()). This gets the data I want to append to each column. Does this help?
I updated my answer again. Can you check this again please?

First, we have a list of column names:

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 'col8', 'col9',
        'col10', 'col11', 'col12', 'col13', 'col14', 'col15', 'col16', 'col17']

Then a list of values:

row = [x.get_text() for x in soup.find_all('td', attrs = {'class': 'Table__TD'})]
print(row)
# ['Mon 11/11', 'SA', '100', '31', '3-5', '60.0', '1-3', '33.3', '1-2', '50.0', '10', '4', '0', '1', '1', '2', '8']

Then we can zip the columns and the values together, then append to the dataframe:

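# DataFrame.append was removed in pandas 2.0; pd.concat with a one-row frame does the same job
d1 = pd.concat([d1, pd.DataFrame([dict(zip(cols, row))])], ignore_index=True)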
print(d1)
#         col1 col2 col3 col4 col5  col6 col7  col8 col9 col10 col11 col12  \
# 0  Mon 11/11   SA  100   31  3-5  60.0  1-3  33.3  1-2  50.0    10     4   
# 
#   col13 col14 col15 col16 col17  
# 0     0     1     1     2     8
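
To extend this to several rows, one possible approach is to collect one dict per row and build the dataframe once at the end. This is only a sketch: `cells` is a hypothetical stand-in for the text scraped from `soup`, filled here with the two sample rows from the question.

```python
import pandas as pd

cols = [f"col{i}" for i in range(1, 18)]  # col1 .. col17

# Hypothetical stand-in for the scraped cell text (two rows of 17 cells).
cells = (
    "Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8,"
    "Wed 11/13,@CHA,W119-117,32,1-5,20.0,1-5,20.0,0-0,0.0,3,1,0,1,3,3,3"
).split(",")

# Walk the flat list 17 cells at a time and pair each chunk with the headers.
records = []
for start in range(0, len(cells), len(cols)):
    chunk = cells[start:start + len(cols)]
    records.append(dict(zip(cols, chunk)))

# Build the frame in one go from the list of dicts.
d1 = pd.DataFrame(records, columns=cols)
print(d1[["col1", "col2", "col3"]])
```

Because `d1` is rebuilt from `records` on every run rather than appended to, re-running the script should not duplicate rows.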

1 Comment

This may be a good option as my headers will remain static. Issue: how can I iterate through my BeautifulSoup results? If I keep running my code, it appends the same line again and again.

Appending data to an existing DataFrame is really slow.

It is better to build a list of data from the soup, create a new dataframe from it, and then concat the new dataframe to your old one.

This is a quick benchmark, using an empty df for each case. In your real code, df should be your existing dataframe:

import pandas as pd

# set up some sample data
headers = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7', 
           'col8', 'col9', 'col10', 'col11', 'col12', 'col13', 'col14',
           'col15', 'col16', 'col17']
raw_data = 'Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8'.split(",")
row_dict_data = dict(zip(headers, raw_data))

%%time
# append row by row
# (DataFrame.append was removed in pandas 2.0, so this cell needs an older pandas)
df = pd.DataFrame(columns=headers)
for i in range(100):
    df = df.append([row_dict_data])

# CPU times: user 258 ms, sys: 4.82 ms, total: 263 ms
# Wall time: 261 ms


%%time
# build one new dataframe, then concat it once
df = pd.DataFrame(columns=headers)
df2 = pd.DataFrame([raw_data for i in range(100)], columns=headers)
df3 = pd.concat([df, df2], sort=False)

# CPU times: user 7.03 ms, sys: 1.16 ms, total: 8.2 ms
# Wall time: 7.19 ms
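
For anyone running this outside a notebook or on pandas 2.x, here is a rough script version of the same comparison. It is a sketch under stated assumptions: `time.perf_counter` replaces `%%time`, a per-row `pd.concat` replaces the removed `DataFrame.append`, and the function names are made up for illustration. Timings will vary by machine, so none are shown.

```python
import time

import pandas as pd

headers = [f"col{i}" for i in range(1, 18)]  # col1 .. col17
raw_data = "Mon 11/11,SA,100,31,3-5,60.0,1-3,33.3,1-2,50.0,10,4,0,1,1,2,8".split(",")


def grow_row_by_row(n):
    """Concatenate one single-row frame per iteration (the slow pattern)."""
    df = pd.DataFrame(columns=headers)
    for _ in range(n):
        one_row = pd.DataFrame([raw_data], columns=headers)
        df = pd.concat([df, one_row], ignore_index=True)
    return df


def build_once(n):
    """Collect all rows in a list, then concat to the existing frame once."""
    existing = pd.DataFrame(columns=headers)
    new_rows = pd.DataFrame([raw_data] * n, columns=headers)
    return pd.concat([existing, new_rows], ignore_index=True)


for func in (grow_row_by_row, build_once):
    start = time.perf_counter()
    result = func(100)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{func.__name__}: {elapsed_ms:.1f} ms, shape={result.shape}")
```

Both functions should produce the same 100 x 17 frame; only the number of concat calls differs.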
