Append lists as columns in Pandas DataFrame Python

Question

I have extracted some HTML using BeautifulSoup and created a function to get only the useful information. I intend to run this function for multiple keywords and put the results in a DataFrame. However, I cannot get all the lists into the pandas DataFrame.

Example:

words = ['header', 'title', 'number']

The following code gets me lists of all headers, titles and numbers, which are all the same length.

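import re

# BRK is the BeautifulSoup object parsed from the page (defined elsewhere).
def create_list(x):
    column = []
    BRKlist = BRK.find_all(x)
    for n in BRKlist:
        # strip the literal opening and closing tags, e.g. <title> and </title>
        drop_beginning = r'<' + x + '>'
        drop_end = r'</' + x + '>'
        no_beginning = re.sub(drop_beginning, '', str(n))
        final = re.sub(drop_end, '', no_beginning)
        column.append(final)
    print(column)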
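A possible simplification, assuming BRK is a BeautifulSoup object as the question implies: let BeautifulSoup strip the tags with get_text() and have the function return the list instead of printing it, so the caller can collect the results. The sample HTML (including the custom number tag) is hypothetical and only stands in for the scraped page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in the question, BRK comes from the real scraped HTML.
html = (
    "<header>header1</header><title>title1</title><number>number1</number>"
    "<header>header2</header><title>title2</title><number>number2</number>"
)
BRK = BeautifulSoup(html, "html.parser")


def create_list(x):
    """Return the text content of every <x> tag found in BRK."""
    return [tag.get_text() for tag in BRK.find_all(x)]


print(create_list("title"))
```

Unlike the regex version, get_text() also copes with tags that carry attributes (such as a title tag with a class attribute), whose opening tag the pattern r'<' + x + '>' would not match.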

This code outputs:

['header1', 'header2', 'header3']
['title1', 'title2', 'title3']
['number1', 'number2', 'number3']

I am looking for a way to get a single DataFrame that looks like this:

header	title	number
header1	title1	number1
header2	title2	number2
header3	title3	number3

Getting the lists was no problem, but when I make an empty DataFrame:

df = pd.DataFrame({x: []})

and try to append the columns, I get the following error:

TypeError: unhashable type: 'list'

Is there any way to circumvent this, or any other/easier way to "append columns"?
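The question does not show what x held at that point. Assuming it was the words list itself, the error comes from Python rather than pandas: a list cannot be used as a dict key, so {x: []} fails before the DataFrame is built. A minimal sketch of the failure and of a working variant with one string key per column:

```python
import pandas as pd

words = ['header', 'title', 'number']

# Assumption: x was the list of words. Lists are unhashable, so they
# cannot be dict keys and building the dict raises TypeError.
message = None
try:
    pd.DataFrame({words: []})
except TypeError as err:
    message = str(err)
print(message)  # contains "unhashable type: 'list'"

# Using each word (a string) as its own key works: one empty column per word.
df = pd.DataFrame({w: [] for w in words})
print(list(df.columns))
```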

Are you planning to build a DataFrame inside create_list or outside? As it stands, this function doesn't return anything; it just prints lists. – user7864386, Commented Apr 20, 2022 at 21:48
@enke Thanks for your answer. I do want to create the DataFrame inside the create_list function, so I can easily export it to CSV afterwards. – Not_a_Robot, Commented Apr 21, 2022 at 11:46

Sadcow · Accepted Answer · 2022-04-20 21:49:00Z


If you want to build a DataFrame with only three columns, the easiest way may be:

import pandas as pd

A = [['header1', 'header2', 'header3'],
     ['title1', 'title2', 'title3'],
     ['number1', 'number2', 'number3']]

df = pd.DataFrame()
# each column takes the i-th element of its own list
df['header'] = [A[0][i] for i in range(3)]
df['title'] = [A[1][i] for i in range(3)]
df['number'] = [A[2][i] for i in range(3)]
print(df)

answered Apr 20, 2022 at 21:49

Sadcow



7 Comments

user7864386 Over a year ago

df = pd.DataFrame(zip(*A), columns=words) would be a little more concise.
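To make the one-liner less opaque, here is a small sketch of what zip(*A) produces: the * unpacks the three inner lists as separate arguments, and zip then yields one tuple per position, i.e. one row of the target table. The list is materialised first only so the rows can be inspected.

```python
import pandas as pd

words = ['header', 'title', 'number']
A = [['header1', 'header2', 'header3'],
     ['title1', 'title2', 'title3'],
     ['number1', 'number2', 'number3']]

# zip(*A) is zip(A[0], A[1], A[2]): it transposes columns into rows.
rows = list(zip(*A))
print(rows[0])  # ('header1', 'title1', 'number1')

# Each tuple becomes a row; words supplies the column names.
df = pd.DataFrame(rows, columns=words)
print(df)
```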

Sadcow Over a year ago

@enke You always have a better solution :)

Not_a_Robot Over a year ago

Thanks, but the 3 columns are just an example. I might want more columns, in which case I only want to add an item to the list 'words'. The function should add the additional column automatically.
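One way to make the column set follow the words list automatically is a dict comprehension with one entry per word. This is a sketch: create_list here is a hypothetical stand-in that returns placeholder values instead of parsing BRK, and it returns its list rather than printing it. The added 'date' word is only there to show a fourth column appearing.

```python
import pandas as pd


def create_list(tag):
    # Hypothetical stand-in for the scraping function: returns placeholder
    # values (and returns them, rather than printing) for illustration only.
    return [f"{tag}{i}" for i in range(1, 4)]


# Adding a word here is the only change needed to get an extra column.
words = ['header', 'title', 'number', 'date']

# Key = column name, value = that column's list.
df = pd.DataFrame({word: create_list(word) for word in words})
print(df)
```

All lists must have the same length; otherwise pandas raises a ValueError when building the DataFrame from the dict.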

Not_a_Robot Over a year ago

@enke Thanks, this is really helpful. I created a nested list, and this line turns it into a DataFrame with the proper column names. However, df=pd.DataFrame(zip(*A), columns=words) only works outside the function. Do you know why that is?

user7864386 Over a year ago

@Not_a_Robot that's because A is built using create_list(), right? In general, it's more efficient to store your data in a list and build a DataFrame once (since you're working with a list anyway) instead of building a DataFrame/Series in a loop and concatenating them later on. So building df outside the function is the correct way imo.
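Following that advice while still keeping everything behind a single call, a sketch could separate collecting the lists from building the table: gather plain lists, build the DataFrame once, then write the CSV. Both create_list (a placeholder that returns rather than prints) and build_table are hypothetical names; a StringIO buffer stands in for a real file path so the example does not touch the disk.

```python
import io

import pandas as pd


def create_list(tag):
    # Hypothetical placeholder for the scraping step; returns its list.
    return [f"{tag}{i}" for i in range(1, 4)]


def build_table(words, path_or_buf):
    """Collect one list per word, build the DataFrame once, write it as CSV."""
    A = [create_list(word) for word in words]
    df = pd.DataFrame(zip(*A), columns=words)
    df.to_csv(path_or_buf, index=False)  # index=False omits the 0..n-1 index
    return df


buf = io.StringIO()
df = build_table(['header', 'title', 'number'], buf)
print(buf.getvalue())
```

Because A and words are both local to build_table here, the zip(*A) line works inside the function as well; it only needs both names to be in scope where it runs.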
