I have extracted some HTML using BeautifulSoup and written a function that keeps only the useful information. I intend to run this function for multiple keywords and put the results in a DataFrame. However, I cannot get all of the lists into a single pandas DataFrame.

Example:

words = ['header', 'title', 'number']

The following code gets me lists of all headers, titles and numbers; the lists are all the same length.

def create_list(x):
    column = []
    BRKlist = BRK.find_all(x)
    for n in BRKlist:
        drop_beginning = r'<'+x+'>'
        drop_end = r'</'+x+'>'
        no_beginning = re.sub(drop_beginning, '', str(n))
        final = re.sub(drop_end, '', str(no_beginning))
        column.append(final)
    print(column)

This code outputs:

['header1', 'header2', 'header3']
['title1', 'title2', 'title3']
['number1', 'number2', 'number3']

I am looking for a way to get a single DataFrame that looks like this:

header title number
header1 title1 number1
header2 title2 number2
header3 title3 number3

Getting the lists was no problem, but when I make an empty data frame:

df = pd.DataFrame({x: []})

and try to append the columns, I get the following error:

TypeError: unhashable type: 'list'

Is there any way to circumvent this, or any other/easier way to "append columns"?
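A possible explanation for the error, sketched under an assumption the post does not confirm: if x was bound to a list (for example the whole words list) when {x: []} ran, Python would refuse to use it as a dictionary key, because lists are unhashable. The snippet below reproduces that situation without pandas and shows one way to build a dictionary keyed by each word instead; empty_columns is a hypothetical name used only for illustration.

```python
words = ['header', 'title', 'number']

# Assumption: x held a list when the DataFrame was created.
x = words

try:
    columns = {x: []}          # a list cannot be a dict key
except TypeError as exc:
    message = str(exc)         # the message mentions "unhashable type: 'list'"

# Using each word (a string, which is hashable) as its own key avoids the error.
empty_columns = {word: [] for word in words}
```

A dictionary shaped like empty_columns can then be filled with the extracted lists and passed to pd.DataFrame.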

  • Are you planning to build a DataFrame inside create_list or outside? As it stands, this function doesn't return anything; it just prints lists. Commented Apr 20, 2022 at 21:48
  • @enke Thanks for your answer, I indeed want to create the DataFrame inside the create_list function, so I can export it easily to CSV afterwards. Commented Apr 21, 2022 at 11:46

1 Answer

If you want to build a DataFrame with only three columns, perhaps the easiest way is:

import pandas as pd

A = [['header1', 'header2', 'header3'],
     ['title1', 'title2', 'title3'],
     ['number1', 'number2', 'number3']]
df = pd.DataFrame()
# Assign each inner list of A to its own column
df['header'] = [A[0][i] for i in range(3)]
df['title'] = [A[1][i] for i in range(3)]
df['number'] = [A[2][i] for i in range(3)]
df
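The same frame can also be built without hard-coding the column names or the row count. This is only a sketch, assuming the inner lists of A are in the same order as the names in words and all have equal length: pairing each name with its list yields a dictionary that pandas accepts directly.

```python
import pandas as pd

words = ['header', 'title', 'number']
A = [['header1', 'header2', 'header3'],
     ['title1', 'title2', 'title3'],
     ['number1', 'number2', 'number3']]

# Pair each column name with its list of values, then build the frame once.
df = pd.DataFrame(dict(zip(words, A)))
```

Adding a fourth name to words and a fourth inner list to A would produce a fourth column with no other changes.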

7 Comments

df=pd.DataFrame(zip(*A), columns=words) would be a little more concise.
@enke You always have a better solution :)
Thanks, but the 3 columns are just an example. I might want more columns, in which case I only want to add an item to the list 'words'. The function should add the additional column automatically.
@enke Thanks, this is really helpful. I created a nested list and this function turns it into a DataFrame with the proper column names. However df=pd.DataFrame(zip(*A), columns=words) only works outside the function. Do you know why that is?
@Not_a_Robot that's because A is built using create_list(), right? In general, it's more efficient to store your data in a list and build a DataFrame once (since you're working with a list anyway) instead of building a DataFrame/Series in a loop and concatenating them later on. So building df outside the function is the correct way imo.
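Putting the suggestions from this thread together, a minimal end-to-end sketch might look like the following. Several things here are assumptions rather than the original poster's code: the html string is a made-up stand-in for the scraped page, BRK is parsed with the built-in html.parser, and get_text() is swapped in for the re.sub calls from the question. The key changes are that create_list returns its list instead of printing it, and the DataFrame is built once, outside the function.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the scraped markup from the question.
html = """
<item><header>header1</header><title>title1</title><number>number1</number></item>
<item><header>header2</header><title>title2</title><number>number2</number></item>
<item><header>header3</header><title>title3</title><number>number3</number></item>
"""
BRK = BeautifulSoup(html, "html.parser")

words = ['header', 'title', 'number']

def create_list(tag_name):
    """Return the text of every <tag_name> element as a list."""
    return [tag.get_text() for tag in BRK.find_all(tag_name)]

# One list per keyword, collected outside the function.
columns = [create_list(word) for word in words]

# zip(*columns) turns the per-column lists into per-row tuples.
df = pd.DataFrame(list(zip(*columns)), columns=words)
```

With this shape, adding a keyword to words is enough to get an extra column, and df.to_csv(...) can be called on the result for the CSV export.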