
I have a list in which each element is a list of strings from a book:

test_list = [['I love Stackoverflow', 'For ever', 'and always'], ['I dont like rain', 'it is wet']]
book_names = ['message to SO', 'confessions']

I would like to obtain the following dataframe:


          book              sentence
0  message to SO  I love Stackoverflow
1  message to SO              For ever
2  message to SO            and always
3    confessions      I dont like rain
4    confessions             it is wet

Now, I managed to do this with the following piece of code:

import pandas as pd

df = pd.DataFrame(test_list, index=book_names).stack().reset_index(level=0)
df.rename(columns={'level_0': 'book',
                   0: 'sentence'},
          inplace=True)

Resulting in:

            book              sentence
0  message to SO  I love Stackoverflow
1  message to SO              For ever
2  message to SO            and always
0    confessions      I dont like rain
1    confessions             it is wet

Then I have to reindex the result:

df = df.reset_index(drop=True)
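A possible way to keep the stack route without the separate rename and second reset_index (a sketch, not something from the original post) is to name the index level and the value column inside one chain. The .dropna() step is an added assumption: newer pandas versions ship a reworked stack implementation that may keep the None padding created by the ragged rows, and dropping it explicitly should give the same result either way.

```python
import pandas as pd

test_list = [['I love Stackoverflow', 'For ever', 'and always'], ['I dont like rain', 'it is wet']]
book_names = ['message to SO', 'confessions']

df = (
    pd.DataFrame(test_list, index=book_names)  # ragged rows are padded with missing values
    .stack()                                   # Series indexed by (book, column position)
    .dropna()                                  # drop the padding in case stack kept it
    .droplevel(1)                              # discard the column-position level
    .rename_axis('book')                       # name the remaining index level
    .reset_index(name='sentence')              # index -> 'book' column, values -> 'sentence'
)
print(df)
```

Because Series.reset_index(name=...) builds a fresh 0..n-1 index, no extra reset_index(drop=True) is needed afterwards.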

I am not particularly happy with this code, since it needs both a reset_index and a column rename. Does anyone have a better solution?

In reality test_list is rather large, so speed is also an important consideration.

Thanks in advance

1 Answer

I think the best approach here is to build a list of tuples with a list comprehension and zip, then pass it to the DataFrame constructor:

import pandas as pd

# pair each book name with every sentence in that book's list
df = pd.DataFrame([(b, s) for b, n in zip(book_names, test_list) for s in n],
                  columns=['book', 'sentence'])
print(df)
            book              sentence
0  message to SO  I love Stackoverflow
1  message to SO              For ever
2  message to SO            and always
3    confessions      I dont like rain
4    confessions             it is wet
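A closely related variation (my own sketch, not part of the answer) builds the two columns separately and passes a dict, which avoids creating one tuple per row. It assumes book_names and test_list have the same length, because chain.from_iterable flattens all of test_list while zip stops at the shorter input. Whether it is actually faster depends on your data.

```python
import itertools
import pandas as pd

test_list = [['I love Stackoverflow', 'For ever', 'and always'], ['I dont like rain', 'it is wet']]
book_names = ['message to SO', 'confessions']

# repeat each book name once per sentence in that book
books = [b for b, sentences in zip(book_names, test_list) for _ in sentences]
# flatten the nested sentence lists in the same order
sentences = list(itertools.chain.from_iterable(test_list))

df = pd.DataFrame({'book': books, 'sentence': sentences})
print(df)
```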

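A pandas-only solution uses DataFrame.explode (pandas 0.25+). The reset_index(drop=True) at the end is needed because explode repeats each original index label once per list element: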

# one row per book holding its list of sentences, then one row per sentence
df = pd.DataFrame({'book': book_names,
                   'sentence': test_list}).explode('sentence').reset_index(drop=True)
print(df)
            book              sentence
0  message to SO  I love Stackoverflow
1  message to SO              For ever
2  message to SO            and always
3    confessions      I dont like rain
4    confessions             it is wet
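One behavioural difference between the two solutions shows up when a book has no sentences: the list comprehension simply produces no rows for it, while explode turns an empty list into a single row with a missing value. The third book below ('empty book') is a hypothetical entry added only to show this.

```python
import pandas as pd

# 'empty book' is a hypothetical extra entry with no sentences
test_list = [['I love Stackoverflow', 'For ever', 'and always'], ['I dont like rain', 'it is wet'], []]
book_names = ['message to SO', 'confessions', 'empty book']

df_comp = pd.DataFrame([(b, s) for b, n in zip(book_names, test_list) for s in n],
                       columns=['book', 'sentence'])
df_explode = pd.DataFrame({'book': book_names,
                           'sentence': test_list}).explode('sentence').reset_index(drop=True)

print(len(df_comp))     # 5: 'empty book' does not appear
print(len(df_explode))  # 6: 'empty book' appears once with NaN as its sentence
```

If books without sentences should be kept, explode is the more convenient choice; otherwise the comprehension drops them for free.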

2 Comments

Thanks a lot, amazing answer! One more question: why would you prefer the tuple/list comprehension approach over the pandas solution? -- EDIT: never mind, I prefer it myself as well, much cleaner. Thanks
@Rens - Hmm, I think it depends on the data, but I think the list comprehension should be faster here. It is best to test the performance on your real data.
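Following that advice, a rough timing harness could look like the sketch below. The synthetic data (2,000 books with 20 sentences each) is an assumption, not the asker's data; substitute your real test_list and book_names, and treat the timings as machine-dependent.

```python
import timeit
import pandas as pd

# hypothetical synthetic data; replace with your real lists
book_names = [f'book {i}' for i in range(2_000)]
test_list = [[f'sentence {j}' for j in range(20)] for _ in book_names]

def with_comprehension():
    return pd.DataFrame([(b, s) for b, n in zip(book_names, test_list) for s in n],
                        columns=['book', 'sentence'])

def with_explode():
    return (pd.DataFrame({'book': book_names, 'sentence': test_list})
            .explode('sentence')
            .reset_index(drop=True))

# sanity check: both approaches must produce the same rows before comparing speed
assert with_comprehension().to_numpy().tolist() == with_explode().to_numpy().tolist()

for fn in (with_comprehension, with_explode):
    # best of 3 repeats, each running the function 5 times
    seconds = min(timeit.repeat(fn, number=5, repeat=3))
    print(f'{fn.__name__}: {seconds:.3f} s per 5 runs')
```

Taking the minimum of several repeats reduces noise from other processes; the sanity check guards against comparing two functions that do not build the same frame.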
