I have a dataframe with a column containing lists of strings.

id sentence                                            category
0  "I love basketball and dunk to the basket"          ['basketball']
1  "I am playing football and basketball tomorrow "    ['football', 'basketball']

I would like to do two things:

    1. Transform the category column so that every element of each list becomes its own row, keeping the same id and sentence.
    2. Get one dataframe per category.

Expected output for step 1):

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'football'
1  "I am playing football and basketball tomorrow "    'basketball'

Expected output for step 2):

DF_1

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'basketball'

DF_2

id sentence                                            category
1  "I am playing football and basketball tomorrow "    'football'

How can I do this? A for loop that examines the length of each list would work, but is there a faster or more elegant way?

2 Answers

You could explode "category"; then groupby:

out = [g for _, g in df.explode('category').groupby('category')]

Then if you print the items in out:

for i in out:
    print(i, end='\n\n')

you'll see:

   id                                        sentence    category
0   0        I love basketball and dunk to the basket  basketball
1   1  I am playing football and basketball tomorrow   basketball

   id                                        sentence  category
1   1  I am playing football and basketball tomorrow   football
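For anyone who wants to run this end to end, here is a minimal sketch. It assumes the category column already holds real Python lists (not strings that look like lists); the sample frame is rebuilt from the question, and the name exploded is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; 'category' holds real Python lists.
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# Step 1: one row per list element; id and sentence are repeated.
# reset_index gives each exploded row its own index label.
exploded = df.explode("category").reset_index(drop=True)

# Step 2: one dataframe per category (groupby sorts the keys by default).
out = [g for _, g in exploded.groupby("category")]
```

Here exploded is the step 1 result and out holds the step 2 dataframes, with the 'basketball' frame first because groupby sorts its keys.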

2 Comments

Clearly, the guts of the problem are resolved with this answer... however, we can get cute with the output. The object that is the result of a groupby can be iterated through and we can cheese that into a dictionary: dict((*df.explode('category').groupby('category'),))
@piRSquared I thought of creating a dictionary but went with a list because I wanted to show the print output. But yeah, that's a nice point (and groupby object is a strange thing; has to be unpacked to be cast into a dict)

You'll need two tools: explode and groupby.

First, let's prepare the data and use literal_eval to make sure the category column holds actual lists, so that explode will work:

import pandas as pd
from io import StringIO
from ast import literal_eval

csvfile = StringIO(
"""id\tsentence\tcategory
0\t"I love basketball and dunk to the basket"\t["basketball"]
1\t"I am playing football and basketball tomorrow "\t["football", "basketball"]""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

# Parse the list-like strings into real Python lists
df['category'] = df['category'].apply(literal_eval)

Then explode the category column:

df = df.explode('category')

Finally, you can use the groupby object much like a dictionary, or store the sub-dataframes in a list:

dg = df.groupby('category')

list_dg = []

for n, g in dg:
    list_dg.append(g)

In my opinion, it is better to stick with dg if possible.
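As a rough sketch of what working with dg directly can look like: get_group pulls out one sub-dataframe by key, and a dict comprehension materialises all of them. The already-exploded sample frame and the names football and frames are assumptions made for illustration.

```python
import pandas as pd

# Already-exploded sample data (one category per row), as in the question.
df = pd.DataFrame({
    "id": [0, 1, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
        "I am playing football and basketball tomorrow ",
    ],
    "category": ["basketball", "football", "basketball"],
})

dg = df.groupby("category")

# Look up a single group by its key, without building every sub-dataframe.
football = dg.get_group("football")

# Or materialise every group into a plain dict keyed by category.
frames = {name: group for name, group in dg}
```

Keeping dg avoids creating copies until a group is actually needed, while the dict is handy when the sub-dataframes have to be passed around by name.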

1 Comment

The main problem-solving component of this answer has already been provided by another answer. However, you provided a useful tool to get the sample data into a dataframe. This also brings up a possible point of ambiguity that the OP may not have realized: the dataframe could contain either strings that look like list literals or actual lists.
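One way to cope with that ambiguity is to convert only the values that are strings. This is a minimal sketch, assuming every string in the column is a valid Python list literal; to_list is a hypothetical helper, not part of pandas.

```python
from ast import literal_eval

import pandas as pd


def to_list(value):
    """Hypothetical helper: parse list-like strings, pass real lists through."""
    if isinstance(value, str):
        return literal_eval(value)
    return value


# Deliberately mixed column: one list-like string and one actual list.
df = pd.DataFrame({
    "id": [0, 1],
    "category": ['["basketball"]', ["football", "basketball"]],
})

df["category"] = df["category"].apply(to_list)

# explode now behaves the same for both rows.
exploded = df.explode("category")
```

Without the conversion, explode would leave the string row untouched, because it only expands list-like values.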
