How to transform a column of lists of strings and split the dataframe into one dataframe per string?

Question

I have a dataframe with a column containing lists of strings.

id sentence                                            category
0  "I love basketball and dunk to the basket"          ['basketball']
1  "I am playing football and basketball tomorrow "    ['football', 'basketball']

I would like to do 2 things:

1. Transform the category column so that every element of each list becomes its own string, with one row per string and the same id and sentence
2. Have one dataframe per category

Expected output for step 1:

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'football'
1  "I am playing football and basketball tomorrow "    'basketball'

Expected output for step 2:

DF_1

id sentence                                            category
0  "I love basketball and dunk to the basket"          'basketball'
1  "I am playing football and basketball tomorrow "    'basketball'

DF_2

id sentence                                            category
1  "I am playing football and basketball tomorrow "    'football'

How can I do this? A for loop that examines the length of each list would work, but is there a faster or more elegant way?

user7864386 · Accepted Answer · 2022-03-24 15:18:06Z

2

You could explode "category"; then groupby:

out = [g for _, g in df.explode('category').groupby('category')]

Then if you print the items in out:

for i in out:
    print(i, end='\n\n')

you'll see:

   id                                        sentence    category
0   0        I love basketball and dunk to the basket  basketball
1   1  I am playing football and basketball tomorrow   basketball

   id                                        sentence  category
1   1  I am playing football and basketball tomorrow   football
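
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the category column already holds real Python lists (not strings that look like lists), and it builds the sample frame by hand from the question's data.

```python
import pandas as pd

# Sample data from the question; 'category' holds real Python lists here.
df = pd.DataFrame({
    "id": [0, 1],
    "sentence": [
        "I love basketball and dunk to the basket",
        "I am playing football and basketball tomorrow ",
    ],
    "category": [["basketball"], ["football", "basketball"]],
})

# Step 1: one row per list element; id and sentence are repeated.
exploded = df.explode("category")

# Step 2: one dataframe per distinct category (groups are sorted by key).
out = [g for _, g in exploded.groupby("category")]
```

Note that explode keeps the original index, so the two rows that come from id 1 share index label 1. Passing ignore_index=True to explode gives a fresh 0..n-1 index if that matters for later steps.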

answered Mar 24, 2022 at 15:18

user7864386

2 Comments

piRSquared Over a year ago

Clearly, the guts of the problem are resolved with this answer... however, we can get cute with the output. The object that is the result of a groupby can be iterated through and we can cheese that into a dictionary: dict((*df.explode('category').groupby('category'),))

user7864386 Over a year ago

@piRSquared I thought of creating a dictionary but went with a list because I wanted to show the print output. But yeah, that's a nice point (and groupby object is a strange thing; has to be unpacked to be cast into a dict)
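
The dictionary idea from these comments can also be spelled as a comprehension, which some may find easier to read than the unpacking trick. This is only a sketch; the name by_category is made up for illustration, and the sample frame is trimmed to the columns that matter.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "category": [["basketball"], ["football", "basketball"]],
})

# Iterating a groupby yields (key, sub-dataframe) pairs,
# so a dict comprehension maps each category to its frame.
by_category = {key: g for key, g in df.explode("category").groupby("category")}
```

After this, by_category["football"] is the dataframe holding only the football rows.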

Zelemist · Accepted Answer · 2022-03-24 15:35:00Z

2

You'll need two tools: explode and groupby.

First, let's prepare the data and use literal_eval to turn the string representations into actual lists, so that explode will work:

import pandas as pd
from io import StringIO
from ast import literal_eval

csvfile = StringIO(
"""id\tsentence\tcategory
0\t"I love basketball and dunk to the basket"\t["basketball"]
1\t"I am playing football and basketball tomorrow "\t["football", "basketball"]""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

# Parse each string such as '["football", "basketball"]' into a real Python list
df['category'] = df['category'].apply(literal_eval)

Then explode the category column:

df = df.explode('category')

Finally, you can use the groupby object much like a dictionary, or store the sub-dataframes in a list:

dg = df.groupby('category')

list_dg = []

for n, g in dg:
    list_dg.append(g)

In my opinion, it's better to stick with dg if possible.
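
As a rough illustration of what sticking with dg can look like, the sketch below pulls a single category out of the groupby object on demand rather than storing every piece up front. The sample frame is rebuilt by hand with real lists, which is an assumption about the data.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "category": [["basketball"], ["football", "basketball"]],
})

dg = df.explode("category").groupby("category")

# Pull out a single category on demand instead of storing every piece.
football = dg.get_group("football")

# dg.groups maps each category to the index labels of its rows.
categories = list(dg.groups)
```

This keeps one object around regardless of how many categories there are, and each sub-dataframe is only built when it is asked for.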

answered Mar 24, 2022 at 15:35

Zelemist

1 Comment

piRSquared Over a year ago

The main problem-solving component of this answer has already been provided by another answer. However, you provided a useful tool to get the sample data into a dataframe. This also brings up a possible point of ambiguity that the OP may not have realized. That being that the dataframe could contain strings that look like list literals or actual lists.
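
To make that ambiguity concrete: explode leaves plain strings untouched, so a cell containing the text "['basketball']" would not be split. One possible way to handle both cases is sketched below; to_list is a hypothetical helper, not part of pandas, and the mixed sample column is contrived for illustration.

```python
from ast import literal_eval

import pandas as pd


def to_list(value):
    """Hypothetical helper: parse list-like strings, pass real lists through."""
    if isinstance(value, str):
        return literal_eval(value)
    return value


df = pd.DataFrame({
    "id": [0, 1],
    # Mixed on purpose: one string that looks like a list, one actual list.
    "category": ["['basketball']", ["football", "basketball"]],
})

# Normalize every cell to a real list before exploding.
df["category"] = df["category"].apply(to_list)
exploded = df.explode("category")
```

Checking type(df['category'].iloc[0]) on the real data is a quick way to find out which case applies before choosing an approach.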
