Python: count strings in a DataFrame column that belong to a list

Question

I have spent a day trying to solve this problem...

I have a DataFrame that I import from a CSV file. Here is an example:

import pandas as pd

df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}', 'ecoul', 'necrose', '', 'oedeme'])

I have a list of the possible labels:

label_sl=['rougeur', 'hematome', 'oedeme','ecoul','extra','necrose']

I would like to create a new DataFrame that looks like this:

rougeur  hematome  oedeme  ecoul  extra  necrose
      1         1       1      1      0        1
      0         0       0      1      0        0
      0         0       0      0      0        1
      0         0       0      0      0        0
      0         0       1      0      0        0

I can't find a solution. If you have an idea, please let me know.

Thanks,

AL

Do you really have string representations of dictionaries in the input? – mozway, Aug 8, 2022 at 14:34
Could you explain this part: '{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}'? Why is it in dictionary form? Could there be more key:value pairs? – Vitalizzare, Aug 8, 2022 at 14:34
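If the dictionary-like cells are valid JSON (an assumption; the question does not confirm it), they can be parsed instead of stripped with a regex. Here is a minimal sketch using `json.loads` that treats any cell that is not a JSON object as a single label. The helper name `to_labels` is hypothetical:

```python
import json

import pandas as pd

label_sl = ['rougeur', 'hematome', 'oedeme', 'ecoul', 'extra', 'necrose']
df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}',
                   'ecoul', 'necrose', '', 'oedeme'])


def to_labels(cell):
    """Hypothetical helper: return the list of labels stored in one cell."""
    if not cell:
        return []                          # empty cell -> no labels
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        return [cell]                      # plain string such as 'ecoul'
    if isinstance(parsed, dict):
        return parsed.get("choices", [])   # assumes the labels live under "choices"
    return [cell]


labels = df[0].map(to_labels).explode()    # one label per row; empty list -> NaN
result = (pd.get_dummies(labels)           # one indicator column per distinct label
            .groupby(level=0).sum()        # fold back to one row per original cell
            .reindex(columns=label_sl, fill_value=0)  # known labels only, e.g. 'extra' -> 0
            .astype(int))
print(result)
```

Parsing keeps the JSON structure intact, so a label containing a comma or a space would not be split apart, which the regex-based split cannot guarantee.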

rhug123 · Accepted Answer · 2022-08-08 14:42:52Z

1

If all your values, including the dictionary, are actually strings, this should work:

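out = (df[0].str.replace(r'[\[\]{}"]', '', regex=True)  # drop brackets, braces and quotes
       .str.strip()
       .str.split('[, ]')               # split on commas or spaces (treated as a regex)
       .explode()                       # one token per row, original index repeated
       .str.get_dummies()               # one 0/1 column per distinct token
       .groupby(level=0).sum()          # fold back to one row per original row
       .reindex(label_sl, axis=1)       # keep only the known labels, in order
       .fillna(0)                       # labels never seen (e.g. 'extra') become 0
       .astype(int))
print(out)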

Output:

   rougeur  hematome  oedeme  ecoul  extra  necrose
0        1         1       1      1      0        1
1        0         0       0      1      0        0
2        0         0       0      0      0        1
3        0         0       0      0      0        0
4        0         0       1      0      0        0
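To see where stray tokens such as `choices:` come from, and why `groupby(level=0)` is needed, here is a small sketch that stops the chain after the split and after `explode` (same `df` as in the question):

```python
import pandas as pd

df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}',
                   'ecoul', 'necrose', '', 'oedeme'])

tokens = (df[0].str.replace(r'[\[\]{}"]', '', regex=True)
               .str.strip()
               .str.split('[, ]'))
print(tokens[0])
# ['choices:', 'rougeur', '', 'hematome', 'oedeme', 'ecoul', 'necrose']

exploded = tokens.explode()
print(exploded.index.tolist())
# [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4]: index 0 repeats once per token,
# and groupby(level=0) folds those rows back into a single row

dummies = exploded.str.get_dummies()
print('choices:' in dummies.columns)
# True: the stray token gets its own column until reindex(label_sl, axis=1) drops it
```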

answered Aug 8, 2022 at 14:42



5 Comments

Anelor Guinet Over a year ago

But I have a problem when I apply it to my real DataFrame, and I don't understand why. If my column name is 'choice', should I apply this: test=(exportLS['choice'].str.replace(r'[\[\]{}"]','',regex=True) .str.strip() .str.split('[, ]') .explode() .str.get_dummies() .groupby(level=['choice']).sum() .reindex(label_sl,axis=1) .fillna(0) .astype(int))

Anelor Guinet Over a year ago

exportLS is the name of my DataFrame (previously named df).

Anelor Guinet Over a year ago

Oh, shame on me! It should be groupby(level=0)!

Anelor Guinet Over a year ago

Again, thanks a lot, you're really good!

Vitalizzare · Accepted Answer · 2022-08-08 14:59:23Z

1

The regular expression \bsomething\b matches something only as a whole word. We can use it like this:

for x in label_sl:
    # 1 if the label appears as a whole word in the first column, else 0
    df[x] = df.iloc[:, 0].str.contains(r"\b" + x + r"\b").astype(int)

where

import pandas as pd

label_sl = ['rougeur', 'hematome', 'oedeme', 'ecoul', 'extra', 'necrose']
df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}', 'ecoul', 'necrose', '', 'oedeme'])
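Since the question asks for a new DataFrame, the same whole-word test can also fill a separate frame instead of adding columns to `df`. This is a sketch, not the answerer's code; the `re.escape` call is an extra precaution in case a label ever contains regex metacharacters. The second part uses a made-up value, `'ecoulement'`, to show why the `\b` anchors matter:

```python
import re

import pandas as pd

label_sl = ['rougeur', 'hematome', 'oedeme', 'ecoul', 'extra', 'necrose']
df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}',
                   'ecoul', 'necrose', '', 'oedeme'])

# One 0/1 column per label, in label_sl order, built in a new DataFrame.
result = pd.DataFrame({
    x: df[0].str.contains(r'\b' + re.escape(x) + r'\b').astype(int)
    for x in label_sl
})
print(result)

# A plain substring test also matches longer words; the \b version does not.
s = pd.Series(['ecoul', 'ecoulement'])  # 'ecoulement' is a made-up value
substring = s.str.contains('ecoul')
whole_word = s.str.contains(r'\becoul\b')
print(substring.tolist())   # [True, True]
print(whole_word.tolist())  # [True, False]
```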

answered Aug 8, 2022 at 14:59

