I spent a day trying to solve my problem...

I have a DataFrame that I import from a CSV file. Here is an example:

import pandas as pd

df=pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}','ecoul','necrose','','oedeme'])

I have a list of the possible labels:

label_sl=['rougeur', 'hematome', 'oedeme','ecoul','extra','necrose']

I would like to create a new DataFrame with one 0/1 column per label, like this:

rougeur  hematome  oedeme  ecoul  extra  necrose
      1         1       1      1      0        1
      0         0       0      1      0        0
      0         0       0      0      0        1
      0         0       0      0      0        0
      0         0       1      0      0        0

I can't find a solution. If you have an idea, I would be grateful.

Thanks,

AL

  • Do you really have string representations of dictionaries in the input? Commented Aug 8, 2022 at 14:34
  • Could you explain this part '{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}'? Why is it in a dictionary form? Could there be more key:value pairs? Commented Aug 8, 2022 at 14:34

2 Answers

If all your values, including the dictionary, are actually strings, this should work:

(df[0].str.replace(r'[\[\]{}"]','',regex=True)
.str.strip()
.str.split('[, ]')
.explode()
.str.get_dummies()
.groupby(level=0).sum()
.reindex(label_sl,axis=1)
.fillna(0)
.astype(int))

Output:

   rougeur  hematome  oedeme  ecoul  extra  necrose
0        1         1       1      1      0        1
1        0         0       0      1      0        0
2        0         0       0      0      0        1
3        0         0       0      0      0        0
4        0         0       1      0      0        0
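
To make the chain above easier to follow, here is a minimal sketch that runs the same steps one at a time on the sample data from the question. The intermediate names (cleaned, tokens, dummies, result) are introduced here only for illustration and are not part of the original answer.

```python
import pandas as pd

label_sl = ['rougeur', 'hematome', 'oedeme', 'ecoul', 'extra', 'necrose']
df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}',
                   'ecoul', 'necrose', '', 'oedeme'])

# 1. Strip brackets, braces and double quotes so only words and separators remain.
cleaned = df[0].str.replace(r'[\[\]{}"]', '', regex=True).str.strip()

# 2. Split on commas or spaces, then give every token its own row.
#    explode() repeats the original row index for each token.
tokens = cleaned.str.split('[, ]').explode()

# 3. One indicator column per distinct token, then sum the rows that share
#    an original index to get back to one row per input row.
dummies = tokens.str.get_dummies().groupby(level=0).sum()

# 4. Keep only the known labels, in the requested order. Labels that never
#    occur (such as 'extra') come back as NaN and are filled with 0.
result = dummies.reindex(label_sl, axis=1).fillna(0).astype(int)
print(result)
```

The stray 'choices:' token produced by the dictionary key is dropped in step 4 because it is not in label_sl.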

5 Comments

But I have a problem when I try to apply this to my real DataFrame, and I don't understand why. If my column name is 'choice', should I apply this: test=(exportLS['choice'].str.replace(r'[\[\]{}"]','',regex=True).str.strip().str.split('[, ]').explode().str.get_dummies().groupby(level=['choice']).sum().reindex(label_sl,axis=1).fillna(0).astype(int))
exportLS is the name of my dataframe (previously named df)
Oh, shame on me! It has to stay groupby(level=0)!
Again, thanks a lot, you are very good at this!
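
The groupby(level=0) point in the comments can be checked with a small sketch. It assumes a hypothetical three-row stand-in for exportLS with a column named 'choice'; the real data is not shown in the thread. The column name ends up as the Series name, not as an index level, so level=['choice'] has nothing to refer to, while level=0 refers to the original row index that explode() preserves.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame, with the column named 'choice'.
exportLS = pd.DataFrame({'choice': ['{"choices": ["rougeur", "ecoul"]}', 'necrose', '']})

tokens = (exportLS['choice'].str.replace(r'[\[\]{}"]', '', regex=True)
          .str.strip()
          .str.split('[, ]')
          .explode())

# The column name survives only as the Series name ...
print(tokens.name)
# ... while the index is still the unnamed row index, so no level is called 'choice'.
print(tokens.index.names)

# level=0 groups by the first (and only) index level, i.e. the original row number.
per_row = tokens.str.get_dummies().groupby(level=0).sum()
print(per_row)
```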
The regular expression \bsomething\b matches something only where it appears as a whole word. We can use it like this:

for x in label_sl:
    df[x] = df.iloc[:,0].str.contains("\\b" + x + "\\b").astype(int)

where

label_sl=['rougeur', 'hematome', 'oedeme','ecoul','extra','necrose']
df=pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}','ecoul','necrose','','oedeme'])
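
Since the question asks for a new DataFrame, here is a possible variant of the same word-boundary idea that leaves df unchanged and collects the indicator columns in a separate frame. The name result and the use of re.escape are additions for illustration; re.escape only matters if a label ever contains regex metacharacters, which none of the sample labels do.

```python
import re
import pandas as pd

label_sl = ['rougeur', 'hematome', 'oedeme', 'ecoul', 'extra', 'necrose']
df = pd.DataFrame(['{"choices": ["rougeur", "hematome","oedeme","ecoul","necrose"]}',
                   'ecoul', 'necrose', '', 'oedeme'])

# One 0/1 column per label: 1 where the label occurs as a whole word in the row.
# The dict comprehension keeps the columns in the order of label_sl.
result = pd.DataFrame({
    label: df[0].str.contains(r'\b' + re.escape(label) + r'\b', regex=True).astype(int)
    for label in label_sl
})
print(result)
```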
