I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

df

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female   [acute respiratory disease, cough]
2   53    male                              [fever]
3   33    male               [respiratory distress]

I am trying to replace every value in the lists of the 'symptoms' column that contains the substring "respiratory" with "acute respiratory distress", so that the wording is uniform throughout the DataFrame. This is the desired outcome:

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]

I have tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'

print(df)

However, the DataFrame remains unchanged.

  • The elements of df['symptoms'] are lists, not strings. df['symptoms'].str.contains does not do what you want here, because it expects each element to be a string, not a list. Commented Jun 28, 2020 at 23:23
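To make the comment above concrete, here is a small sketch of why the mask in the attempted code selects no rows. It assumes the default regex=True behaviour of str.contains; the variable names raw and mask are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['fever'],
                                ['respiratory distress']]})

# Each cell is a list, so the pattern search cannot be applied to it
# and str.contains reports a missing value for every row.
raw = df['symptoms'].str.contains('respiratory')

# na=False converts those missing values to False, so the boolean
# mask passed to df.loc selects nothing and no assignment happens.
mask = df['symptoms'].str.contains('respiratory', na=False)

print(raw.isna().all())
print(mask.any())
```

Because the mask is all False, the df.loc assignment has nothing to write to, which matches the unchanged DataFrame described in the question.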

2 Answers

Like this:

import pandas as pd

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Replace any symptom containing 'respiratory' with the uniform label; keep the rest as-is.
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]

print(df)

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
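If the same cleanup is needed in more than one place, the list comprehension above could be wrapped in a small helper. This is only a sketch: the function name normalize_symptoms and its parameters needle and replacement are hypothetical, and it assumes every list entry is a string.

```python
def normalize_symptoms(symptoms, needle='respiratory',
                       replacement='acute respiratory distress'):
    """Return a copy of `symptoms` with every entry containing `needle` replaced."""
    # Leave non-list cells (for example a missing value) untouched.
    if not isinstance(symptoms, list):
        return symptoms
    # Assumes each entry is a string, as in the question's data.
    return [replacement if needle in s else s for s in symptoms]


rows = [['acute respiratory disease', 'cough'],
        ['fever'],
        ['respiratory distress']]
cleaned = [normalize_symptoms(row) for row in rows]
print(cleaned)
```

With the DataFrame from the question, it would be applied as df['symptoms'] = df['symptoms'].apply(normalize_symptoms).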

1 Comment

Thanks! That produced the result I was looking for

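Flatten the lists with explode, use str.contains with mask to replace the matching entries, then group by the original index to rebuild the lists (Series.explode requires pandas 0.25 or later):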

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
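The one-liner above packs several steps together. The following sketch spells them out one at a time on the question's data; the intermediate names is_resp and replaced are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [13, 62, 53, 33],
                   'gender': ['male', 'female', 'male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# 1. One row per symptom; the original row label is repeated in the index.
s = df['symptoms'].explode()

# 2. The elements are now plain strings, so str.contains works as expected.
is_resp = s.str.contains('respiratory')

# 3. Where the condition is True, swap in the uniform label.
replaced = s.mask(is_resp, 'acute respiratory distress')

# 4. Group by the original row label (index level 0) and rebuild the lists.
df['symptoms'] = replaced.groupby(level=0).agg(list)

print(df)
```

Grouping on level=0 works because explode keeps the original index, so the symptoms that came from the same row share a label.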

1 Comment

Thanks @YOBEN_S, this was a helpful technique
