How to replace a string in a list if it contains a substring in Pandas DataFrame column

Question

I have a df:

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})


df

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory disease, cough]
2       53      male    [fever]
3       33      male    [respiratory distress]

I am trying to replace every string in the 'symptoms' column (whose values are lists) that contains the substring "respiratory" with "acute respiratory distress", so that the label is uniform throughout the data frame. This is the desired outcome:

Output:

       age    gender    symptoms
0       13      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory distress, cough]
2       53      male    [fever]
3       33      male    [acute respiratory distress]

I have tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory distress'

print(df)

However, the data frame remains unchanged.

Elements in df['symptoms'] are of type list, not string. df['symptoms'].str.contains does not work as you want, because it expects a string, not a list.
– Marcos, Commented Jun 28, 2020 at 23:23
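
To make the comment above concrete, here is a minimal sketch (assuming a reasonably recent pandas, and using a cut-down copy of the question's frame) that inspects the cell types and the mask the attempted `.loc` assignment relies on. The names `cell_types` and `mask` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Each cell holds a Python list, not a string.
cell_types = df['symptoms'].map(type).unique().tolist()
print(cell_types)

# For non-string cells .str.contains produces a missing value,
# and na=False then fills each of those with False.
mask = df['symptoms'].str.contains('respiratory', na=False)
print(mask.tolist())
```

If the mask comes out all False, as it should here, the `.loc` assignment selects no rows, which would explain why the data frame stays unchanged.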

Red · Accepted Answer · 2020-06-28 23:29:48Z

2

Like this:

import pandas as pd

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Replace any string containing 'respiratory' with the uniform label.
df['symptoms'] = [['acute respiratory distress' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]

print(df)

Output:

   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
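
If some rows might hold a missing value instead of a list, the nested comprehension would fail when it tries to iterate over that cell. One possible variant, sketched here under that assumption, wraps the same logic in a small helper and applies it per cell. The names `normalize`, `needle`, `target` and `TARGET` are hypothetical and not part of the original answer.

```python
import pandas as pd

TARGET = 'acute respiratory distress'  # uniform label requested in the question

def normalize(symptoms, needle='respiratory', target=TARGET):
    """Return a new list with every string containing `needle` replaced by `target`."""
    if not isinstance(symptoms, list):
        return symptoms  # leave missing or otherwise non-list cells untouched
    return [target if isinstance(s, str) and needle in s else s
            for s in symptoms]

# Hypothetical sample with one missing cell.
df = pd.DataFrame({'symptoms': [['acute respiratory disease', 'cough'],
                                ['fever'],
                                None]})
df['symptoms'] = df['symptoms'].apply(normalize)
print(df)
```

The helper builds a new list for each row rather than modifying the existing one, so the original list objects are left as they were.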

edited Jun 28, 2020 at 23:29

answered Jun 28, 2020 at 23:03

Red


1 Comment

Eddie S Over a year ago

Thanks! That produced the result I was looking for

BENY · Answer · 2020-06-29 01:10:58Z

0

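Use explode to flatten the lists, replace the matching strings with mask and str.contains, then group by the original index to rebuild the lists: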

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]
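
The one-liner above can be unpacked into its individual steps. The following is only a sketch of that same chain on the question's data; the intermediate names `hit`, `replaced` and `result` are introduced for illustration, and `regex=False` is an optional choice made here so the pattern is treated as a literal substring.

```python
import pandas as pd

df = pd.DataFrame({'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

# Step 1: one row per list element; the original row label is repeated.
s = df['symptoms'].explode()

# Step 2: boolean mask of the elements that mention "respiratory".
hit = s.str.contains('respiratory', regex=False)

# Step 3: overwrite the matching elements and keep the rest.
replaced = s.mask(hit, 'acute respiratory distress')

# Step 4: regroup by the original row label to rebuild the lists.
result = replaced.groupby(level=0).agg(list)
print(result)
```

Because explode repeats the original index label for every element, grouping on `level=0` is what ties each string back to the row it came from.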

edited Jun 29, 2020 at 1:10

Red


answered Jun 28, 2020 at 23:04

BENY


1 Comment

Eddie S Over a year ago

Thanks @YOBEN_S, this was a helpful technique
