How to remove certain strings in a pandas DataFrame

Question

I have a DataFrame, df, with a column, school_name, that holds different school names. I want to remove certain words from those names and wonder what the best way to go about this might be.

For example, I want to remove ‘male’ and ‘female’ from strings like:

‘gps hafiz shahmale p’
‘gpps mogal malep’ 
‘government primary school chak femalep’ 
‘govt girls high school syebadadfemale p’ 
‘ghs male p’
…

There are many other strings besides ‘male’ and ‘female’ that I want to remove, with similar complexities. For example, I also want to remove ‘sbcombined’ from strings like:

'government girls high school chak no120sbcombinedp',
'govt boys elementary school chak no119sbcombined t',
'govt boys elementary school chak no 37 sbcombined p'
…

All I can think of right now is to write a separate function for each word, e.g. to remove ‘male’:

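names = df.school_name.tolist()

for name in names:
    # 'male' at the very end, or 'male' before one trailing character and not part of 'female'
    if (name[-4:] == 'male') or (name[-5:-1] == 'male' and name[-7:-5] != 'fe'):
        # assign inside the if so rows that do not match are left untouched
        df.loc[df.school_name == name, 'school_name'] = name.replace('male', '')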

Is there a better, more efficient way to go about this?

edit: I would also like to know how to deal with the complexity around the string 'male'. 'male' is part of 'female' (which I want to remove as well), so when I use re.search to remove 'male' from strings that contain 'female', only the 'male' part is removed and 'fe' is left behind. I want to avoid that.

Dishin Goyani · Accepted Answer · 2020-08-18 08:35:46Z

1

Use str.replace

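pattern = '|'.join(['male','female'])
df['school_name'] = df.school_name.str.replace(pattern, '', regex=True)

This replaces every occurrence of any word in the list with an empty string. regex=True makes pandas treat the | in the pattern as alternation; since pandas 2.0 the default is a literal match, so without it the pattern would match nothing here.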
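A slightly more defensive sketch of the same approach, in case it helps. The sample rows are copied from the question; the `words` list, the longest-first sort, the `re.escape` call, and the final whitespace cleanup are assumptions added for illustration, not part of the original answer.

```python
import re
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "school_name": [
        "gps hafiz shahmale p",
        "gpps mogal malep",
        "government primary school chak femalep",
        "govt boys elementary school chak no 37 sbcombined p",
    ]
})

# Words to strip; this list is illustrative.
words = ["male", "female", "sbcombined"]

# Longest first so 'female' is tried before 'male'; re.escape guards
# against regex metacharacters appearing in any of the words.
pattern = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))

# regex=True is required on pandas 2.0+, where the default is a literal match.
df["school_name"] = df["school_name"].str.replace(pattern, "", regex=True)

# Optional cleanup: collapse repeated spaces left behind by the removal.
df["school_name"] = (
    df["school_name"].str.replace(r"\s+", " ", regex=True).str.strip()
)
```

Because the regex engine scans left to right, 'female' is matched at its 'f' before 'male' could match inside it, so no stray 'fe' is left behind. Sorting longest first just makes that independent of how the list happens to be ordered.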

answered Aug 18, 2020 at 8:35


Artyom Akselrod · Accepted Answer · 2020-08-19 08:14:50Z

0

If you can specify the words you want to remove in a list, replace_word_list, try something like:

for word in replace_word_list:
    df['school_name'] = df['school_name'].str.replace(word, '')
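One caveat with a loop of literal replacements is that the order of the list matters when one word contains another. The plain-Python sketch below illustrates this with a string from the question. `strip_words` is a hypothetical helper written for this illustration, and the same concern would apply to a loop of `.str.replace` calls on a pandas column.

```python
# A sample string copied from the question.
name = "govt girls high school syebadadfemale p"


def strip_words(text, words):
    """Remove each word in turn using literal (non-regex) replacement."""
    for word in words:
        text = text.replace(word, "")
    return text


# Removing 'male' first leaves the 'fe' of 'female' behind.
short_first = strip_words(name, ["male", "female"])

# Sorting longest first removes 'female' as a whole before 'male' is tried.
replace_word_list = sorted(["male", "female"], key=len, reverse=True)
long_first = strip_words(name, replace_word_list)

print(short_first)  # govt girls high school syebadadfe p
print(long_first)   # govt girls high school syebadad p
```

So if this loop is used, sorting replace_word_list by length in descending order before iterating is a simple way to avoid the leftover 'fe'.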

edited Aug 19, 2020 at 8:14

answered Aug 18, 2020 at 8:33
