Replace entire string based on regex match

Question

I have a large pandas DataFrame of email addresses and want to replace all the .edu emails with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better way. This is how I do it:

import pandas as pd
import re

inp = [{'c1': 10, 'c2': 'gedua.com'}, {'c1': 11, 'c2': 'wewewe.Edu'}, {'c1': 12, 'c2': 'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

# Loop over every row and overwrite c2 when it contains ".edu" (slow on large frames)
for index, row in dfn.iterrows():
    if re.search(r'\.edu', row['c2']):
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')

So you only want to change the last e-mail, even though the second e-mail ends with .Edu? Or do you also want to change all variations of .edu regardless of capitalization?
– ALollz, Commented Aug 23, 2018 at 17:47
Yes, I am sorry I wrote ".Edu". It is ".edu" that I want to replace.
– deejay217, Commented Aug 23, 2018 at 18:07

cs95 · Accepted Answer · 2018-08-23 18:18:58Z

Use str.contains for case-insensitive selection, and assign with loc:

dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'
dfn

   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu
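
The selection above can be reproduced end to end as follows. This is a minimal sketch on the question's sample data. The extra row with a missing value and the `na=False` argument are additions that are not in the original answer; they assume a real email column may contain missing values, which would otherwise put NaN into the boolean mask.

```python
import pandas as pd

# Sample data from the question, plus one missing value (an assumption,
# added only to show what na=False does).
dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', None],
})

# Boolean mask: True where c2 contains ".edu" in any capitalization.
# na=False makes missing values count as "no match".
mask = dfn['c2'].str.contains(r'\.edu', case=False, na=False)

# Overwrite the whole cell for every matching row.
dfn.loc[mask, 'c2'] = 'Edu'
print(dfn)
```

Because the mask is computed for the whole column at once, no Python-level loop over rows is needed.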

If you only want to replace the emails that end with .edu, anchor the pattern with $:

dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'

Or, as suggested by piRSquared, using the case-sensitive str.endswith:

dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'

dfn

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
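
Since `str.endswith` does a literal, case-sensitive comparison, `'.Edu'` does not match an address ending in lowercase `.edu`. One possible workaround, sketched here as an assumption rather than something from the original answer, is to lowercase the column before testing the suffix. The fourth row is a hypothetical address added for illustration.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # The last address is hypothetical, added to show a lowercase ".edu" suffix.
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', 'someone@school.edu'],
})

# Lowercase a temporary copy of the column, then test the literal suffix.
ends_with_edu = dfn['c2'].str.lower().str.endswith('.edu')

# Only rows whose address ends in ".edu" (any capitalization) are overwritten.
dfn.loc[ends_with_edu, 'c2'] = 'Edu'
print(dfn)
```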

edited Aug 23, 2018 at 18:18

answered Aug 23, 2018 at 17:49


piRSquared Over a year ago

Also, dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2']

piRSquared · Accepted Answer · 2018-08-23 18:02:49Z

`replace`

dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

The pattern '^.*\.Edu$' matches everything from the beginning of the string up to and including a '.Edu' that sits at the very end of the string; replace then swaps that whole match for 'Edu'.
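
To see what the pattern does in isolation, it can be tried with Python's `re` module on the three sample strings. This is only an illustrative sketch; pandas applies the same regular expression syntax when `regex=True`.

```python
import re

# ^    start of string
# .*   any characters (greedy)
# \.   a literal dot
# Edu  the literal text "Edu" (case-sensitive)
# $    end of string
pattern = re.compile(r'^.*\.Edu$')

samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# sub replaces the whole match, which here is the entire string.
results = [pattern.sub('Edu', s) for s in samples]
print(results)
```

Only the second string ends in '.Edu', so it is the only one that changes.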

Column specific

You may want to limit the scope to just a column (or columns). You can do that by passing a nested dictionary to replace, where the outer key names the column and the inner dictionary maps each pattern to its replacement.

dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
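
One detail worth keeping in mind: `replace` returns a new DataFrame by default and leaves the original untouched, so the result has to be assigned to a name to be kept. A minimal sketch, where the variable name `replaced` is just an illustrative choice:

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# Outer key: the column to search. Inner dict: pattern -> replacement.
replaced = dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

print(dfn['c2'].tolist())       # original values, unchanged
print(replaced['c2'].tolist())  # values after replacement
```

Writing `dfn = dfn.replace(...)` instead would keep the result under the original name.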

Case insensitive [thx @coldspeed]

pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with '(?i)':

dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
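
As a point of comparison, and not part of the original answer, the string accessor's `Series.str.replace` does accept a `case` argument, so the inline flag is not the only option when working on a single column. A small sketch showing that both spellings give the same result on the sample data:

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# Option 1: inline (?i) flag at the start of the pattern.
inline_flag = dfn['c2'].replace(r'(?i)^.*\.edu$', 'Edu', regex=True)

# Option 2: str.replace with case=False (regex=True must be passed explicitly).
case_kwarg = dfn['c2'].str.replace(r'^.*\.edu$', 'Edu', case=False, regex=True)

print(inline_flag.tolist())
print(case_kwarg.tolist())
```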
