I have a large pandas DataFrame of email addresses and want to replace all the .edu emails with "Edu". I came up with a highly inefficient way of doing it, but there has to be a better one. This is how I do it:

import pandas as pd
import re
inp = [{'c1':10, 'c2':'gedua.com'},   {'c1':11,'c2':'wewewe.Edu'},   {'c1':12,'c2':'wewewe.edu.ney'}]
dfn = pd.DataFrame(inp)

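for index, row in dfn.iterrows():
    # re.search returns None when nothing matches, so no try/except is needed
    if re.search(r'\.edu', row['c2']):
        # .loc avoids the chained assignment of dfn.c2[index] = ...
        dfn.loc[index, 'c2'] = 'Edu'
        print('Education')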
  • So you only want to change the last e-mail, even though the second e-mail ends with .Edu? Or do you also want to change all variations of .edu regardless of capitalization? Commented Aug 23, 2018 at 17:47
  • Yes, I am sorry I wrote ".Edu". It is ".edu" that I want to replace Commented Aug 23, 2018 at 18:07

2 Answers

Use str.contains for case-insensitive selection, and assign with loc.

dfn.loc[dfn.c2.str.contains(r'\.Edu', case=False), 'c2'] = 'Edu'    
dfn

   c1         c2
0  10  gedua.com
1  11        Edu
2  12        Edu

If it's only the emails ending with .edu you want to replace, then

dfn.loc[dfn.c2.str.contains(r'\.Edu$', case=False), 'c2'] = 'Edu'

Or, as suggested by piR, with str.endswith (which is case-sensitive, so this matches only a literal '.Edu' suffix),

dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2'] = 'Edu'

dfn

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney  
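
To make the difference between the selections above concrete, here is a minimal sketch that builds the masks separately on the question's sample data. The names contains_mask, endswith_mask and result are only illustrative, and lower-casing before str.endswith is one possible way to get a case-insensitive suffix test, not something taken from the answer itself.

```python
import pandas as pd

# Same sample data as in the question.
dfn = pd.DataFrame({
    'c1': [10, 11, 12],
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney'],
})

# '.edu' anywhere in the string, ignoring case.
contains_mask = dfn['c2'].str.contains(r'\.edu', case=False)

# '.edu' only at the end, ignoring case. str.endswith has no case flag,
# so the values are lower-cased first (an assumption of this sketch).
endswith_mask = dfn['c2'].str.lower().str.endswith('.edu')

# Work on a copy so the original frame stays untouched.
result = dfn.copy()
result.loc[endswith_mask, 'c2'] = 'Edu'
print(result)
```

Inspecting the boolean mask before assigning is a cheap way to check which rows will be overwritten.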
1 Comment

Also, dfn.loc[dfn.c2.str.endswith('.Edu'), 'c2']

replace

dfn.replace(r'^.*\.Edu$', 'Edu', regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

The pattern '^.*\.Edu$' matches everything from the beginning of the string up to and including a literal '.Edu' that sits at the very end of the string, and the whole match is then replaced with 'Edu'. Strings that do not end in '.Edu' are left unchanged.
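
As a quick check of how that pattern behaves, here is a small sketch using Python's re module directly on the three sample strings. The names pattern, samples and replaced are hypothetical and exist only for this illustration.

```python
import re

# Whole-string match: anything, then a literal '.Edu' at the end.
pattern = re.compile(r'^.*\.Edu$')

samples = ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney']

# Only strings that end in '.Edu' are rewritten; the rest pass through.
replaced = [pattern.sub('Edu', s) for s in samples]
print(replaced)
```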


Column specific

You may want to limit the scope to just one column (or a few). You can do that by passing a nested dictionary to replace: the outer key names the column, and the inner dictionary maps each pattern to its replacement.

dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney

Case insensitive [thx @coldspeed]

pandas.DataFrame.replace does not have a case flag, but you can embed one in the pattern with '(?i)'.

dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

   c1              c2
0  10       gedua.com
1  11             Edu
2  12  wewewe.edu.ney
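
The sample data gives the same output with and without '(?i)', so the following sketch adds one made-up row, 'someone.edu', to show where the two patterns differ. That row and the names case_sensitive and case_insensitive are assumptions for illustration only. It also shows that replace returns a new DataFrame, so the result has to be assigned to be kept.

```python
import pandas as pd

dfn = pd.DataFrame({
    'c1': [10, 11, 12, 13],
    # 'someone.edu' is a hypothetical extra row, not part of the question.
    'c2': ['gedua.com', 'wewewe.Edu', 'wewewe.edu.ney', 'someone.edu'],
})

# Without the inline flag only the capitalised '.Edu' suffix matches.
case_sensitive = dfn.replace({'c2': {r'^.*\.Edu$': 'Edu'}}, regex=True)

# With '(?i)' the lowercase '.edu' suffix matches as well.
case_insensitive = dfn.replace({'c2': {r'(?i)^.*\.edu$': 'Edu'}}, regex=True)

print(case_sensitive)
print(case_insensitive)
```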
