I have a dataframe with a column with individuals' names:

name
Mr. Salmon
Mr Salmon
Ms. Salmon
Mrs. Salmon
Mrs Salmon
...

I would like to remove all the honorifics. I compiled the following regex at regex101.com and confirmed all the matches.

(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)

I am calling the replace method on the names dataframe to replace the regex matches with nothing, using the following code:

names_nohf = names.replace(r'(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)', regex = True)

This, however, does not return the desired names; in fact, it makes no changes at all. Could someone please point me in the right direction?

  • Maybe you have to add the value you want to put in place of the matched strings, i.e. replace("old", "new"), and then use an empty string as the new string. Commented Jan 8, 2020 at 0:10
  • The replace method assumes None as the default argument for ‘new’, as per this documentation: pandas.pydata.org/pandas-docs/stable/reference/api/… Commented Jan 8, 2020 at 0:13
  • When I test with None it doesn't work, but if I pass an empty string it works. Commented Jan 8, 2020 at 0:15

1 Answer

Use an empty string as the new value:

import pandas as pd

data = {'X': ['Mr A', 'Mr B', 'Mr C']}

df = pd.DataFrame(data)
print(df)

# Pass '' explicitly as the replacement; regex=True treats 'Mr' as a pattern
df = df.replace('Mr', '', regex=True)
print(df)

Result

      X
0  Mr A
1  Mr B
2  Mr C

    X
0   A
1   B
2   C
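
Applied to the question's data, the same fix might look like the sketch below. The dataframe is rebuilt here from the sample names in the question, so its exact shape is an assumption. The second variant, which uses Series.str.replace with re.IGNORECASE and a shortened pattern, is only an illustrative alternative and not part of the original question or answer.

```python
import re

import pandas as pd

# Sample data reconstructed from the question (assumed layout: one 'name' column)
names = pd.DataFrame(
    {"name": ["Mr. Salmon", "Mr Salmon", "Ms. Salmon", "Mrs. Salmon", "Mrs Salmon"]}
)

# The asker's pattern, unchanged
pattern = (
    r"(^[Mm]([Rr]|[Ss]|[Xx]|[Rr][Ss]|[Ii][Ss]+)\.?\s)"
    r"|(^[Mm][Ii][Ss][Tt][Ee][Rr]\.?\s)"
    r"|(^[Mm][Ii][Ss]+[Uu][Ss]\.?\s)"
)

# The fix: pass '' explicitly as the replacement value
names_nohf = names.replace(pattern, "", regex=True)
print(names_nohf)

# Hypothetical alternative: a shorter pattern made case-insensitive via flags
short_pattern = r"^(mr|ms|mx|mrs|mis+|mister|mis+us)\.?\s"
alt = names["name"].str.replace(short_pattern, "", regex=True, flags=re.IGNORECASE)
print(alt)
```

Both calls strip the honorific and the whitespace that follows it, so unlike the 'Mr' example above, no leading space is left behind.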