
I am trying to replace strings in a dataframe if the whole string equals another string. I do not want to replace substrings.

So:

If I have df:

 Index  Name       Age
   0     Joe        8
   1     Mary       10
   2     Marybeth   11

and I want to replace "Mary" when the whole string matches "Mary" with "Amy" so I get

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Marybeth   11

I'm doing the following:

df['Name'] = df['Name'].apply(lambda x: x.replace('Mary','Amy'))

My understanding from searching around is that the defaults of replace set regex=False and replace should look for the whole value in the dataframe to be "Mary". Instead I'm getting this result:

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Amybeth    11

What am I doing wrong?

3 Answers

6

Series.replace with a dict is the way to go. (Inside apply you are calling Python's str.replace on each value, which replaces substrings just like Series.str.replace does.)

df['Name'].replace({'Mary':'Amy'})
Out[582]: 
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object
df['Name'].replace({'Mary':'Amy'},regex=True)
Out[583]: 
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object

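Notice that the results differ: with the default regex=False only values equal to 'Mary' are replaced, while regex=True also replaces 'Mary' inside 'Marybeth'.

Series.str.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html

DataFrame.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html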
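The same whole-value replacement can also be written at the DataFrame level with a nested dict, which limits the replacement to one column. This is a minimal sketch; the frame below is rebuilt from the question's example data, and the variable name result is only illustrative.

```python
import pandas as pd

# Example data rebuilt from the question.
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Nested dict: in column "Name", replace values equal to "Mary" with "Amy".
# regex defaults to False, so "Marybeth" is left alone.
result = df.replace({"Name": {"Mary": "Amy"}})

print(result)
```

Because replace returns a new object by default, df itself is unchanged unless the result is assigned back.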


3

Explanation:

When you use apply like this, the lambda receives each element as a plain Python str, not a pandas Series, so x.replace is Python's str.replace:

In [42]: df['Name'].apply(lambda x: print(type(x)))
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
Out[42]:
0    None
1    None
2    None
Name: Name, dtype: object

It's the same as:

In [44]: 'Marybeth'.replace('Mary','Amy')
Out[44]: 'Amybeth'
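If apply is kept for some reason, the lambda has to compare the whole string itself instead of calling str.replace. A small sketch of the difference, using a Series rebuilt from the question's data (the names substring and exact are illustrative):

```python
import pandas as pd

names = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# Python's str.replace works on substrings, so "Marybeth" is changed too.
substring = names.apply(lambda x: x.replace("Mary", "Amy"))

# Comparing the whole string only changes values equal to "Mary".
exact = names.apply(lambda x: "Amy" if x == "Mary" else x)

print(substring.tolist())  # ['Joe', 'Amy', 'Amybeth']
print(exact.tolist())      # ['Joe', 'Amy', 'Marybeth']
```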

Solution:

Use Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None) directly (without Series.apply()). By default (regex=False) it replaces only values that match the whole string, which is the behaviour you expect:

In [39]: df.Name.replace('Mary','Amy')
Out[39]:
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object

You can explicitly specify regex=True, which makes it replace substrings as well:

In [40]: df.Name.replace('Mary','Amy', regex=True)
Out[40]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object

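NOTE: Series.str.replace(pat, repl, n=-1, case=None, flags=0) always replaces substrings. In older pandas versions it has no regex parameter and always treats pat as a regular expression; pandas 0.23 added a regex parameter, and its default changed from True to False in pandas 2.0. Either way, 'Mary' inside 'Marybeth' is replaced: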

In [41]: df.Name.str.replace('Mary','Amy')
Out[41]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object
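If a regular expression is wanted but should still match only the whole value, anchoring the pattern with ^ and $ is one option. This is a sketch that assumes pandas 0.23 or later, where Series.str.replace accepts a regex keyword; the variable names are illustrative.

```python
import pandas as pd

names = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# ^ and $ anchor the pattern to the start and end of each string,
# so "Marybeth" no longer matches.
anchored = names.str.replace(r"^Mary$", "Amy", regex=True)

# Series.replace accepts the same anchored pattern when regex=True.
same = names.replace(r"^Mary$", "Amy", regex=True)

print(anchored.tolist())  # ['Joe', 'Amy', 'Marybeth']
```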


2

You can also use loc to select the rows where the name matches exactly, and then assign the new name.

df.loc[df['Name'] == 'Mary', 'Name'] = "Amy"
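A self-contained version of this approach, with the boolean mask pulled out into a variable so it is easier to inspect. The frame is rebuilt from the question's data, and is_mary and masked are illustrative names; Series.mask is shown as a non-mutating alternative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Boolean mask: True only where the whole value equals "Mary".
is_mary = df["Name"] == "Mary"

# Non-mutating alternative: returns a new Series and leaves df untouched.
masked = df["Name"].mask(is_mary, "Amy")

# In-place assignment on the selected rows of the "Name" column.
df.loc[is_mary, "Name"] = "Amy"

print(df)
```

Unlike replace, the loc assignment modifies df in place, so no reassignment is needed.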
