Python Replace Whole Values in Dataframe String and Not Substrings

Question

I am trying to replace strings in a dataframe if the whole string equals another string. I do not want to replace substrings.

So:

If I have df:

 Index  Name       Age
   0     Joe        8
   1     Mary       10
   2     Marybeth   11

and I want to replace "Mary" with "Amy" only where the whole string is "Mary", so that I get:

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Marybeth   11

I'm doing the following:

df['Name'] = df['Name'].apply(lambda x: x.replace('Mary','Amy'))

My understanding from searching around is that replace defaults to regex=False and should therefore only replace cells whose whole value is "Mary". Instead I'm getting this result:

 Index  Name       Age
   0     Joe        8
   1     Amy        10
   2     Amybeth    11

What am I doing wrong?
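A minimal reproduction of the problem, assuming the frame above uses the default integer index (the "Index" column in the tables):

```python
import pandas as pd

# Rebuild the example frame from the question (the "Index" column shown
# above is assumed to be the default RangeIndex).
df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# The approach from the question: Python's built-in str.replace is called on
# each cell, and it replaces every occurrence of the substring "Mary".
result = df["Name"].apply(lambda x: x.replace("Mary", "Amy"))
print(result.tolist())  # ['Joe', 'Amy', 'Amybeth']
```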

BENY · Accepted Answer · 2018-01-11 19:55:06Z

6

Series.replace with a dict is the way to go. (The lambda in the question calls Python's built-in str.replace on each string, which, like Series.str.replace, replaces substrings.)

df['Name'].replace({'Mary':'Amy'})
Out[582]: 
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object
df['Name'].replace({'Mary':'Amy'},regex=True)
Out[583]: 
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object

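Notice that the results differ: by default the dict keys must match whole values, while with regex=True they also match substrings.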
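A self-contained sketch of the dict form with several exact-match keys. The extra names ("Joseph", "Bob", "Robert") are purely illustrative; a key that matches no cell is simply ignored:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# With the default regex=False, each dict key must equal the whole cell
# value, so "Marybeth" is left alone and only the exact "Mary" changes.
# "Bob" is a hypothetical extra key that matches nothing here.
mapping = {"Mary": "Amy", "Joe": "Joseph", "Bob": "Robert"}
exact = df["Name"].replace(mapping)
print(exact.tolist())  # ['Joseph', 'Amy', 'Marybeth']

# Series.replace returns a new Series; assign it back to update the frame.
df["Name"] = exact
```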

Series.str.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.replace.html

DataFrame.replace: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html

answered Jan 11, 2018 at 19:55

BENY


MaxU - stand with Ukraine · Accepted Answer · 2018-01-11 21:03:01Z

Explanation:

When you use apply like this, the lambda receives plain Python strings, not a pandas Series:

In [42]: df['Name'].apply(lambda x: print(type(x)))
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
<class 'str'>  # <---- NOTE
Out[42]:
0    None
1    None
2    None
Name: Name, dtype: object

So each call is the same as:

In [44]: 'Marybeth'.replace('Mary','Amy')
Out[44]: 'Amybeth'
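If you want to keep apply(), one per-element fix (an alternative, not part of the original answer) is to compare the whole string instead of calling str.replace. Series.replace, used in the solution, is usually the simpler and faster choice:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Keeping apply() but comparing the whole string instead of calling
# str.replace: a cell is swapped only when it equals "Mary" exactly.
fixed = df["Name"].apply(lambda x: "Amy" if x == "Mary" else x)
print(fixed.tolist())  # ['Joe', 'Amy', 'Marybeth']
```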

Solution:

Use Series.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad', axis=None) directly, without Series.apply(). By default (regex=False) it replaces only whole values, which is what you expected:

In [39]: df.Name.replace('Mary','Amy')
Out[39]:
0         Joe
1         Amy
2    Marybeth
Name: Name, dtype: object

If you explicitly specify regex=True, the pattern is matched against substrings instead:

In [40]: df.Name.replace('Mary','Amy', regex=True)
Out[40]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object
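If you do need regex=True but still want whole-value matching, a sketch is to anchor the pattern with ^ and $ so it can only match the entire string:

```python
import pandas as pd

s = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# With regex=True, "Mary" matches anywhere in the string; anchoring the
# pattern with ^ and $ restricts it to cells that are exactly "Mary".
anchored = s.replace(r"^Mary$", "Amy", regex=True)
print(anchored.tolist())  # ['Joe', 'Amy', 'Marybeth']
```

If the target string may contain regex metacharacters (such as `.` or `+`), escape it with `re.escape` before adding the anchors.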

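NOTE: in the pandas version used here, Series.str.replace(pat, repl, n=-1, case=None, flags=0) has no regex parameter and always treats pat as a regular expression (later versions added a regex parameter). Either way, it matches substrings rather than whole values: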

In [41]: df.Name.str.replace('Mary','Amy')
Out[41]:
0        Joe
1        Amy
2    Amybeth
Name: Name, dtype: object
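A short check, assuming a newer pandas whose Series.str.replace accepts regex=, that neither the regex nor the literal mode compares whole values:

```python
import pandas as pd

s = pd.Series(["Joe", "Mary", "Marybeth"], name="Name")

# Assumes a pandas version whose Series.str.replace accepts regex=.
# Neither mode compares whole values: the pattern is searched for inside
# each string, so "Marybeth" becomes "Amybeth" both times.
as_regex = s.str.replace("Mary", "Amy", regex=True)
as_literal = s.str.replace("Mary", "Amy", regex=False)
print(as_regex.tolist())    # ['Joe', 'Amy', 'Amybeth']
print(as_literal.tolist())  # ['Joe', 'Amy', 'Amybeth']
```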

Alexander · Accepted Answer · 2018-01-11 19:56:13Z

2

You can also use loc to select the rows where the name matches exactly, and then assign the new name:

df.loc[df['Name'] == 'Mary', 'Name'] = "Amy"
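A self-contained version of the loc approach, plus a non-mutating variant using Series.mask (the mask variant is an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Joe", "Mary", "Marybeth"], "Age": [8, 10, 11]})

# Boolean mask that is True only where the whole Name equals "Mary";
# "Marybeth" compares unequal, so it is not selected.
is_mary = df["Name"] == "Mary"
print(is_mary.tolist())  # [False, True, False]

# In-place assignment to the selected rows of the Name column.
df.loc[is_mary, "Name"] = "Amy"
print(df["Name"].tolist())  # ['Joe', 'Amy', 'Marybeth']

# Non-mutating variant of the same idea: Series.mask swaps in "Amy"
# wherever the condition holds and returns a new Series.
original = pd.Series(["Joe", "Mary", "Marybeth"])
masked = original.mask(original == "Mary", "Amy")
print(masked.tolist())  # ['Joe', 'Amy', 'Marybeth']
```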

answered Jan 11, 2018 at 19:56

Alexander

