Replace multiple substrings in a Pandas series with a value

Question

All,

To replace one string in one particular column I have done this and it worked fine:

dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")

I would now like to replace multiple strings with one string, say replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP".

How can I make this work? The code below didn't work:

dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")

Henrique Mendonça · Accepted Answer · 2024-06-05 06:06:19Z

44

You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts regex:

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

df['A'] = df['A'].str.replace(pattern, 'CORP', regex=True)

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING
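One caveat with the joined pattern: every item is interpreted as a regex, so an entry containing characters such as `.` or `(` can match more than intended. A minimal sketch, assuming the search terms are meant literally; the sample data and the `terms` name are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample data; 'U.S.' and '(A)' contain regex metacharacters.
s = pd.Series(['U.S. BOND', 'UKSA BOND', 'LOCAL (A) NOTE'])

terms = ['U.S.', '(A)']

# re.escape makes each term match literally before joining with '|'.
pattern = '|'.join(re.escape(t) for t in terms)

result = s.str.replace(pattern, 'CORP', regex=True)
print(result.tolist())
```

Without the escaping step, the unescaped `U.S.` would also match `UKSA`, since `.` matches any character.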

edited Jun 5, 2024 at 6:06

Henrique Mendonça

answered Mar 21, 2018 at 17:38

jpp

6 Comments

SBad Over a year ago

Your solutions worked best for me. Thank you. I also liked the solution proposed (but was deleted i think) dataUS.replace({"sec_type": { 'POOL' : "OTHERS", 'ABS' : "OTHERS"}})

jpp Over a year ago

Would the downvoter care to suggest a problem with this method?

Jeff Ellen Over a year ago

This did not work for me. Is it because I'm using Python 2? Also, you didn't explain why it works (which would make for a better answer), but I'm inferring that this is a regex format? I am not familiar with Python 3, but I don't see that documented here: docs.python.org/2/library/string.html#string.replace

jpp Over a year ago

This works for me (python 3.6 / pandas 0.19.2), maybe you are using an older version of pandas and/or python. OP did accept it, though..

Jeff Ellen Over a year ago

Also, I downvoted because I think using the built in pandas suggested by Rakesh is vastly superior (even to my own answer)

Laurens Koppenol · Accepted Answer · 2019-07-23 09:24:52Z

17

The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.

1. Use a replacement dictionary, because it makes the solution much more generic.
2. Add the keyword argument regex=True to Series.replace() (not Series.str.replace). This does two things. First, it turns the replacement into a regex replacement, which is much more powerful, but you will have to escape special characters, so watch out for that. Second, it makes replace work on substrings instead of only on the entire string. Which is really cool!

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Full code example

import pandas as pd

dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Output

0                            CORP
1                Sample text CORP
2    Sample text CORP sample CORP
Name: sec_type, dtype: object
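To illustrate the "more generic" point, the dictionary does not have to map everything to the same value. A small sketch, assuming you want different replacements per key; the "INTL" value and the sample rows are invented for the example:

```python
import pandas as pd

dataUS = pd.DataFrame(
    {'sec_type': ['LOCAL', 'Sample text LOCAL sample FOREIGN', 'OTHER']}
)

# Hypothetical mapping: each key is treated as a regex, each value is its replacement.
replacement = {"LOCAL": "CORP", "FOREIGN": "INTL"}

# regex=True makes Series.replace act on substrings rather than whole values.
result = dataUS['sec_type'].replace(replacement, regex=True)
print(result.tolist())
```

Rows that contain none of the keys, like 'OTHER' here, are left unchanged.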

answered Jul 23, 2019 at 9:24

Laurens Koppenol

1 Comment

Naresh Kumar Over a year ago

This solution is slower than making multiple replace calls on the column one by one.

BENY · Accepted Answer · 2018-03-21 18:31:01Z

14

replace can accept a dict, so we just create a dict for the values that need to be replaced:

dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)

Info of the dict

dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}

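The reason you receive the error: str.replace is different from replace. Series.str.replace expects a single pattern string, whereas Series.replace accepts a list or dict of values.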
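For what it's worth, the same mapping can be built with dict.fromkeys instead of zip. A minimal sketch; the variable names and the sample Series are only for illustration:

```python
import pandas as pd

keys = ["LOCAL", "FOREIGN", "HELLO"]

# Two ways to build the same {key: "CORP"} mapping.
via_zip = dict(zip(keys, ["CORP"] * len(keys)))
via_fromkeys = dict.fromkeys(keys, "CORP")

# Hypothetical data with stray whitespace, cleaned before replacing.
s = pd.Series([" LOCAL ", "FOREIGN BOND", "CORP"])
result = s.str.strip().replace(via_fromkeys, regex=True)
print(result.tolist())
```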

edited Mar 21, 2018 at 18:31

answered Mar 21, 2018 at 17:32

BENY

4 Comments

cs95 Over a year ago

Try instead dict.fromkeys(["LOCAL", "FOREIGN", "HELLO"], 'CORP')

SBad Over a year ago

I have tried both suggested solutions and get the error TypeError: replace() takes at least 3 arguments (2 given)

Jeff Ellen Over a year ago

Yes, me too... I have a different solution

BENY Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ hah like this one , even with your improvement ,still

Cam · Accepted Answer · 2022-01-26 10:30:12Z

5

@jpp's answer is a good one if you have a long list. But if you have just two or three items, you can simply use '|' within the pattern. Make sure to add the regex=True parameter.

Clearly .str.strip() is not a requirement but is good practise.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

df['A'] = df['A'].str.strip().str.replace("LOCAL|FOREIGN|HELLO", "CORP", regex=True)

output

    A
0   CORP TEST
1   TEST CORP
2   ANOTHER CORP
3   NOTHING

edited Jan 26, 2022 at 10:30

answered Sep 26, 2021 at 11:46

Cam

3 Comments

Bowen Liu Over a year ago

Thanks for sharing, but when I tried using this on the column names df.columns.str.replace(' '|'/'|'-','_', regex = True) I got an error "TypeError: unsupported operand type(s) for |: 'str' and 'str'". What did I do wrong here?

Cam Over a year ago

try df.columns.str.strip().str.replace('[\s\/\-]','_', regex = True)

Bowen Liu Over a year ago

Thanks a lot Cam! It works. Apparently it's due to my poor regex knowledge. I suppose \s captures whitespaces. What was wrong with my method tho?
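On the last question: in `' '|'/'|'-'` the `|` sits outside the quotes, so Python tries to apply its own `|` operator to three separate strings and raises the TypeError before pandas sees any pattern. The alternation has to be part of one pattern string. A small sketch with made-up column names:

```python
import pandas as pd

# Hypothetical column names containing a space, a slash and a hyphen.
cols = pd.Index(['first name', 'in/out', 'start-date'])

# The '|' characters are inside a single pattern string here.
alternation = cols.str.replace(' |/|-', '_', regex=True)

# A character class gives the same result for these names;
# a trailing '-' inside [] is taken literally.
char_class = cols.str.replace(r'[\s/-]', '_', regex=True)

print(list(alternation))
print(list(char_class))
```

Note that `\s` matches any whitespace (tabs, newlines), not just a plain space, so the two patterns only agree when the names contain ordinary spaces.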

Nuclear241 · Accepted Answer · 2021-10-06 23:15:27Z

2

Function to replace multiple values in pandas Series:

def replace_values(series, to_replace, value):
    for i in to_replace:
        series = series.str.replace(i, value)
    return series

Hope this helps someone
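A usage sketch for the function above. It is repeated here so the snippet runs on its own, with one assumption added: regex=False is passed explicitly so each item is matched literally regardless of the pandas version's default. The sample data is invented:

```python
import pandas as pd

def replace_values(series, to_replace, value):
    # Apply one literal (non-regex) replacement per item, in order.
    for item in to_replace:
        series = series.str.replace(item, value, regex=False)
    return series

s = pd.Series(['LOCAL TEST', 'TEST FOREIGN', 'NOTHING'])
result = replace_values(s, ['LOCAL', 'FOREIGN', 'HELLO'], 'CORP')
print(result.tolist())
```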

edited Oct 6, 2021 at 23:15

Nuclear241

answered May 14, 2019 at 13:18

Anthony R

Rakesh · Accepted Answer · 2018-03-21 17:37:47Z

0

Try:

dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})

answered Mar 21, 2018 at 17:37

Rakesh

4 Comments

Jeff Ellen Over a year ago

This is better than my solution because it uses a native pandas method, which I overlooked when focusing on what I knew was the problem in str.replace()

jpp Over a year ago

This does not work for substrings.. You need pd.Series.str.replace, not pd.Series.replace.

Rakesh Over a year ago

@jpp sorry I don’t understand

jpp Over a year ago

Look up difference between pd.Series.replace [requires exact string match] and pd.Series.str.replace [replaces substrings]. They are different methods and do different things.
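The difference described in this comment can be seen on a two-row Series. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['LOCAL', 'LOCAL TEST'])

# Series.replace (no regex) only swaps values that match in full.
whole = s.replace('LOCAL', 'CORP')

# Series.str.replace works on substrings inside each value.
sub = s.str.replace('LOCAL', 'CORP', regex=False)

print(whole.tolist())
print(sub.tolist())
```

Only the second call touches 'LOCAL TEST', because the first requires the whole cell to equal 'LOCAL'.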
