27

All,

To replace one string in one particular column I have done this and it worked fine:

dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")

I would now like to replace multiple strings with a single string, e.g. replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP".

How can I make this work? The code below did not work:

dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")

6 Answers

44

You can perform this task by forming a |-separated string. This works because pd.Series.str.replace accepts regex:

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

df['A'] = df['A'].str.replace(pattern, 'CORP', regex=True)

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING
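
Because the joined string is interpreted as a regex, any search term containing metacharacters (".", "+", "(", and so on) would be read as pattern syntax rather than literal text. A minimal sketch of one way to guard against that, assuming the terms are meant to match literally; the terms and sample data here are hypothetical and not from the question:

```python
import re
import pandas as pd

# Hypothetical search terms; "U.S." and "A+B" contain regex metacharacters.
terms = ["U.S.", "A+B", "LOCAL"]

# Escape each term so it is matched literally, then join with "|".
pattern = "|".join(re.escape(t) for t in terms)

s = pd.Series(["U.S. BOND", "UXSX BOND", "A+B NOTE", "LOCAL TEST"])
result = s.str.replace(pattern, "CORP", regex=True)

print(result.tolist())
# ['CORP BOND', 'UXSX BOND', 'CORP NOTE', 'CORP TEST']
```

Without re.escape, the unescaped "U.S." would also match "UXSX", since "." matches any character.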

6 Comments

Your solutions worked best for me. Thank you. I also liked the solution proposed (but was deleted i think) dataUS.replace({"sec_type": { 'POOL' : "OTHERS", 'ABS' : "OTHERS"}})
Would the downvoter care to suggest a problem with this method?
This did not work for me. Is it because I'm using Python 2? Also, you didn't explain why it works (which would make for a better answer), but I'm inferring that this is a regex format? I am not familiar with Python 3, but I don't see that documented here: docs.python.org/2/library/string.html#string.replace
This works for me (Python 3.6 / pandas 0.19.2); maybe you are using an older version of pandas and/or Python. OP did accept it, though.
Also, I downvoted because I think using the built in pandas suggested by Rakesh is vastly superior (even to my own answer)
17

The answer of @Rakesh is very neat but does not allow for substrings. With a small change however, it does.

  1. Use a replacement dictionary, because it makes the approach much more generic.
  2. Add the keyword argument regex=True to Series.replace() (not Series.str.replace). This does two things. First, it turns the replacement into a regex replacement, which is much more powerful but means you have to escape special characters, so beware of that. Second, it makes the replacement work on substrings instead of only on the entire string.

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Full code example

import pandas as pd

dataUS = pd.DataFrame({'sec_type': ['LOCAL', 'Sample text LOCAL', 'Sample text LOCAL sample FOREIGN']})

replacement = {
    "LOCAL": "CORP",
    "FOREIGN": "CORP",
    "HELLO": "CORP"
}

dataUS['sec_type'].replace(replacement, regex=True)

Output

0                            CORP
1                Sample text CORP
2    Sample text CORP sample CORP
Name: sec_type, dtype: object

1 Comment

This solution is slower than using multiple replace calls on a column, one by one.
14

replace can accept a dict, so we just create a dict for the values that need to be replaced:

dataUS['sec_type'].str.strip().replace(dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3)),regex=True)

Contents of the dict:

dict(zip(["LOCAL", "FOREIGN", "HELLO"], ["CORP"]*3))
Out[585]: {'FOREIGN': 'CORP', 'HELLO': 'CORP', 'LOCAL': 'CORP'}

The reason you receive the error is that str.replace is different from replace.
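
The distinction can be sketched as follows. This is an illustrative example on made-up data, not the asker's actual column: Series.replace with a dict matches whole cell values unless regex=True is passed, while Series.str.replace is given a single pattern here rather than a list.

```python
import pandas as pd

s = pd.Series(["LOCAL", "LOCAL TEST", "FOREIGN"])
mapping = dict.fromkeys(["LOCAL", "FOREIGN", "HELLO"], "CORP")

# Series.replace without regex: only cells equal to a key are replaced.
whole = s.replace(mapping)

# Series.replace with regex=True: keys are treated as regex patterns,
# so matching substrings are replaced as well.
sub = s.replace(mapping, regex=True)

# Series.str.replace: one pattern string (not a list), alternatives joined by "|".
sub2 = s.str.replace("LOCAL|FOREIGN|HELLO", "CORP", regex=True)

print(whole.tolist())  # ['CORP', 'LOCAL TEST', 'CORP']
print(sub.tolist())    # ['CORP', 'CORP TEST', 'CORP']
print(sub2.tolist())   # ['CORP', 'CORP TEST', 'CORP']
```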

4 Comments

Try instead dict.fromkeys(["LOCAL", "FOREIGN", "HELLO"], 'CORP')
I have tried both suggested solutions and get the error TypeError: replace() takes at least 3 arguments (2 given)
Yes, me too... I have a different solution
@cᴏʟᴅsᴘᴇᴇᴅ hah like this one , even with your improvement ,still
5

@JJP's answer is a good one if you have a long list. But if you have just two or three values, you can simply use '|' within the pattern. Make sure to add the regex=True parameter.

Clearly .str.strip() is not a requirement but is good practise.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

df['A'] = df['A'].str.strip().str.replace("LOCAL|FOREIGN|HELLO", "CORP", regex=True)

Output

              A
0     CORP TEST
1     TEST CORP
2  ANOTHER CORP
3       NOTHING

3 Comments

Thanks for sharing, but when I tried using this on the column names df.columns.str.replace(' '|'/'|'-','_', regex = True) I got an error "TypeError: unsupported operand type(s) for |: 'str' and 'str'". What did I do wrong here?
Try df.columns.str.strip().str.replace(r'[\s\/\-]', '_', regex=True)
Thanks a lot Cam! It works. Apparently it's due to my poor regex knowledge. I suppose \s captures whitespaces. What was wrong with my method tho?
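
On the open question above: in ' '|'/'|'-' the | sits outside the quotes, so Python tries to apply its own | operator to three separate strings before pandas ever sees a pattern, hence the TypeError. The alternation has to live inside a single pattern string. A small sketch, using hypothetical column names:

```python
import pandas as pd

# Hypothetical column names containing a space, a slash and a hyphen.
cols = pd.Index(["first name", "in/out", "up-down"])

# Character class: matches any single whitespace character, "/" or "-".
by_class = cols.str.replace(r"[\s/-]", "_", regex=True)

# Equivalent alternation, written inside ONE string.
by_alternation = cols.str.replace(" |/|-", "_", regex=True)

print(list(by_class))        # ['first_name', 'in_out', 'up_down']
print(list(by_alternation))  # ['first_name', 'in_out', 'up_down']
```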
2

Function to replace multiple values in pandas Series:

def replace_values(series, to_replace, value):
    """Replace each string in to_replace with value, one pass per string."""
    for pattern in to_replace:
        series = series.str.replace(pattern, value)
    return series

Hope this helps someone

0

Try:

dataUS.replace({"sec_type": { 'LOCAL' : "CORP", 'FOREIGN' : "CORP"}})

4 Comments

This is better than my solution because it uses the pandas native method, which I overlooked when focusing on what I knew was the problem in str.replace()
This does not work for substrings. You need pd.Series.str.replace, not pd.Series.replace.
@jpp sorry I don’t understand
Look up difference between pd.Series.replace [requires exact string match] and pd.Series.str.replace [replaces substrings]. They are different methods and do different things.
