8

Is there a case-insensitive version of pandas.DataFrame.replace? https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.replace.html

I need to replace string values in a column subject to a case-insensitive condition of the form "where label == a or label == b or label == c".

  • Can you add data sample and expected output? Commented Dec 7, 2017 at 9:42
  • Say column 1 has values ['test', 'Test', 'cat', 'CAT', 'dog', 'Cat'] and I want to replace all occurrences of 'test' and 'cat' with 'baby', irrespective of case. Commented Dec 7, 2017 at 9:44

2 Answers

24

The issue with some of the other answers is that they don't work with every DataFrame, only with a Series or with a DataFrame that can be implicitly converted to a Series. I understand this is because the .str accessor exists on the Series class but not on the DataFrame class.

To work with DataFrames, you can make your regular expression case-insensitive with the (?i) extension, an inline flag. Not every regex flavor supports it, but Python's re module does, so it works with pandas.

import pandas as pd

d = {'a':['test', 'Test', 'cat'], 'b':['CAT', 'dog', 'Cat']}
df = pd.DataFrame(data=d)

    a       b
0   test    CAT
1   Test    dog
2   cat     Cat

Then use replace as you normally would but with the (?i) extension:

df.replace('(?i)cat', 'MONKEY', regex=True)

    a       b
0   test    MONKEY
1   Test    dog
2   MONKEY  MONKEY
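
The same approach can probably cover the question's "label == a or label == b" case in one call by combining the (?i) flag with a regex alternation. The following is a minimal sketch, not part of the original answer; the `pattern` and `out` names are illustrative, and the `^...$` anchors are an assumption that only whole cell values should be replaced rather than substrings.

```python
import pandas as pd

df = pd.DataFrame({'a': ['test', 'Test', 'cat'], 'b': ['CAT', 'dog', 'Cat']})

# (?i) makes the whole pattern case-insensitive.
# ^(?:test|cat)$ matches cells whose entire value is "test" or "cat".
pattern = r'(?i)^(?:test|cat)$'

# replace returns a new DataFrame; df itself is left unchanged.
out = df.replace(pattern, 'baby', regex=True)
print(out)
```

Dropping the anchors would also rewrite substrings, so a value such as 'category' would be affected; which behavior is wanted depends on the data.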

6

I think you need to convert the values to lowercase and then replace by condition with isin:

import pandas as pd

d = {'a':['test', 'Test', 'cat', 'CAT', 'dog', 'Cat']}
df = pd.DataFrame(data=d)

# Boolean mask: True where the lower-cased value is one of the targets
m = df['a'].str.lower().isin(['cat','test'])
df.loc[m, 'a'] = 'baby'
print(df)
      a
0  baby
1  baby
2  baby
3  baby
4   dog
5  baby
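
If the original column should stay untouched, the same mask can be passed to Series.mask instead of assigning through .loc. This is a small sketch under that assumption; `targets` and `result` are illustrative names, not from the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['test', 'Test', 'cat', 'CAT', 'dog', 'Cat']})
targets = ['cat', 'test']  # labels to match, written in lower case

# True where the lower-cased value is one of the targets
m = df['a'].str.lower().isin(targets)

# mask() replaces values where the condition is True and returns a new Series
result = df['a'].mask(m, 'baby')
print(result.tolist())
```

Only 'dog' survives in `result`, while `df['a']` keeps its original values.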

Another solution:

import re
# pass regex=True explicitly; pandas 2.0 changed the default to regex=False
df['b'] = df['a'].str.replace('test', 'baby', flags=re.I, regex=True)
print(df)
      a     b
0  test  baby
1  Test  baby
2   cat   cat
3   CAT   CAT
4   dog   dog
5   Cat   Cat
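
Several strings can likely be handled in a single Series.str.replace call by joining them into one alternation pattern. The sketch below assumes the labels are plain strings (hence re.escape) and that substring replacement is acceptable; `labels` and `pattern` are illustrative names.

```python
import re
import pandas as pd

df = pd.DataFrame({'a': ['test', 'Test', 'cat', 'CAT', 'dog', 'Cat']})
labels = ['test', 'cat']

# Build "test|cat"; re.escape protects any regex metacharacters in a label
pattern = '|'.join(re.escape(s) for s in labels)

# regex=True is required here, otherwise "|" would be treated literally
df['b'] = df['a'].str.replace(pattern, 'baby', flags=re.I, regex=True)
print(df)
```

Because str.replace works on substrings, wrapping the pattern as '^(?:' + pattern + ')$' would restrict it to whole values if that is what is needed.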

2 Comments

I like the latter solution, but that doesn't support replacement of multiple strings in one shot. I have to run it multiple times, for 'test' and 'cat' separately. I'd like to do it in one shot.
I think that is not implemented in Series.replace or DataFrame.replace.
