
I am trying to replace multiple string values in a column. I understand that I can use replace() to map them one by one, but since I need to replace more than 10 string values, I am wondering whether there is a faster way to replace a number of string values with the same value.

import pandas as pd

df = pd.DataFrame({'a': ["US", "Japan", "UK", "China", "Peru", "Germany"]})
df.replace({'a': {'Japan': 'Germany', 'UK': 'Germany', 'China': 'Germany'}})

Expected output:

         a
0       US
1  Germany
2  Germany
3  Germany
4     Peru
5  Germany
  • Try df.replace('Japan|UK|China', 'Germany', regex=True). df.replace() can handle regular expressions, so you can combine multiple strings/groups in one pattern. Commented Oct 22, 2021 at 11:48
  • How many different string values do you have in your column? Commented Oct 22, 2021 at 11:48
  • 15. @DaniMesejo Commented Oct 22, 2021 at 11:49
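
One thing to keep in mind with the regex suggestion above: with regex=True the pattern is matched against substrings, so a longer value that merely contains one of the names is partially rewritten. The sketch below illustrates this with a made-up value ("Japantown", not part of the original data) and shows one possible way to restrict the pattern to whole values by anchoring it and escaping each name with re.escape. The names targets, unanchored and anchored are illustrative only.

```python
import re

import pandas as pd

# "Japantown" is a hypothetical value added only to show the substring effect.
df = pd.DataFrame({"a": ["US", "Japan", "UK", "China", "Peru", "Japantown"]})
targets = ["Japan", "UK", "China"]

# Unanchored alternation: replaces the matching part of any cell.
unanchored = df.replace("|".join(targets), "Germany", regex=True)

# Anchored alternation: only cells that consist entirely of one of the names match.
pattern = "^(?:" + "|".join(re.escape(t) for t in targets) + ")$"
anchored = df.replace(pattern, "Germany", regex=True)

print(unanchored["a"].tolist())
print(anchored["a"].tolist())
```

If the values are known to be exact country names, the difference does not matter, but anchoring is a cheap safeguard when the column may contain longer strings.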

2 Answers

4

Use numpy.where with Series.isin:

#60k rows
df = pd.DataFrame({'a':["US", "Japan", "UK", "China", "Peru", "Germany"] * 10000})

In [161]: %timeit df['a'] = df.a.map({ 'Japan' : 'Germany', 'UK' : 'Germany', 'China' : 'Germany' }).fillna(df.a)
12.4 ms ± 501 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [162]: %timeit df['a'] = np.where(df.a.isin(['Japan','UK','China']), 'Germany', df.a)
4.27 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)   

# assignment raises an error in this test, so the result is not assigned back
In [1632]: %timeit df.replace({'a' : { 'Japan' : 'Germany', 'UK' : 'Germany', 'China' : 'Germany' }})
7.85 ms ± 462 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Slower solution:

In [157]: %timeit df.replace('Japan|UK|China', 'Germany', regex=True)
218 ms ± 842 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
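
As a quick sanity check on the timings above, here is a minimal sketch that runs the non-regex approaches on the small frame from the question and confirms they produce the same column. Series.mask is not part of the original answer; it is included only as a pandas-native alternative to numpy.where, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["US", "Japan", "UK", "China", "Peru", "Germany"]})
targets = ["Japan", "UK", "China"]
mapping = dict.fromkeys(targets, "Germany")  # {'Japan': 'Germany', 'UK': 'Germany', ...}

# numpy.where returns a plain array, so wrap it to keep the index.
via_where = pd.Series(
    np.where(df["a"].isin(targets), "Germany", df["a"]), index=df.index, name="a"
)

# map() yields NaN for unmapped values; fillna restores the originals.
via_map = df["a"].map(mapping).fillna(df["a"])

# Series.mask replaces values where the condition is True (alternative, not from the answer).
via_mask = df["a"].mask(df["a"].isin(targets), "Germany")

# Nested-dict replace, as in the question.
via_replace = df.replace({"a": mapping})["a"]

for name, result in [("where", via_where), ("map", via_map),
                     ("mask", via_mask), ("replace", via_replace)]:
    print(name, result.tolist())
```

Note that the map/fillna variant would also fill values that were already missing in the original column, which makes no difference here because the example has none.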

3 Comments

I think a more thorough test should include the reassignment, which should be as a Series, either as a direct assignment or with assign method.
@sammywemmy - good idea.
You always have some nice solutions!
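
For the point raised in the comments about including the reassignment in the measurement, one possible setup outside IPython uses the standard timeit module. This is only a sketch: the function names and the base frame are illustrative, the loop count is arbitrary, and the absolute numbers will differ by machine and pandas version. The direct-assignment variant copies the frame first so that every run starts from the same data, and that copy is included in its timing.

```python
import timeit

import numpy as np
import pandas as pd

# 60k rows, as in the answer's benchmark.
base = pd.DataFrame({"a": ["US", "Japan", "UK", "China", "Peru", "Germany"] * 10000})
targets = ["Japan", "UK", "China"]


def with_direct_assignment():
    # Copy so repeated runs do not operate on already-replaced data.
    df = base.copy()
    df["a"] = np.where(df["a"].isin(targets), "Germany", df["a"])
    return df


def with_assign():
    # assign() returns a new frame and leaves `base` untouched.
    return base.assign(a=lambda d: np.where(d["a"].isin(targets), "Germany", d["a"]))


for fn in (with_direct_assignment, with_assign):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```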
1

Use:

df = df.replace('Japan|UK|China', 'Germany', regex=True)

2 Comments

added to my answer.
I see your point. It is a slow way to do it. Thank you.
