python: replace values in multiple columns in pandas


import pandas as pd

df = pd.DataFrame([[1,2, 3, 'www', 'abc'],[4,5,6, 'ppp', 'def'], [6,7,8, 'qqq', 'ggg'], [11,22,33, 'fff', 'mmm']], columns=['A', 'B', 'C', 'D', 'E'])

d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced' }
# Pass only the dict; an explicit value=None can change how newer pandas versions read it.
df.replace(d, inplace=True)

As a result, the DataFrame is updated accordingly:

>>> df
    A   B   C             D             E
0   1   2   3  www_replaced           abc
1   4   5   6           ppp  def_replaced
2   6   7   8           qqq           ggg
3  11  22  33  fff_replaced           mmm
>>>
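A flat dict passed to DataFrame.replace is matched against every column. When only some columns should change, pandas also accepts a nested dict of the form {column: {old: new}}. A minimal sketch on the question's data (the names all_cols and only_d_e are just for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
     [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
    columns=['A', 'B', 'C', 'D', 'E'],
)
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# A flat dict is looked up in every column of the DataFrame.
all_cols = df.replace(d)

# A nested dict {column: {old: new}} restricts the lookup to the named columns.
only_d_e = df.replace({'D': d, 'E': d})

print(only_d_e)
```

With this data the two results are identical, because the keys of d only occur in D and E; the nested form matters when other columns could contain the same strings.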

However, I'd like to use the pandas map() function on both columns D and E, for two reasons:

1. I read that map is generally faster than replace.
2. I can do something like this: df[column] = df[column].map(d).fillna('Unknown')
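The two methods treat values that are not keys of d differently: replace() leaves them unchanged, while map() turns them into NaN, which is what makes the fillna('Unknown') step possible. A minimal sketch on column D of the question's data:

```python
import pandas as pd

s = pd.Series(['www', 'ppp', 'qqq', 'fff'], name='D')
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# replace() keeps values that are not keys of d as they are.
replaced = s.replace(d)

# map() looks every value up in d; values missing from d become NaN ...
mapped = s.map(d)

# ... and fillna() then turns those NaNs into a sentinel label.
mapped_filled = s.map(d).fillna('Unknown')

print(replaced.tolist())
print(mapped_filled.tolist())
```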

I could run it twice, e.g.:

df['D'] = df['D'].map(d).fillna('Unknown')
df['E'] = df['E'].map(d).fillna('Unknown')

But is there a way to change values in multiple columns with map in one command?

asked Feb 4, 2020 at 22:12

Mark

No, because map is only a Series method. As for speed, it depends on the size of your data: map is roughly 4 times faster for 30k rows, and it is unclear what your best use case would be. If you want to use map, use a for loop over the columns.

– modesitt
Commented Feb 4, 2020 at 22:15
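Following modesitt's suggestion, a minimal sketch of the loop over columns on the question's df and d: Series.map is applied to one column at a time, and values missing from d become 'Unknown'.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
     [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
    columns=['A', 'B', 'C', 'D', 'E'],
)
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# Series.map works on one column at a time, so loop over the target columns.
for col in ['D', 'E']:
    df[col] = df[col].map(d).fillna('Unknown')

print(df)
```

Handling more columns only means extending the list in the for statement.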

@modesitt, thanks for the feedback. I will use the loop. My dataset has ~50-60K rows.

– Mark
Commented Feb 4, 2020 at 22:20
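Timings depend on the data and on the pandas version, so the speed figures quoted in these comments are best checked on your own data. The sketch below times the approaches discussed in this thread on a hypothetical, randomly generated 60,000-row frame (roughly the size mentioned above); the synthetic data and the helper names with_replace, with_map_loop, with_stack and with_apply are assumptions for illustration. Note that replace() is not strictly equivalent, since it leaves unmatched values unchanged instead of setting them to 'Unknown'.

```python
import timeit

import numpy as np
import pandas as pd

d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# Hypothetical synthetic data: 60,000 rows drawn from the question's values.
rng = np.random.default_rng(0)
values = np.array(['www', 'ppp', 'qqq', 'fff', 'abc', 'def', 'ggg', 'mmm'])
df = pd.DataFrame({
    'D': rng.choice(values, size=60_000),
    'E': rng.choice(values, size=60_000),
})
cols = ['D', 'E']


def with_replace():
    # Unmatched values stay as they are (no 'Unknown').
    return df[cols].replace(d)


def with_map_loop():
    out = df[cols].copy()
    for col in cols:
        out[col] = out[col].map(d).fillna('Unknown')
    return out


def with_stack():
    return df[cols].stack().map(d).fillna('Unknown').unstack()


def with_apply():
    return df[cols].apply(lambda col: col.map(d)).fillna('Unknown')


# Best of 3 repeats, each running the function 3 times.
timings = {
    name: min(timeit.repeat(fn, number=3, repeat=3))
    for name, fn in [('replace', with_replace), ('map loop', with_map_loop),
                     ('stack/map/unstack', with_stack), ('apply', with_apply)]
}
for name, seconds in timings.items():
    print(f'{name:>18}: {seconds:.4f} s for 3 runs')
```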

If you need to do this on many columns, then df[list_of_cols].stack().map(d).fillna('Unknown').unstack() will barely outperform the simple loop over columns for smaller DataFrames (<3000 rows).

– ALollz
Commented Feb 4, 2020 at 22:25
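ALollz's one-liner written out as a runnable sketch on the question's data, with list_of_cols set to ['D', 'E']: stack() reshapes the selected columns into a single Series indexed by (row, column), so map() and fillna() run only once, and unstack() restores the two columns.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
     [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
    columns=['A', 'B', 'C', 'D', 'E'],
)
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}
list_of_cols = ['D', 'E']

# stack(): DataFrame -> long Series; map()/fillna() act on every cell at once;
# unstack(): back to one column per original column name.
df[list_of_cols] = df[list_of_cols].stack().map(d).fillna('Unknown').unstack()

print(df)
```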

Rather than writing a loop, I think it's more elegant to use apply. I don't think it will be much better or worse performance-wise: df[['D', 'E']] = df[['D', 'E']].apply(lambda x: x.map(d)).fillna('Unknown'). For bigger datasets (> 500k rows), though, I think ALollz's solution will do better.

– Erfan
Commented Feb 4, 2020 at 22:44
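Erfan's apply() variant as a runnable sketch on the same data. apply() still calls Series.map once per column, so, as Erfan notes, its speed is unlikely to differ much from the explicit loop; fillna() then runs once on the combined result.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
     [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
    columns=['A', 'B', 'C', 'D', 'E'],
)
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}
cols = ['D', 'E']

# apply() passes each selected column (a Series) to the lambda, which maps it
# through d; the resulting NaNs are filled in one go afterwards.
df[cols] = df[cols].apply(lambda col: col.map(d)).fillna('Unknown')

print(df)
```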

Add a comment |

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

python: replace values in multiple columns in pandas

0

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest

Linked