import pandas as pd
df = pd.DataFrame([[1,2, 3, 'www', 'abc'],[4,5,6, 'ppp', 'def'], [6,7,8, 'qqq', 'ggg'], [11,22,33, 'fff', 'mmm']], columns=['A', 'B', 'C', 'D', 'E'])
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced' }
df.replace(d, value=None, inplace=True)
As a result, the DataFrame is updated accordingly:
>>> df
A B C D E
0 1 2 3 www_replaced abc
1 4 5 6 ppp def_replaced
2 6 7 8 qqq ggg
3 11 22 33 fff_replaced mmm
>>>
However, I'd like to use the pandas map() function on both columns D and E, for two reasons:
- I read that, in general, map is faster than replace
- I can do something like this:
df[column] = df[column].map(d).fillna('Unknown')
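A minimal sketch of why the fillna('Unknown') is needed, reusing the df and d from above on column D only: map returns NaN for any value that is not a key of d, while replace leaves such values unchanged.

```python
import pandas as pd

# Same sample data and mapping dict as in the question
df = pd.DataFrame([[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
                   [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
                  columns=['A', 'B', 'C', 'D', 'E'])
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# replace() keeps values that are not keys of d as they are
replaced = df['D'].replace(d)
# map() turns values that are not keys of d into NaN ...
mapped = df['D'].map(d)
# ... which is why fillna('Unknown') is chained after it
mapped_filled = mapped.fillna('Unknown')

print(replaced.tolist())       # ['www_replaced', 'ppp', 'qqq', 'fff_replaced']
print(mapped_filled.tolist())  # ['www_replaced', 'Unknown', 'Unknown', 'fff_replaced']
```

So map plus fillna is not a drop-in replacement for replace: unmatched values end up as 'Unknown' instead of being kept.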
I could run it twice, e.g.:
df['D'] = df['D'].map(d).fillna('Unknown')
df['E'] = df['E'].map(d).fillna('Unknown')
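The two lines above generalize to a loop over any list of column names. A minimal sketch with the question's data; note that, unlike the replace output shown earlier, every value missing from d ('ppp', 'qqq', 'abc', 'ggg', 'mmm') becomes 'Unknown'.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
                   [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
                  columns=['A', 'B', 'C', 'D', 'E'])
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}

# Apply the same map + fillna to each listed column in turn;
# columns not in the list (A, B, C) are left untouched
for col in ['D', 'E']:
    df[col] = df[col].map(d).fillna('Unknown')

print(df)
```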
But is there a way to change values in multiple columns with map in one command?
map, used with a dict like d, is only a Series method. As for speed, it depends on the size of your data: map is roughly 4 times faster than replace for 30k rows, so it is unclear what your best use case is. If you want to use map, use a for loop over the columns.

The stack/unstack version will barely outperform the simple loop over columns for smaller DataFrames (<3000 rows):
df[list_of_cols] = df[list_of_cols].stack().map(d).fillna('Unknown').unstack()

Alternatively, use apply. I don't think it will be much worse or better performance-wise:
df[['D', 'E']] = df[['D', 'E']].apply(lambda x: x.map(d)).fillna('Unknown')
Although I think for bigger datasets (> 500k rows) the solution of ALollz will do better.
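A minimal sketch, assuming the question's df and d, that runs the three approaches mentioned in the answers (loop, stack/unstack, apply) side by side on columns D and E, so you can check that they agree before timing them on your own data. The names cols, looped, stacked and applied are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 'www', 'abc'], [4, 5, 6, 'ppp', 'def'],
                   [6, 7, 8, 'qqq', 'ggg'], [11, 22, 33, 'fff', 'mmm']],
                  columns=['A', 'B', 'C', 'D', 'E'])
d = {'www': 'www_replaced', 'def': 'def_replaced', 'fff': 'fff_replaced'}
cols = ['D', 'E']

# 1) Plain loop: one Series.map call per column
looped = df[cols].copy()
for col in cols:
    looped[col] = looped[col].map(d).fillna('Unknown')

# 2) stack() turns the selected columns into one long Series,
#    map() runs once over it, and unstack() restores the (row, column) shape
stacked = df[cols].stack().map(d).fillna('Unknown').unstack()

# 3) apply() calls Series.map on each column of the sub-DataFrame
applied = df[cols].apply(lambda s: s.map(d)).fillna('Unknown')

# Write one of the results back with a single assignment
df[cols] = applied
print(df)
```

For timing on your real data, wrapping each variant in a function and using the standard timeit module is one straightforward option.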