How to map one dataframe to another (python pandas)?

Question

Given these two dataframes, how do I get the intended output dataframe? The long way would be to loop through the rows of df1 with iloc, convert df2 to a dict, and then use map to replace each x and y with its score.

This seems tedious and would be slow on a large dataframe. I'm hoping there's a cleaner solution.

df1:

ID    A    B    C
1     x    x    y
2     y    x    y
3     x    y    y

df2:

ID    score_x    score_y
1          20         30
2          15         17
3          18         22

output:

ID    A     B     C
1     20    20    30
2     17    15    17
3     18    22    22
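
For anyone who wants to reproduce this, the frames above can be built roughly as follows. This is a sketch: the dtypes are assumed from the tables, and the name `expected` is only an illustrative label for the intended output.

```python
import pandas as pd

# Category labels per ID, copied from the df1 table above.
df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})

# Per-ID scores for each category, copied from the df2 table above.
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# Intended output: each label replaced by that row's score for the label.
expected = pd.DataFrame({"ID": [1, 2, 3],
                         "A": [20, 17, 18],
                         "B": [20, 15, 22],
                         "C": [30, 17, 22]})
```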

Note: the dataframes would have many columns and there would be more than just x and y as categories (possibly in the region of 20 categories).

Thanks!

Space Impact · Accepted Answer · 2019-07-10 11:42:34Z

8

Use DataFrame.apply row by row (axis=1) with Series.map, so that each row of df1 is mapped through the scores for its own ID in df2:

df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
df2.columns = df2.columns.str.split('_').str[-1]

df1 = df1.apply(lambda x: x.map(df2.loc[x.name]), axis=1).reset_index()

print(df1)
   ID   A   B   C
0   1  20  20  30
1   2  17  15  17
2   3  18  22  22
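
The same idea as a self-contained sketch that leaves the input frames untouched. The names `left`, `scores`, and `result` are illustrative, and the prefix stripping assumes category labels contain no underscore.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# Index both frames by ID so rows can be matched by label.
left = df1.set_index("ID")
scores = df2.set_index("ID")

# Rename score_x, score_y to the bare category labels x, y.
scores.columns = scores.columns.str.split("_").str[-1]

# row.name is the row's ID; scores.loc[ID] is a Series keyed by category,
# which Series.map uses as a lookup table for that row's cells.
result = left.apply(lambda row: row.map(scores.loc[row.name]), axis=1).reset_index()
print(result)
```

Because `apply` with `axis=1` calls the lambda once per row, this is easy to read but can be slow on large frames.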

edited Jul 10, 2019 at 11:42

answered Jul 10, 2019 at 11:39

Space Impact


Stef · Accepted Answer · 2019-07-10 12:00:41Z

4

Using mask:

df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)

df1.mask(df1=='x',df2['score_x'],axis=0).mask(df1=='y',df2['score_y'],axis=0)

Result:

     A   B   C
ID            
1   20  20  30
2   17  15  17
3   18  22  22

If there are many columns and they are all named the same way, you can use something like this:

for e in df2.columns.str.split('_').str[-1]:
    df1.mask(df1==e, df2['score_'+e], axis=0, inplace=True)
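
A self-contained sketch of that loop, written without inplace so the original frames are not modified. The names `result` and `scores` are illustrative; the cast to object dtype is an assumption made so that cells can hold either labels or integer scores while the loop runs.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

# object dtype lets a column hold labels and scores at the same time.
result = df1.set_index("ID").astype(object)
scores = df2.set_index("ID")

for col in scores.columns:
    category = col.split("_")[-1]  # "score_x" -> "x"
    # Where a cell equals the category, take that row's score;
    # axis=0 aligns the score Series with the frame's ID index.
    result = result.mask(result == category, scores[col], axis=0)

print(result)
```

Each pass handles one category for the whole frame, so the number of passes grows with the number of categories rather than the number of rows.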

edited Jul 10, 2019 at 12:00

answered Jul 10, 2019 at 11:42

Stef


2 Comments

Stef Over a year ago

@jezrael: see my addition to the answer, although I admit it's not very elegant.

Stef Over a year ago

@jezrael Using mask for substituting 20 categories in a 1000 x 3 dataframe is about 10 times faster than using apply
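
A rough way to check a comparison like this yourself. This is only a sketch: the random data, the category labels c0 to c19, and the helper names `via_apply` and `via_mask` are assumptions for illustration, and timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cats = [f"c{i}" for i in range(20)]            # 20 hypothetical categories
ids = pd.Index(np.arange(1000), name="ID")

# 1000 x 3 frame of random category labels, kept as object dtype.
df1 = pd.DataFrame(rng.choice(cats, size=(1000, 3)),
                   columns=list("ABC"), index=ids).astype(object)
# One score column per category, named score_<category>.
df2 = pd.DataFrame(rng.integers(0, 100, size=(1000, 20)),
                   columns=[f"score_{c}" for c in cats], index=ids)


def via_apply():
    # Row-wise lookup: one Series.map call per row.
    scores = df2.rename(columns=lambda c: c.split("_")[-1])
    return df1.apply(lambda row: row.map(scores.loc[row.name]), axis=1)


def via_mask():
    # Category-wise replacement: one DataFrame.mask call per category.
    out = df1.copy()
    for col in df2.columns:
        out = out.mask(out == col.split("_")[-1], df2[col], axis=0)
    return out


print("apply:", timeit.timeit(via_apply, number=3))
print("mask: ", timeit.timeit(via_mask, number=3))
```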

Aditya Santoso · Accepted Answer · 2019-07-10 11:41:55Z

0

There might be a more elegant way to do this, but assuming you can enumerate the categories and columns:

import numpy as np

df3 = df1.set_index('ID').join(df2.set_index('ID'), on='ID')
for col in ['A','B','C']:
    for cat in ['x','y']:  # named cat to avoid shadowing the built-in type
        df3[col] = np.where(df3[col] == cat, df3['score_'+cat], df3[col])

>>> df3
     A   B   C  score_x  score_y
ID
1   20  20  30       20       30
2   17  15  17       15       17
3   18  22  22       18       22
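
A self-contained sketch of the same approach that also drops the helper score columns at the end. The lists `value_cols` and `categories` are assumptions that you would fill in for your own data.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3],
                    "A": ["x", "y", "x"],
                    "B": ["x", "x", "y"],
                    "C": ["y", "y", "y"]})
df2 = pd.DataFrame({"ID": [1, 2, 3],
                    "score_x": [20, 15, 18],
                    "score_y": [30, 17, 22]})

value_cols = ["A", "B", "C"]   # columns holding category labels
categories = ["x", "y"]        # must match the score_<category> columns

# Put each row's scores next to its labels by joining on the ID index.
df3 = df1.set_index("ID").join(df2.set_index("ID"))

for col in value_cols:
    for cat in categories:
        # Replace the label with the row's score; leave other cells as is.
        df3[col] = np.where(df3[col] == cat, df3["score_" + cat], df3[col])

# Keep only the mapped columns.
result = df3[value_cols]
print(result)
```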

answered Jul 10, 2019 at 11:41

Aditya Santoso
