Mapping values to dataframe from a different dataframe in Pandas

Question

Problem: I have 2 dataframes, df1 and df2. My goal is to modify df1 by replacing some of its values with values from df2 whenever a matching row is found in df2.

import pandas as pd

# dataframe 1
data = {'A':[90,20,30,25,50,60],
        'B':['qq','ee','rr','tt','ii','oo'],
        'C':['XX','VV','BB','NN','KK','JJ']}
df1 = pd.DataFrame(data)

# dataframe 2
convert_table = {'X': ['dd','ee','ff','gg','hh','ii','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'], 
                 'Y': ['DD','VV','FF','GG','HH','KK','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'], 
                 'Z': [5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61]}
df2 = pd.DataFrame(convert_table)

# search values of df1 inside of df2 and replace values
for idx1,row1 in df1.iterrows():
    for idx2, row2 in df2.iterrows():
        if row1['B']==row2['X'] and row1['C']==row2['Y']:
            df1.replace(to_replace=row1['B'],value=row2['Z'],inplace=True)

As you can see, I have 2 nested for loops that check whether each row of df1 (row1) is found inside df2. If that condition is met, I replace the value in row1['B'] with the one in row2['Z'].

The result I get is therefore the following (which is exactly what I want):

In [120]: df1
Out[120]: 
    A   B   C
0  90  43  XX
1  20   7  VV
2  30  47  BB
3  25  59  NN
4  50  19  KK
5  60  37  JJ

Notice how column B has changed.

Question: could you suggest a better way to write my code? I would like to make it as fast as possible, maybe by using the built-in functions offered by Pandas or Python.

Note: the data contained in the dataframes is just for demonstration purposes.

Julien Marrec · Accepted Answer · 2016-11-22 14:22:52Z

3

Use merge on two columns:

df1.merge(df2, left_on=['B','C'], right_on=['X','Y'], how='left')

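The how='left' argument is critical here: it keeps every row of df1, including rows that have no match in df2, whereas the default inner join would drop them. Read "Brief primer on merge methods (relational algebra)" if you don't understand why.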
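To make the difference concrete, here is a minimal sketch comparing the default inner join with how='left'. The two tiny frames (`left`, `right`) are made up for illustration and are not the data from the question; `indicator=True` is an optional merge flag that adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Hypothetical two-row sample: ('ii', 'KK') has no match in `right`
left = pd.DataFrame({'B': ['qq', 'ii'], 'C': ['XX', 'KK']})
right = pd.DataFrame({'X': ['qq'], 'Y': ['XX'], 'Z': [43]})

# Inner join (the default): the unmatched row is dropped
inner = left.merge(right, left_on=['B', 'C'], right_on=['X', 'Y'], how='inner')

# Left join: every row of `left` is kept, unmatched rows get NaN in X, Y, Z
left_join = left.merge(right, left_on=['B', 'C'], right_on=['X', 'Y'],
                       how='left', indicator=True)

print(len(inner))                    # 1
print(len(left_join))                # 2
print(left_join['_merge'].tolist())  # ['both', 'left_only']
```

With the inner join, df1 would silently lose rows, which is probably not what you want when the goal is to update df1 in place.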

I'll modify your example to create one where there's an entry in df1 that doesn't exist in df2, namely ('ii','KK').

In [1]:
# dataframe 2
convert_table = {'X': ['dd','ee','ff','gg','hh','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'], 
                 'Y': ['DD','VV','FF','GG','HH','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'], 
                 'Z': [5,7,11,13,17,19,23,29,37,41,43,47,53,59,61]}
df2 = pd.DataFrame(convert_table)



In [2]: merged = df1.merge(df2, left_on=['B','C'], right_on=['X','Y'], how='left')
        merged
Out[2]: 
    A   B   C    X    Y     Z
0  90  qq  XX   qq   XX  43.0
1  20  ee  VV   ee   VV   7.0
2  30  rr  BB   rr   BB  47.0
3  25  tt  NN   tt   NN  59.0
4  50  ii  KK  NaN  NaN   NaN
5  60  oo  JJ   oo   JJ  37.0

Now to retrieve the final dataframe:

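In [3]:
# .ix was removed in pandas 1.0; use .loc with a boolean mask instead
matched = merged['Z'].notnull()
# Z became float because of the NaN row, so cast the matched values back to int
merged.loc[matched, 'B'] = merged.loc[matched, 'Z'].astype(int)
merged = merged[['A','B','C']]
merged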

Out[3]:
    A   B   C
0  90  43  XX
1  20   7  VV
2  30  47  BB
3  25  59  NN
4  50  ii  KK
5  60  37  JJ
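For reference, the steps above can be bundled into one self-contained sketch. The function name `replace_b_with_z` and the shortened df2 are my own for illustration, not part of the original answer. It assumes each (X, Y) pair appears at most once in df2; `validate='many_to_one'` makes merge raise an error if that assumption does not hold, since duplicate pairs would otherwise add rows to the result.

```python
import pandas as pd

def replace_b_with_z(df1, df2):
    """Return a copy of df1 where B is replaced by Z for rows whose (B, C) matches (X, Y)."""
    merged = df1.merge(df2, left_on=['B', 'C'], right_on=['X', 'Y'],
                       how='left', validate='many_to_one')
    out = df1.copy()
    # B will hold both strings and ints afterwards, so make it object dtype explicitly
    out['B'] = out['B'].astype(object)
    # The left merge keeps df1's row order, so a positional boolean mask lines up with `out`
    matched = merged['Z'].notna().to_numpy()
    out.loc[matched, 'B'] = merged.loc[matched, 'Z'].astype(int).to_numpy()
    return out

df1 = pd.DataFrame({'A': [90, 20, 30, 25, 50, 60],
                    'B': ['qq', 'ee', 'rr', 'tt', 'ii', 'oo'],
                    'C': ['XX', 'VV', 'BB', 'NN', 'KK', 'JJ']})

# Shortened lookup table (illustrative); ('ii', 'KK') is deliberately missing
df2 = pd.DataFrame({'X': ['ee', 'oo', 'qq', 'rr', 'tt'],
                    'Y': ['VV', 'JJ', 'XX', 'BB', 'NN'],
                    'Z': [7, 37, 43, 47, 59]})

result = replace_b_with_z(df1, df2)
print(result)
```

Working on a copy leaves df1 untouched, and the helper columns X, Y and Z never appear in the returned frame.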

edited Nov 22, 2016 at 14:22

answered Nov 22, 2016 at 14:16

Julien Marrec


2 Comments

Federico Gentile Over a year ago

Is it possible to get an output that has the same number of columns as the one I got in my example?

Julien Marrec Over a year ago

I just did this at the same time you posted your comment :)
