Problem: I have two dataframes, df1 and df2. My goal is to modify df1 by replacing some of its values whenever a matching row is found in df2.
import pandas as pd
# dataframe 1
data = {'A':[90,20,30,25,50,60],
'B':['qq','ee','rr','tt','ii','oo'],
'C':['XX','VV','BB','NN','KK','JJ']}
df1 = pd.DataFrame(data)
# dataframe 2
convert_table = {'X': ['dd','ee','ff','gg','hh','ii','ll','mm','nn','oo','pp','qq','rr','ss','tt','uu'],
'Y': ['DD','VV','FF','GG','HH','KK','LL','MM','NN','JJ','PP','XX','BB','SS','NN','LL'],
'Z': [5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61]}
df2 = pd.DataFrame(convert_table)
# search values of df1 inside of df2 and replace values
for idx1, row1 in df1.iterrows():
    for idx2, row2 in df2.iterrows():
        if row1['B'] == row2['X'] and row1['C'] == row2['Y']:
            df1.replace(to_replace=row1['B'], value=row2['Z'], inplace=True)
As you can see, I use two nested for loops to check whether each row of df1 (row1) has a match in df2, i.e. whether row1['B'] equals row2['X'] and row1['C'] equals row2['Y']. If this condition is met, I replace the value in row1['B'] with the one in row2['Z'].
The result I get is exactly what I want:
In [120]: df1
Out[120]:
    A   B   C
0  90  43  XX
1  20   7  VV
2  30  47  BB
3  25  59  NN
4  50  19  KK
5  60  37  JJ
Notice how column B has changed.
Question: could you suggest a better way to write this code? I would like to make it as fast as possible, ideally by using the built-in functions offered by Pandas or Python.
Note: the data contained in the dataframes is just for demonstration purposes.
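
For reference, one direction that might avoid the nested loops is a left merge on the two key columns. This is only a sketch. It assumes each (X, Y) pair appears at most once in df2, so the merge keeps one row per row of df1. The names `merged` and `result` are mine. It also differs from the loop version in one respect: `df1.replace` changes every cell of df1 equal to the matched value, while this only updates column B of the matching row.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': [90, 20, 30, 25, 50, 60],
    'B': ['qq', 'ee', 'rr', 'tt', 'ii', 'oo'],
    'C': ['XX', 'VV', 'BB', 'NN', 'KK', 'JJ'],
})
df2 = pd.DataFrame({
    'X': ['dd', 'ee', 'ff', 'gg', 'hh', 'ii', 'll', 'mm',
          'nn', 'oo', 'pp', 'qq', 'rr', 'ss', 'tt', 'uu'],
    'Y': ['DD', 'VV', 'FF', 'GG', 'HH', 'KK', 'LL', 'MM',
          'NN', 'JJ', 'PP', 'XX', 'BB', 'SS', 'NN', 'LL'],
    'Z': [5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61],
})

# Left merge keeps every row of df1 in its original order;
# rows of df1 with no (B, C) == (X, Y) match get NaN in Z.
merged = df1.merge(df2, how='left', left_on=['B', 'C'], right_on=['X', 'Y'])

# Use Z where a match was found, otherwise keep the original B.
# to_numpy() sidesteps index alignment in case df1 has a non-default index.
result = df1.copy()
result['B'] = merged['Z'].where(merged['Z'].notna(), merged['B']).to_numpy()

print(result)
```

If df2 could contain duplicate (X, Y) pairs, the merge would produce extra rows, so I would need to drop duplicates from df2 first.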