I have two large DataFrames: df1 (~100K rows) and df2 (~600K rows). They look like the following:
# df1
name price brand model
0 CANON CAMERA 20 FS36dINFS MEGAPIXEL 9900.0 CANON FS36dINFS
1 SONY HD CAMERA 25 MEGAPIXEL 8900.0 SONY
2 LG 55" 4K UHD LED Smart TV 55UJ635V 5890.0 LG 55UJ635V
3 Sony 65" LED Smart TV KD-65XD8505BAE 4790.0 SONY KD-65XD8505BAE
4 LG 49" 4K UHD LED Smart TV 49UJ651V 4390.0 LG 49UJ651V
#df2
name store price
0 LG 49" 4K UHD LED Smart TV 49UJ651V storeA 4790.0
1 SONY HD CAMERA 25 MEGAPIXEL storeA 12.90
2 Samsung 32" LED Smart TV UE-32J4505XXE storeB 1.30
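In case it helps reproduce the problem, the sample frames above can be built like this (the missing model for the SONY camera is assumed to be None; the real column dtypes may differ):

```python
import pandas as pd

# Sample data copied from the tables above.
df1 = pd.DataFrame({
    "name": [
        "CANON CAMERA 20 FS36dINFS MEGAPIXEL",
        "SONY HD CAMERA 25 MEGAPIXEL",
        'LG 55" 4K UHD LED Smart TV 55UJ635V',
        'Sony 65" LED Smart TV KD-65XD8505BAE',
        'LG 49" 4K UHD LED Smart TV 49UJ651V',
    ],
    "price": [9900.0, 8900.0, 5890.0, 4790.0, 4390.0],
    "brand": ["CANON", "SONY", "LG", "SONY", "LG"],
    # Assumption: the SONY camera row has no model value.
    "model": ["FS36dINFS", None, "55UJ635V", "KD-65XD8505BAE", "49UJ651V"],
})

df2 = pd.DataFrame({
    "name": [
        'LG 49" 4K UHD LED Smart TV 49UJ651V',
        "SONY HD CAMERA 25 MEGAPIXEL",
        'Samsung 32" LED Smart TV UE-32J4505XXE',
    ],
    "store": ["storeA", "storeA", "storeB"],
    "price": [4790.0, 12.90, 1.30],
})
```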
I want to check whether the brand and model from each df1 row appear in a df2 name, and if they do, collect the matching pair. Currently I use a naive approach, iterating over both DataFrames like this:
datalist = []
for idx1, row1 in df1.iterrows():  # was iterrow(), which raises AttributeError
    for idx2, row2 in df2.iterrows():
        if row1['brand'] in row2['name'] and row1['model'] in row2['name']:
            datalist.append([row1['model'], row1['brand'], row1['name'], row1['price'],
                             row2['name'], row2['price'], row2['store']])
But this takes a very long time because both DataFrames are big (100K x 600K is roughly 60 billion comparisons). I have read that sets are faster for membership tests, but the way I'm using iterrows I can't convert to sets, because then I'd lose the row positions. Is there a faster way to do this?
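For context, the kind of speedup I'm hoping for might look something like the sketch below: group df1 by brand so that the substring scan over df2 runs once per brand (via `Series.str.contains`) instead of once per df1 row, then check only the model within the surviving candidates. This is an untested sketch in my own words, not a known solution; it assumes matching stays case-sensitive like my original loop, and it skips rows with a missing model:

```python
import pandas as pd

def fast_match(df1, df2):
    """Prefilter df2 by brand, then check the model substring.

    Sketch only: assumes case-sensitive substring matching (as in the
    original loop) and skips df1 rows whose model is missing.
    """
    results = []
    for brand, group in df1.groupby("brand"):
        # One vectorized substring scan over df2 per brand.
        candidates = df2[df2["name"].str.contains(brand, regex=False)]
        if candidates.empty:
            continue
        for _, r1 in group.iterrows():
            if pd.isna(r1["model"]):
                continue  # no model to match against
            hits = candidates[candidates["name"].str.contains(r1["model"], regex=False)]
            for _, r2 in hits.iterrows():
                results.append([r1["model"], r1["brand"], r1["name"], r1["price"],
                                r2["name"], r2["price"], r2["store"]])
    return results
```

The idea is that most brands appear in only a small slice of df2, so the inner loop runs over a few candidate rows instead of all 600K.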