I have two dataframes, df1 and df2. I need to check whether the values in df1 columns x1 and x2 exist in df2 column x. If a value doesn't exist there, it should be added to df2 column x, with NaN in df2 column y.
The following is what I have. It works, but it takes too long for large datasets, and I feel it could be improved and simplified using Pandas methods.
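import pandas as pd

df1 = pd.DataFrame({'x1':['a', 'b', 'e'], 'x2':['c', 'd', 'b']})
df2 = pd.DataFrame({'x':['d', 'e', 'f'], 'y':['a1', 'b2', 'c3']})
# Values from x1 and x2 that are not already in df2['x']
diff = set([*df1[~df1['x1'].isin(df2['x'])]['x1'], *df1[~df1['x2'].isin(df2['x'])]['x2']])
# Append one row per missing value (DataFrame.append was removed in pandas 2.0, so this loop needs an older version)
for x in diff:
    df2 = df2.append({"x":x}, ignore_index=True)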
df1:
  x1 x2
0  a  c
1  b  d
2  e  b
df2:
   x   y
0  d  a1
1  e  b2
2  f  c3
The result should be:
   x    y
0  d   a1
1  e   b2
2  f   c3
3  c  NaN
4  b  NaN
5  a  NaN
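
For reference, the operation described above can probably be done without a Python loop. The following is only a minimal sketch, assuming that the order of the newly added rows doesn't matter and that a single pd.concat is acceptable in place of repeated appends. The names candidates, missing and result are illustrative and not part of the code above.

```python
import pandas as pd

df1 = pd.DataFrame({'x1': ['a', 'b', 'e'], 'x2': ['c', 'd', 'b']})
df2 = pd.DataFrame({'x': ['d', 'e', 'f'], 'y': ['a1', 'b2', 'c3']})

# Stack x1 and x2 into one Series and drop repeated values.
candidates = pd.concat([df1['x1'], df1['x2']], ignore_index=True).drop_duplicates()

# Keep only the values that are not already present in df2['x'].
missing = candidates[~candidates.isin(df2['x'])]

# Add all missing values in one concat; the absent 'y' column is filled with NaN.
result = pd.concat([df2, missing.to_frame('x')], ignore_index=True)
print(result)
```

With the sample data this adds a, b and c in order of first appearance, rather than the c, b, a order shown above, which comes from the iteration order of the set. Building the new rows once and concatenating a single time avoids copying df2 on every loop iteration, which is the likely cause of the slowdown on large inputs.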