I have two dataframes, df1 and df2:
import pandas as pd

d = {'ID': [31,42,63,44,45,26],
     'lat': [64,64,64,64,64,64],
     'lon': [152,152,152,152,152,152],
     'other1': [12,13,14,15,16,17],
     'other2': [21,22,23,24,25,26]}
df1 = pd.DataFrame(data=d)
d2 = {'ID': [27,48,31,45,49,10],
      'LAT': [63,63,63,63,63,63],
      'LON': [153,153,153,153,153,153]}
df2 = pd.DataFrame(data=d2)
df1 has incorrect values in the lat and lon columns, but correct data in the other columns, which I need to keep. df2 has correct LAT and LON values, but shares only a few IDs with df1. There are two things I would like to accomplish. First, I want to split df1 into two dataframes: df3, which has the rows whose IDs are present in df2, and df4, which has everything else. I can get df3 with:
from functools import reduce
import numpy as np

df3 = pd.DataFrame()
for i in reduce(np.intersect1d, [df1.ID, df2.ID]):
    df3 = pd.concat([df3, df1.loc[df1.ID == i]])
but how do I get df4 to be the remaining data?
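One possible way to get both halves at once is a boolean mask built with `Series.isin`, sketched below. The name `in_df2` is made up for illustration, and the `.copy()` calls are an assumption that df3 and df4 will be modified later without touching df1.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6,
                    'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6,
                    'LON': [153] * 6})

# Boolean mask: True where the df1 ID also occurs in df2
# (in_df2 is an illustrative name, not part of the original code)
in_df2 = df1['ID'].isin(df2['ID'])

df3 = df1[in_df2].copy()    # rows whose ID is present in df2
df4 = df1[~in_df2].copy()   # everything else
```

Because the same mask is used for both selections, every row of df1 should land in exactly one of df3 or df4, and each keeps its original index label from df1.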
Second, I want to replace the lat and lon values in df3 with the correct data from df2.
I figure there is a slick Python way to do something like:
for j in range(len(df3)):
    for k in range(len(df2)):
        if df3.ID[j] == df2.ID[k]:
            df3.lat[j] = df2.LAT[k]
            df3.lon[j] = df2.LON[k]
But I can't even get the above nested loop to work correctly, and I don't want to spend a lot of time debugging it if there is a better way to accomplish this in Python.
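A loop-free version of the replacement might look like the sketch below, which looks up each ID in df2 with `Series.map`. It assumes the IDs in df2 are unique, since the lookup is keyed by ID. The name `lookup` is hypothetical, and df3 is rebuilt here with `isin` only so the example runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [31, 42, 63, 44, 45, 26],
                    'lat': [64] * 6,
                    'lon': [152] * 6,
                    'other1': [12, 13, 14, 15, 16, 17],
                    'other2': [21, 22, 23, 24, 25, 26]})
df2 = pd.DataFrame({'ID': [27, 48, 31, 45, 49, 10],
                    'LAT': [63] * 6,
                    'LON': [153] * 6})

# Rows of df1 whose ID also appears in df2
df3 = df1[df1['ID'].isin(df2['ID'])].copy()

# Lookup table keyed by ID (assumption: IDs in df2 are unique;
# "lookup" is an illustrative name)
lookup = df2.set_index('ID')

# For each ID in df3, fetch the matching LAT / LON from df2
df3['lat'] = df3['ID'].map(lookup['LAT'])
df3['lon'] = df3['ID'].map(lookup['LON'])
```

The other1 and other2 columns are left untouched. The nested loop above likely fails because `df3.ID[j]` looks rows up by index label, and df3 still carries the labels from df1 rather than 0, 1, 2, and so on; matching on ID values sidesteps that.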