I have a Dataframe df1 (original table) with multiple columns.
I have a filtered DataFrame df2 with columns date, agent_id, gps1, gps2 only 4 columns.
In df1, I have date, agent_id and final_gps along with other columns.
I want to filter all the data from df1 which exists in df2, I want to compare based on.
Df1.date == df2.date & df1.agent_id == df2.agent_id & (df1.final_gps == df2.gps1 or df1.final_gps == df2.gps2)
df2 sample
date agent_id gps1 gps2
14-02-2020 12abc (1,2) (7,6)
14-02-2020 12abc (3,4) (7,6)
14-02-2020 33bcd (6,7) (8,9)
20-02-2020 44hgf (1,6) (3,7)
20-02-2020 12abc (3,5) (3,1)
20-02-2020 33bcd (3,4) (3,6)
21-02-2020 12abc (4,5) (5,4)
df1 sample
date agent_id final_gps ….
10-02-2020 12abc (1,2) …
10-02-2020 33bcd (8,9) …
14-02-2020 12abc (1,2) …
14-02-2020 12abc (7,6) …
14-02-2020 12abc (3,4) …
14-02-2020 33bcd (6,7) …
14-02-2020 33bcd (8,9) …
14-02-2020 33bcd (1,1) …
14-02-2020 33bcd (2,2) …
18-02-2020 12abc (1,2) …
19-02-2020 44hgf (6,7) …
20-02-2020 12abc (3,5) …
20-02-2020 12abc (3,1) …
20-02-2020 44hgf (1,6) …
20-02-2020 44hgf (3,7) …
required output:-
date agent_id final_gps ….
14-02-2020 12abc (1,2) …
14-02-2020 12abc (7,6) …
14-02-2020 12abc (3,4) …
14-02-2020 33bcd (6,7) …
14-02-2020 33bcd (8,9) …
20-02-2020 12abc (3,5) …
20-02-2020 12abc (3,1) …
20-02-2020 44hgf (1,6) …
20-02-2020 44hgf (3,7) …
I tried this but it is giving me all the matching records which exists in df2, but I want strictly data only for those agent_id on that particular date and particular gps matching condition from df1.
df = df1[df1['date'].isin(df2['date']) &
df1['agent_id'].isin(df2['agent_id']) &
(df1['final_gps'].isin(df2['gps1']) | df1['final_gps'].isin(df2['gps2']))]