I've got 2 dataframes with identical columns:

df1 = pd.DataFrame([['Abe','1','True'],['Ben','2','True'],['Charlie','3','True']], columns=['Name','Number','Other'])
df2 = pd.DataFrame([['Derek','4','False'],['Ben','5','False'],['Erik','6','False']], columns=['Name','Number','Other'])

which give:

     Name Number Other
0      Abe      1  True
1      Ben      2  True
2  Charlie      3  True

and

    Name Number  Other
0  Derek      4  False
1    Ben      5  False
2   Erik      6  False

I want an output dataframe that is an intersection of the two based on "Name":

output_df = 
        Name Number  Other
    0    Ben      2  True
    1    Ben      5  False

I've tried a basic pandas merge, but it joins the matching rows side by side instead of stacking them:

pd.merge(df1,df2,how='inner',on='Name') = 
 Name Number_x Other_x Number_y Other_y
0  Ben        2    True        5   False

These dataframes are quite large so I'd prefer to use some pandas magic to keep things quick.

1 Answer

You can find the names common to both frames with numpy.intersect1d, stack the frames with concat, and then keep only the matching rows using boolean indexing with isin:

val = np.intersect1d(df1.Name, df2.Name)
print (val)
['Ben']

df = pd.concat([df1,df2], ignore_index=True)
print (df[df.Name.isin(val)])
  Name Number  Other
1  Ben      2   True
4  Ben      5  False
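
For reference, the same steps can be written as a self-contained sketch with the imports included and the intermediate boolean mask made explicit. The names `combined`, `mask` and `result` are only illustrative and are not part of the snippet above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

# Unique names that occur in both frames.
val = np.intersect1d(df1['Name'], df2['Name'])

# Stack the two frames; ignore_index=True renumbers the rows from 0.
combined = pd.concat([df1, df2], ignore_index=True)

# Boolean Series: True where the row's Name is one of the shared names.
mask = combined['Name'].isin(val)

# Boolean indexing keeps only the rows where mask is True.
result = combined[mask]
print(result)
```

The rows keep the index labels they had in the stacked frame, which is why the result is labelled 1 and 4 rather than 0 and 1.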

Another way to compute val is with a set intersection:

val = set(df1.Name).intersection(set(df2.Name))
print (val)
{'Ben'}

You can then reset the index so the result is numbered from 0 again:

df = pd.concat([df1,df2])
print (df[df.Name.isin(val)].reset_index(drop=True))
  Name Number  Other
0  Ben      2   True
1  Ben      5  False
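
If you need this in more than one place, the steps can be wrapped in a small helper. This is only a sketch: the function name `rows_with_shared_key` and its parameters are hypothetical, and it assumes both frames have the same columns, as in the question.

```python
import pandas as pd

def rows_with_shared_key(left, right, key):
    """Return the rows of both frames whose `key` value occurs in both frames."""
    # Values of `key` present in both frames.
    shared = set(left[key]).intersection(right[key])
    # Stack the frames, keep matching rows, and renumber the index from 0.
    stacked = pd.concat([left, right], ignore_index=True)
    return stacked[stacked[key].isin(shared)].reset_index(drop=True)

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

output_df = rows_with_shared_key(df1, df2, 'Name')
print(output_df)
```

The rows from the first frame come before the rows from the second, so here the Ben row with Number 2 precedes the one with Number 5.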