How to join dataframe without getting duplicates?

Question

I have two dataframes like the following:

df1

       id  salary
    0   1    1000
    1   2    2000

df2

       id  txn  age gender
    0   1    6   23      M
    1   1    4   23      M
    2   2   10   31      F
    3   2    5   31      F
    4   2    8   31      F

I want to join the dataframes as follows:

df3

       id  salary  age gender
    0   1    1000   23      M
    1   2    2000   31      F

I am using the following code, but it returns a total of 5 rows. I want only 2 rows, as in the dataframe above.

df3 = pd.merge(df1, df2, on='id', how='left')

What is the correct way to join the dataframes without getting duplicates?

You get 5 rows because id 1 in df1 matches 2 rows in df2 and id 2 in df1 matches 3 rows in df2, so the merge keeps all of them. If you remove the txn column and drop the duplicate rows, it will work.
– Dieter, Commented Jan 7, 2021 at 19:36
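
The explanation in this comment can be checked directly. The following is a minimal sketch that rebuilds the two dataframes from the question and counts the matches per id; the names `merged`, `matches_per_id`, and `people` are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"id": [1, 2], "salary": [1000, 2000]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "txn": [6, 4, 10, 5, 8],
    "age": [23, 23, 31, 31, 31],
    "gender": ["M", "M", "F", "F", "F"],
})

# A left merge emits one output row for every matching row in df2,
# so repeated ids in df2 multiply the rows of df1.
merged = pd.merge(df1, df2, on="id", how="left")

# Number of df2 rows per id: id 1 appears twice, id 2 three times.
matches_per_id = df2["id"].value_counts().sort_index()

# Dropping txn makes the remaining rows per id identical, so
# drop_duplicates() leaves one row per id and the merge returns 2 rows.
people = df2.drop(columns="txn").drop_duplicates()
df3 = pd.merge(df1, people, on="id", how="left")

print(len(merged), len(df3))
```

This only works because age and gender are the same on every row of a given id; if they differed, dropping txn would not leave a single row per id.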

Quang Hoang · Accepted Answer · 2021-01-07 19:33:05Z

Try:

df3 = df1.merge(df2.drop_duplicates('id')[['id','age','gender']],
                on='id', how='left')

Output:

   id  salary  age gender
0   1    1000   23      M
1   2    2000   31      F
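
One caveat: `drop_duplicates('id')` keeps the first row for each id, which is safe here only because age and gender do not vary within an id. As a hedged sketch (not part of the original answer), the same merge can be guarded with a consistency check and the `validate` argument of `merge`, which raises `pandas.errors.MergeError` if the keys are not unique. The names `cols`, `consistent`, and `lookup` are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"id": [1, 2], "salary": [1000, 2000]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "txn": [6, 4, 10, 5, 8],
    "age": [23, 23, 31, 31, 31],
    "gender": ["M", "M", "F", "F", "F"],
})

cols = ["age", "gender"]

# Assumption: age and gender should be constant per id. Check that each
# id has at most one distinct value in each of those columns.
consistent = bool((df2.groupby("id")[cols].nunique() <= 1).all().all())

# One row per id, keeping only the columns wanted in the result.
lookup = df2.drop_duplicates("id")[["id"] + cols]

# validate="one_to_one" makes pandas raise MergeError instead of
# silently multiplying rows if either side has repeated ids.
df3 = df1.merge(lookup, on="id", how="left", validate="one_to_one")
print(df3)
```

If `consistent` is False, keeping the first row per id would discard conflicting values, so the data would need an explicit rule for which row to keep.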
