I have two dataframes like the following:

df1

       id  salary
    0  1   1000
    1  2   2000

df2

       id  txn  age  gender
    0  1   6    23   M
    1  1   4    23   M
    2  2   10   31   F
    3  2   5    31   F
    4  2   8    31   F

I want to join the dataframes as follows:

df3

       id  salary  age  gender
    0  1   1000    23   M
    1  2   2000    31   F

I am using the following code, but it returns a total of 5 rows. I want only the 2 rows shown in the dataframe above.

df3 = pd.merge(df1, df2, on='id', how='left')

What is the correct way to join the dataframes without getting duplicates?

  • You receive 5 rows because id 1 in df1 matches 2 rows with id 1 in df2, and id 2 in df1 matches 3 rows with id 2 in df2, so the merge keeps all of them. If you remove the column txn and drop the duplicate rows, it will work. Commented Jan 7, 2021 at 19:36
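The comment's explanation can be checked with a small sketch. The dataframes are rebuilt from the tables in the question; the names `merged`, `matches_per_id` and `deduped` are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"id": [1, 2], "salary": [1000, 2000]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "txn": [6, 4, 10, 5, 8],
    "age": [23, 23, 31, 31, 31],
    "gender": ["M", "M", "F", "F", "F"],
})

# Each df1 row pairs with every df2 row that shares its id: 2 + 3 = 5 rows.
merged = pd.merge(df1, df2, on="id", how="left")
matches_per_id = df2["id"].value_counts().sort_index()

# Without txn, the rows for each id are identical, so they collapse to one.
deduped = df2.drop(columns="txn").drop_duplicates()
df3 = pd.merge(df1, deduped, on="id", how="left")

print(len(merged), len(df3))
```

This only works because age and gender are the same on every row of a given id in the sample data; if they differed, `drop_duplicates()` would keep more than one row per id.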

1 Answer


Try:

df3 = df1.merge(df2.drop_duplicates('id')[['id','age','gender']],
                on='id', how='left')

Output:

   id  salary  age gender
0   1    1000   23      M
1   2    2000   31      F
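
Note that `drop_duplicates('id')` keeps the first row for each id, which gives the right result only if age and gender do not change within an id. A hedged sketch of the same merge with two optional safety checks is shown below; the names `per_id_values`, `is_consistent` and `people` are made up for illustration, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"id": [1, 2], "salary": [1000, 2000]})
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "txn": [6, 4, 10, 5, 8],
    "age": [23, 23, 31, 31, 31],
    "gender": ["M", "M", "F", "F", "F"],
})

# Assumption: age and gender never vary within an id.
# Count distinct values per id to confirm that before discarding rows.
per_id_values = df2.groupby("id")[["age", "gender"]].nunique()
is_consistent = bool(per_id_values.eq(1).all().all())

# One row per id, keeping only the columns wanted in the result.
people = df2.drop_duplicates("id")[["id", "age", "gender"]]

# validate="one_to_one" raises pandas.errors.MergeError if either side
# still has repeated ids, instead of silently multiplying rows.
df3 = df1.merge(people, on="id", how="left", validate="one_to_one")
print(df3)
```

If `is_consistent` turns out to be False for real data, you would need to decide which row per id to keep (for example the latest transaction) before merging.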