I have two dataframes df1 and df2 and I want to merge them.

Dataframe df1 is as follows:

   IDs          Value1      Value2       
   AB              1          3
   AB              1          1
   AB              2          4           
   BC              2          2
   BC              5          0         
   BG              1          1         
   RF              2          2

and dataframe df2 is as follows:

   IDs          Issue     
   AB              AA
   AB              AAA
   AB              BA
   BC              CC
   BC              CA    
   BG              A        
   RF              D

and the desired output is df3:

   IDs          Value1      Value2        Issue     
   AB              1          3             AA
   AB              1          1             AAA
   AB              2          4             BA
   BC              2          2             CC
   BC              5          0             CA
   BG              1          1             A
   RF              2          2             D

Currently, the following:

df3 = pd.merge(df1,df2,left_on='IDs',right_on='IDs',how='inner')
df3 = pd.merge(df1,df2,left_on='IDs',right_on='IDs',how='left')
df3 = pd.merge(df1,df2,left_on='IDs',right_on='IDs',how='outer')

do not work, since they produce a result similar to the following:

   IDs          Value1      Value2        Issue     
   AB              1          3             AA
   AB              1          1             AA
   AB              2          4             AA
   BC              2          2             CC
   BC              5          0             CC
   BG              1          1             A
   RF              2          2             D

meaning that they duplicate the first value of the Issue field from df2.

2 Answers

Use groupby('IDs').cumcount() to add a counter column to both DataFrames that numbers the rows within each IDs group, then pass this column together with IDs to the on parameter of merge:

df1['g'] = df1.groupby('IDs').cumcount()
df2['g'] = df2.groupby('IDs').cumcount()

df3 = pd.merge(df1,df2,on=['IDs', 'g']).drop('g', axis=1)
print (df3)
  IDs  Value1  Value2 Issue
0  AB       1       3    AA
1  AB       1       1   AAA
2  AB       2       4    BA
3  BC       2       2    CC
4  BC       5       0    CA
5  BG       1       1     A
6  RF       2       2     D

Details:

print (df1)
  IDs  Value1  Value2  g
0  AB       1       3  0
1  AB       1       1  1
2  AB       2       4  2
3  BC       2       2  0
4  BC       5       0  1
5  BG       1       1  0
6  RF       2       2  0

print (df2)
  IDs Issue  g
0  AB    AA  0
1  AB   AAA  1
2  AB    BA  2
3  BC    CC  0
4  BC    CA  1
5  BG     A  0
6  RF     D  0
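
For reference, here is a self-contained sketch of the approach above, rebuilt from the sample data in the question. It also shows what a merge on IDs alone does with repeated keys: every df1 row is paired with every df2 row that shares the same ID, so the row count grows. The names plain, left and right are illustrative only, and assign is used here so that df1 and df2 are not modified in place.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "IDs": ["AB", "AB", "AB", "BC", "BC", "BG", "RF"],
    "Value1": [1, 1, 2, 2, 5, 1, 2],
    "Value2": [3, 1, 4, 2, 0, 1, 2],
})
df2 = pd.DataFrame({
    "IDs": ["AB", "AB", "AB", "BC", "BC", "BG", "RF"],
    "Issue": ["AA", "AAA", "BA", "CC", "CA", "A", "D"],
})

# Merging on IDs alone is a many-to-many join:
# AB gives 3 x 3 rows, BC gives 2 x 2 rows, BG and RF one row each
plain = pd.merge(df1, df2, on="IDs")
print(len(plain))

# Number the rows within each IDs group (0, 1, 2, ...) in both frames
left = df1.assign(g=df1.groupby("IDs").cumcount())
right = df2.assign(g=df2.groupby("IDs").cumcount())

# Merge on the pair (IDs, g) so the n-th AB row meets the n-th AB row,
# then drop the helper column
df3 = pd.merge(left, right, on=["IDs", "g"]).drop(columns="g")
print(df3)
```

This relies on both DataFrames listing the rows of each ID in the same relative order, as they do in the question.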

Comments

This solution does not seem to work. I keep getting exactly the same issue described in my question above.
@user37143 - Really interesting; for me it works fine.
@user37143 - I added the output of column g; for duplicated IDs it holds incrementing integers. Can you check it?
@jezrael yes, that's my output as well, but the join still messes things up, returning the first occurrence of "Issue" duplicated
@user37143 - It should work; a possible problem is whitespace in the IDs column, or the two IDs columns not having the same dtype. Thank you.
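
One way to rule out the whitespace and dtype causes mentioned in the last comment is to normalize the key column in both frames before building the counter. The sketch below is only an illustration: the data is made up (df2's IDs deliberately carry stray spaces), and clean_ids is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Made-up data: the IDs in df2 have leading/trailing whitespace
df1 = pd.DataFrame({"IDs": ["AB", "AB", "BC"], "Value1": [1, 1, 2]})
df2 = pd.DataFrame({"IDs": ["AB ", " AB", "BC"], "Issue": ["AA", "AAA", "CC"]})

def clean_ids(df):
    """Hypothetical helper: cast IDs to str and strip surrounding whitespace."""
    return df.assign(IDs=df["IDs"].astype(str).str.strip())

left = clean_ids(df1)
right = clean_ids(df2)

# Counter per IDs group, computed after cleaning so the groups match
left["g"] = left.groupby("IDs").cumcount()
right["g"] = right.groupby("IDs").cumcount()

df3 = pd.merge(left, right, on=["IDs", "g"]).drop(columns="g")
print(df3)
```

Comparing df1['IDs'].unique() with df2['IDs'].unique() before and after cleaning is a quick way to see whether the keys really differ.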

You can use pd.concat with axis=1 to join the two dataframes by their index. This means both dataframes have to be pre-ordered in the same way and share the same index, because you are simply "pasting" one dataframe next to the other.

pd.concat([df1, df2[['Issue']]], axis=1)

Output:

  IDs  Value1  Value2 Issue
0  AB       1       3    AA
1  AB       1       1   AAA
2  AB       2       4    BA
3  BC       2       2    CC
4  BC       5       0    CA
5  BG       1       1     A
6  RF       2       2     D
