3

I'd like to combine two dataframes using their similar column 'A':

>>> df1
    A   B
0   I   1
1   I   2
2   II  3

>>> df2
    A   C
0   I   4
1   II  5
2   III 6

To do so I tried using:

merged = pd.merge(df1, df2, on='A', how='outer')

Which returned:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 4
2   II  3.0 5
3   III NaN 6

However, since df2 only contained one value for A == 'I', I do not want this value to be duplicated in the merged dataframe. Instead I would like the following output:

>>> merged
    A   B   C
0   I   1.0 4
1   I   2.0 NaN
2   II  3.0 5
3   III NaN 6

What is the best way to do this? I am new to python and still slightly confused with all the join/merge/concatenate/append operations.

2
  • How do you want it to know that the 4 value in C belonged with the 1 in B? Commented Nov 22, 2017 at 16:14
  • @flyingmeatball, I actually do not care about the relationship between columns B and C. I just want 4 to be inserted in the first row of A == 'I' where column C holds no value yet. Commented Nov 22, 2017 at 16:33

1 Answer 1

8

Let us create a new variable g, by cumcount

df1['g']=df1.groupby('A').cumcount()
df2['g']=df2.groupby('A').cumcount()
df1.merge(df2,how='outer').drop('g',1)
Out[62]: 
     A    B    C
0    I  1.0  4.0
1    I  2.0  NaN
2   II  3.0  5.0
3  III  NaN  6.0
Sign up to request clarification or add additional context in comments.

7 Comments

Thank you, that did the job!
@Martijn Yw~ :-)
@Bharath aha:-)
@Bharath I learn it from piR , also he speed it up :-) .....stackoverflow.com/questions/40575486/…
@Bharath, Sorry! I missed that topic.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.