Combine Dataframes in Python with same and different Column Names

Question

Hello, I have 2 dataframes that I want to combine.

dataframe 1 :

ID	A	B	C
row1	1	2	3
row2	4	5	6

dataframe 2:

ID	A	B	D
row1	6	7	8

I want to merge them, replacing the values in rows that exist in both with the values from dataframe 2, like this:

ID	A	B	C	D
row1	6	7	3	8
row2	4	5	6	null

How do I do this? I tried merging and concatenation, but neither seems to work. Thank you.

Perhaps this one may help? stackoverflow.com/a/43735493/3270433
– FerdousTheWebCoder, Commented Jul 6, 2022 at 15:35
@PrakashDahal this is just a sample. I need the code to work on much larger data that is always changing: it has to replace values in rows and columns that already exist and add columns that aren't in the original dataframe.
– woops, Commented Jul 6, 2022 at 15:36

Corralien · Accepted Answer · 2022-07-07 04:29:02Z

4

Another method to merge your 2 dataframes:

>>> pd.concat([df1, df2]).groupby('ID').last().reset_index()
     ID  A  B    C    D
0  row1  6  7  3.0  8.0
1  row2  4  5  6.0  NaN

Solution enhanced by @PierreD:

This assumes ID is not the index, however (if it is, then it is lost). If you reformulate as pd.concat([df1, df2]).groupby('ID').last(), then it works in both cases, and makes ID the index. You can of course then .reset_index() if that's not desired.
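A minimal, self-contained sketch of both variants described above. The way df1 and df2 are built here is an assumption, reconstructed by hand from the sample tables in the question; the names stacked, by_id, flat and by_id_from_index are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question (construction is an assumption).
df1 = pd.DataFrame({"ID": ["row1", "row2"], "A": [1, 4], "B": [2, 5], "C": [3, 6]})
df2 = pd.DataFrame({"ID": ["row1"], "A": [6], "B": [7], "D": [8]})

# Stack the frames; a column missing from one frame is filled with NaN there.
stacked = pd.concat([df1, df2])

# last() takes the last non-null value per column within each ID group,
# so df2 (concatenated last) wins wherever it actually has a value.
by_id = stacked.groupby("ID").last()   # ID becomes the index
flat = by_id.reset_index()             # ID back as a regular column

# Same idea when ID is already the index: groupby("ID") also accepts
# the name of an index level.
by_id_from_index = (
    pd.concat([df1.set_index("ID"), df2.set_index("ID")])
    .groupby("ID")
    .last()
)

print(flat)
```

Because concat introduces NaN into C and D, those columns come back as float, as in the output shown above.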

edited Jul 7, 2022 at 4:29

answered Jul 6, 2022 at 15:56


2 Comments

Pierre D Over a year ago

Nice, clean and going from first principles. This assumes ID is not the index, however (if it is, then it is lost). If you reformulate as pd.concat([df1, df2]).groupby('ID').last(), then it works in both cases, and makes ID the index. You can of course then .reset_index() if that's not desired.

Corralien Over a year ago

Thanks @PierreD for your comment. I updated my solution with your suggestion.

Pierre D · Accepted Answer · 2022-07-06 16:14:55Z

2

Assuming ID is the index in both DataFrames (if not, make it so), there is actually a function for this, combine_first():

>>> out = df2.combine_first(df1)
>>> out
      A  B  C    D
ID                
row1  6  7  3  8.0
row2  4  5  6  NaN

Notes:

Why is column D of type float? Because of that NaN.
What if the rows are in a different order, e.g. df1 has row2 first and then row1? Not a problem at all: the result is exactly the same as above (with rows sorted). Tested with pandas 1.4.2 and also pandas 1.3.4.
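The notes above can be checked with a small sketch. It assumes the question's sample data, rebuilt by hand with ID set as the index, and deliberately lists row2 before row1 in df1 to exercise the row-order case.

```python
import pandas as pd

# df1 has its rows in "reversed" order on purpose (row2 first).
df1 = pd.DataFrame(
    {"ID": ["row2", "row1"], "A": [4, 1], "B": [5, 2], "C": [6, 3]}
).set_index("ID")
df2 = pd.DataFrame(
    {"ID": ["row1"], "A": [6], "B": [7], "D": [8]}
).set_index("ID")

# Values from df2 take priority; anything df2 lacks (row2, column C)
# is filled in from df1. Alignment is by index label, not by position.
out = df2.combine_first(df1)

print(out)
print(out.dtypes)
```

The result is aligned on the ID labels, so the order of rows in either input does not change which values end up where.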

edited Jul 6, 2022 at 16:14

answered Jul 6, 2022 at 15:42


6 Comments

Prakash Dahal Over a year ago

Might this not work if the rows are interchanged in the first df?

Pierre D Over a year ago

@PrakashDahal: yes it works regardless of the respective ordering of rows in either df.

Prakash Dahal Over a year ago

@PierreD have you checked it in a notebook? It did not work when row1 and row2 are interchanged in df1.

Pierre D Over a year ago

@PrakashDahal Yes. I always check my answers, or if not possible I clearly indicate so. It works with pandas=1.4.2, perhaps you have an older version? If you find a version where that didn't work, would you kindly indicate so (and try to find the first where it does work correctly)?

Prakash Dahal Over a year ago

In the first df, the rows are row2 (4, 5, 6) then row1 (1, 2, 3); in the second df, row1 (6, 7, 8). Is this still working? My version is 1.3.5.
