Hello, I have two DataFrames that I want to combine.

dataframe 1:

ID A B C
row1 1 2 3
row2 4 5 6

dataframe 2:

ID A B D
row1 6 7 8

I want to merge them so that, for rows present in both, the values from dataframe 2 replace those in dataframe 1, and columns that exist only in dataframe 2 are added, like this:

ID A B C D
row1 6 7 3 8
row2 4 5 6 null

How do I do this? I tried merging and concatenation, but neither seems to work. Thank you.

  • Perhaps this one may help? stackoverflow.com/a/43735493/3270433 Commented Jul 6, 2022 at 15:35
  • @PrakashDahal this is just a sample. I need the code to work on much larger data that is always changing: it has to replace values in rows and columns that already exist, and add columns that aren't in the original dataframe. Commented Jul 6, 2022 at 15:36

2 Answers

Another method to merge your 2 dataframes:

>>> pd.concat([df1, df2]).groupby('ID').last().reset_index()
     ID  A  B    C    D
0  row1  6  7  3.0  8.0
1  row2  4  5  6.0  NaN

Solution enhanced by @PierreD:

This assumes ID is not the index, however (if it is, then it is lost). If you reformulate as pd.concat([df1, df2]).groupby('ID').last(), then it works in both cases, and makes ID the index. You can of course then .reset_index() if that's not desired.
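For reference, a minimal self-contained sketch of this approach. It assumes the question's frames are built as below, with ID as a regular column; the names stacked and merged are only illustrative.

```python
import pandas as pd

# Frames from the question; ID is kept as a regular column here (an assumption).
df1 = pd.DataFrame({"ID": ["row1", "row2"], "A": [1, 4], "B": [2, 5], "C": [3, 6]})
df2 = pd.DataFrame({"ID": ["row1"], "A": [6], "B": [7], "D": [8]})

# Stack df2 under df1 so that its rows come last within each ID group.
stacked = pd.concat([df1, df2])

# GroupBy.last() takes the last non-null value per column in each group,
# so df2 wins where it has a value and df1 fills in the rest.
merged = stacked.groupby("ID").last().reset_index()
print(merged)
```

Order matters here: with pd.concat([df2, df1]) the values from df1 would win instead.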

2 Comments

Nice, clean and going from first principles. This assumes ID is not the index, however (if it is, then it is lost). If you reformulate as pd.concat([df1, df2]).groupby('ID').last(), then it works in both cases, and makes ID the index. You can of course then .reset_index() if that's not desired.
Thanks @PierreD for your comment. I updated my solution with your suggestion.

Assuming ID is the index in both DataFrames (if not, make it so), there is a built-in method for exactly this, combine_first():

>>> out = df2.combine_first(df1)
>>> out
      A  B  C    D
ID                
row1  6  7  3  8.0
row2  4  5  6  NaN

Notes:

  1. Why is column D of type float? Because of that NaN: an integer column cannot hold NaN, so pandas upcasts it to float.
  2. What if the rows are in a different order, e.g. df1 has row2 first and then row1? Not a problem at all: the result is exactly the same as above (with rows sorted). Tested with pandas=1.4.2 and also pandas=1.3.4.
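The second note can be checked with a small sketch. It assumes the question's data, with df1 deliberately built in reversed row order and ID set as the index via set_index(); the construction of the frames is illustrative, not taken from the question.

```python
import pandas as pd

# df1 with its rows deliberately reversed (row2 first), ID as the index.
df1 = pd.DataFrame(
    {"ID": ["row2", "row1"], "A": [4, 1], "B": [5, 2], "C": [6, 3]}
).set_index("ID")
df2 = pd.DataFrame({"ID": ["row1"], "A": [6], "B": [7], "D": [8]}).set_index("ID")

# Values from df2 take priority; gaps are filled from df1.
# Rows are aligned on the index label, not on position.
out = df2.combine_first(df1)
print(out)
```

Because alignment is by label, the position of row1 and row2 in either frame should not affect the values in the result.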

6 Comments

Might this not work if the rows are swapped in the first df?
@PrakashDahal: yes it works regardless of the respective ordering of rows in either df.
@PierreD have you checked it in a notebook? It did not work when row1 and row2 are swapped in df1.
@PrakashDahal Yes. I always check my answers, or if not possible I clearly indicate so. It works with pandas=1.4.2, perhaps you have an older version? If you find a version where that didn't work, would you kindly indicate so (and try to find the first where it does work correctly)?
In the first df: row2 4 5 6, row1 1 2 3. In the second df: row1 6 7 8. Is this still working? My version is 1.3.5.