6

I have two dataframes df1 and df2 . They both contain time-series data, so it is possible some of the dates in df1 and df2 intersect with each other and the rest don't. My requirement is an operation on the two dataframes that replaces the values in df1 with the values in df2 for the same dates, leaves alone values for indexes in df1 not present in df2 and adds the values for indexes present in df2 and not in df1. Consider the following example:

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   E
1   A4  B4  C4  E4
2   A5  B5  C5  E5
3   A6  B6  C6  E6
4   A7  B7  C7  E7

result df:
    A   B   C   D   E
0   A0  BO  C0  D0  Nan
1   A4  B4  C4  D4  E4
2   A5  B5  C5  D5  E5
3   A6  B6  C6  D6  E6
4   A7  B7  C7  D7  E7

I tried to develop the logic with the first step concatenating the two dfs but that leads to rows with duplicate indexes and am not sure how to handle that. How can this be achieved? Any suggestions would help

Edit: A simpler case would be when the column names are same in the two dataframes. So consider df2 has column D instead of E with values D4,D5,D6,D7.

A concatenation yields the following result:

concat(df1,df2,axis=1)
    A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN  
1   A1   B1   C1   D1   A4   B4   C4   D4
2   A2   B2   C2   D2   A5   B5   C5   D5
3   A3   B3   C3   D3   A6   B6   C6   D6
4  NaN  NaN  NaN  NaN   A7   B7   C7   D7

Now this introduces duplicate columns. A conventional solution would be to loop through each column but I am looking for a more elegant solution. Any ideas would be appreciated.

2
  • The problem with this setup is that the DataFrames won't align on columns D & E. Commented May 24, 2015 at 1:11
  • for simplicity sake we can ignore the column E and assume they have the same columns , how would this operation then be achieved, given df2 had column D instead of E with values D4-D7 Commented May 24, 2015 at 1:16

2 Answers 2

8

update will align on the indices of both DataFrames:

df1.update(df2)

df1:
    A   B   C   D
0   A0  BO  C0  D0
1   A1  B1  C1  D1
2   A2  B2  C2  D2
3   A3  B3  C3  D3

df2:
    A   B   C   D
1   A4  B4  C4  D4
2   A5  B5  C5  D5
3   A6  B6  C6  D6
4   A7  B7  C7  D7

>>> df1.update(df2)
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6

You then need to add the values in df2 not present in df1:

>>> df1.append(df2.loc[[i for i in df2.index if i not in df1.index], :])
Out[46]: 
    A   B   C   D
0  A0  BO  C0  D0
1  A4  B4  C4  D4
2  A5  B5  C5  D5
3  A6  B6  C6  D6
4  A7  B7  C7  D7
Sign up to request clarification or add additional context in comments.

4 Comments

that solves one part of the problem definitely alexander, which is replace values in df1 with values in df2 for the same indexes. However we do not have a track of indexes in df2 in the above case -- 4 which is not present in df1. The final result should contain that as well
Amended response to include that as well.
yes that works perfectly, thanks. I figured there is no other way but to loop the second dataframe to achieve the desired output. Your help has been greatly appreciated thanks!
You may not have noticed, but you now have enough reputation to upvote the response (-;
2

I just saw this question and realized it is almost identical to one that I just asked today and that @Alexander (the poster of the answer above) answered very nicely:

pd.concat([df1[~df1.index.isin(df2.index)], df2])

See pandas DataFrame concat / update ("upsert")? for the discussion.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.