I have two dataframes of different sizes:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,None,4,None,6,7,8,None,10], 'B':[11,12,13,14,15,16,17,18,19,20]})
df1

      A   B
0   1.0  11
1   2.0  12
2   NaN  13
3   4.0  14
4   NaN  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

df2 = pd.DataFrame({'A':[2,3,4,5,6,8], 'B':[12,13,14,15,16,18]})
df2['A'] = df2['A'].astype(float)
df2

     A   B
0  2.0  12
1  3.0  13
2  4.0  14
3  5.0  15
4  6.0  16
5  8.0  18

I need to fill the missing values (and only those) in column A of the first dataframe with the values from column A of the second dataframe, matching rows on the common key in column B. It is equivalent to this SQL query:

UPDATE df1 JOIN df2
  ON df1.B = df2.B
  SET df1.A = df2.A WHERE df1.A IS NULL;

I tried the answers to similar questions on this site, but they do not give the result I need:

df1.fillna(df2)

      A   B
0   1.0  11
1   2.0  12
2   4.0  13
3   4.0  14
4   6.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

df1.combine_first(df2)

      A   B
0   1.0  11
1   2.0  12
2   4.0  13
3   4.0  14
4   6.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

The intended output is:

      A   B
0   1.0  11
1   2.0  12
2   3.0  13
3   4.0  14
4   5.0  15
5   6.0  16
6   7.0  17
7   8.0  18
8   NaN  19
9  10.0  20

How do I get this result?

1 Answer

You were right to try combine_first(), but it aligns the two dataframes on their index, not on a column. Both dataframes must therefore use column B as their index before they are combined:

df1.set_index('B').combine_first(df2.set_index('B')).reset_index()
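
A minimal sketch of the one-liner above on the question's data, with two additions that are not part of the original answer. First, reset_index() puts B ahead of A, so the result is reindexed by df1.columns to restore the original column order. Second, combine_first() works on the union of both indexes and sorts it, so a B value present only in df2 would add a row; if that is unwanted, one possible alternative is to look up each B with Series.map() and pass the result to fillna(). The names via_combine, via_map and lookup are illustrative only, and the map() variant assumes the B values in df2 are unique.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, None, 4, None, 6, 7, 8, None, 10],
                    'B': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
df2 = pd.DataFrame({'A': [2.0, 3.0, 4.0, 5.0, 6.0, 8.0],
                    'B': [12, 13, 14, 15, 16, 18]})

# Approach from the answer, with the original column order restored.
via_combine = (df1.set_index('B')
                  .combine_first(df2.set_index('B'))
                  .reset_index()[df1.columns])

# Alternative sketch: build a B -> A lookup from df2 and use it only
# where df1['A'] is missing. Rows and their order in df1 are untouched.
lookup = df2.set_index('B')['A']   # assumes B is unique in df2
via_map = df1.copy()
via_map['A'] = via_map['A'].fillna(via_map['B'].map(lookup))

print(via_combine)
print(via_map)
```

On this data both variants fill A for B = 13 and B = 15 and leave the row with B = 19 as NaN, because df2 has no matching key.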