Replace zeros in one dataframe with values from another dataframe

Question

I have two dataframes, df1 and df2, each with an age column.

I want to replace all the zeros in df2 with their corresponding entries in df1. More precisely, if the element at a given index in df2 is zero, I want it replaced by the element at the same index in df1.

I tried using the replace method, but it is not working. Please help :) Thanks in advance.

unutbu · Accepted Answer · 2017-08-16 16:00:28Z

9

You could use where:

In [19]: df2.where(df2 != 0, df1)
Out[19]: 
   age
0   42
1   52
2    1
3   24
4   73

Above, df2 != 0 is a boolean DataFrame.

In [16]: df2 != 0
Out[16]: 
     age
0  False
1  False
2   True
3  False
4  False

df2.where(df2 != 0, df1) returns a new DataFrame. Where df2 != 0 is True, the corresponding value of df2 is used. Where it is False, the corresponding value of df1 is used.
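As a rough, self-contained check of the where call, the frames can be rebuilt from the outputs in this answer: df2's age column is 0, 0, 1, 0, 0, and df1 supplies 42, 52, 24 and 73 at the other labels. df1's value at label 2 never appears in any output, so the 99 below is a hypothetical placeholder.

```python
import pandas as pd

# Reconstructed from the outputs above; 99 at label 2 is a hypothetical placeholder.
df1 = pd.DataFrame({"age": [42, 52, 99, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Keep df2's value where the condition is True; take df1's value where it is False.
result = df2.where(df2 != 0, df1)
print(result["age"].tolist())  # [42, 52, 1, 24, 73]
```

Since the placeholder sits at the one position where df2 != 0 is True, it never reaches the result.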

Another alternative is to make an assignment with df.loc:

df2.loc[df2['age'] == 0, 'age'] = df1['age']

df.loc[mask, col] selects the rows of df where the boolean Series mask is True, and the column whose label is col.

In [17]: df2.loc[df2['age'] == 0, 'age']
Out[17]: 
0    0
1    0
3    0
4    0
Name: age, dtype: int64

When used in an assignment, such as df2.loc[df2['age'] == 0, 'age'] = df1['age'], pandas performs automatic index label alignment. (Notice that the index labels above are 0, 1, 3, 4; label 2 is skipped.) So the values in df2.loc[df2['age'] == 0, 'age'] are replaced by the corresponding values from df1['age']. Even though df1['age'] is a Series with index labels 0, 1, 2, 3 and 4, label 2 is ignored because there is no corresponding index label on the left-hand side.

In other words,

df2.loc[df2['age'] == 0, 'age'] = df1.loc[df2['age'] == 0, 'age']

would work as well, but the added restriction on the right-hand side is unnecessary.
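A minimal sketch of the alignment behaviour, reusing frames reconstructed from the outputs in this answer (99 at label 2 is a hypothetical placeholder). It assigns with and without the restricted right-hand side on separate copies and compares the results.

```python
import pandas as pd

# Reconstructed frames; 99 at label 2 is a hypothetical placeholder.
df1 = pd.DataFrame({"age": [42, 52, 99, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Variant A: unrestricted right-hand side. pandas aligns on index labels,
# so df1's label 2 is ignored because label 2 is not selected on the left.
a = df2.copy()
a.loc[a["age"] == 0, "age"] = df1["age"]

# Variant B: right-hand side restricted with the same boolean mask.
b = df2.copy()
mask = b["age"] == 0
b.loc[mask, "age"] = df1.loc[mask, "age"]

print(a["age"].tolist())  # [42, 52, 1, 24, 73]
print(a.equals(b))        # True
```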

edited Aug 16, 2017 at 16:00

answered Aug 15, 2017 at 21:15

unutbu



5 Comments

ZeusofCode Over a year ago

Thank you. However, when I try df2.where(df2['age'] != 0, df1) I get AttributeError: 'float' object has no attribute 'all'

unutbu Over a year ago

I think you are experiencing this bug; you can fix it by upgrading your version of pandas.

ZeusofCode Over a year ago

The pandas version cannot be changed because it's installed on a server and I can only use that one :( The pandas version I have is 0.15.1.

unutbu Over a year ago

Then I believe you could use df2.loc[df2['age'] == 0, 'age'] = df1['age'] instead.

ZeusofCode Over a year ago

Thanks a lot, works great! :) Could you be kind enough to explain how this code works, and also how df2.where(df2 != 0, df1) works?

MaxU - stand with Ukraine · 2017-08-15 21:25:06Z

3

In [30]: df2.mask(df2==0).combine_first(df1)
Out[30]:
    age
0  42.0
1  52.0
2   1.0
3  24.0
4  73.0
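The .0 values come from the intermediate step: mask(df2==0) with no second argument replaces the zeros with NaN, which upcasts the integer column to float64, and combine_first then fills each NaN from the same label in df1. A sketch with frames reconstructed from the outputs on this page (99 is a hypothetical placeholder); the astype(int) at the end is an optional extra, not part of the answer, and is only safe once no NaN remain.

```python
import pandas as pd

# Reconstructed frames; 99 at label 2 is a hypothetical placeholder.
df1 = pd.DataFrame({"age": [42, 52, 99, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Step 1: zeros become NaN, so the column is upcast to float64.
masked = df2.mask(df2 == 0)
print(masked["age"].dtype)  # float64

# Step 2: fill each NaN with the value at the same label in df1.
filled = masked.combine_first(df1)

# Optional: restore integers once no NaN remain.
restored = filled.astype(int)
print(restored["age"].tolist())  # [42, 52, 1, 24, 73]
```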

or, "negating" @unutbu's beautiful solution:

In [46]: df2.mask(df2==0, df1)
Out[46]:
   age
0   42
1   52
2    1
3   24
4   73
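mask(cond, other) replaces values where cond is True, so it mirrors where(cond, other), which replaces them where cond is False. Flipping the condition should therefore give identical frames; a quick sketch with the same reconstructed data (99 at label 2 is a hypothetical placeholder):

```python
import pandas as pd

# Reconstructed frames; 99 at label 2 is a hypothetical placeholder.
df1 = pd.DataFrame({"age": [42, 52, 99, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# mask replaces values where the condition is True ...
via_mask = df2.mask(df2 == 0, df1)
# ... while where replaces values where the condition is False.
via_where = df2.where(df2 != 0, df1)

print(via_mask.equals(via_where))  # True
```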

edited Aug 15, 2017 at 21:25

answered Aug 15, 2017 at 21:08

MaxU - stand with Ukraine


1 Comment

MaxU - stand with Ukraine Over a year ago

@Vaishali, yes, thank you! It's a "negation" of unutbu's beautiful solution :)

BENY · 2017-08-15 21:19:17Z

1

Or try mul (with numpy imported as np):

import numpy as np
df1.mul(np.where(df2==1,0,1)).replace({0:1})
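A reading of this expression (not stated in the answer): np.where(df2==1,0,1) yields 1 at df2's zeros and 0 where df2 is 1, so the product keeps df1's values at the zeros, and replace({0:1}) turns the remaining zeros back into 1. It therefore only reproduces the other answers when df2's nonzero entries are all 1 and df1 has no zeros at the positions being filled. The sketch wraps it in a hypothetical helper, mul_trick, using frames reconstructed from the outputs on this page (99 at label 2 is a placeholder).

```python
import numpy as np
import pandas as pd

# Reconstructed frames; 99 at label 2 is a hypothetical placeholder.
df1 = pd.DataFrame({"age": [42, 52, 99, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

def mul_trick(fill_from, target):
    # Hypothetical helper: 1 where target is not 1 (here, the zeros), 0 where it is 1.
    factors = np.where(target == 1, 0, 1)
    # Keep fill_from where target was not 1, zero out the rest, then turn every 0 into 1.
    return fill_from.mul(factors).replace({0: 1})

print(mul_trick(df1, df2)["age"].tolist())  # [42, 52, 1, 24, 73]

# With a nonzero entry other than 1, the trick no longer matches where():
df2_other = pd.DataFrame({"age": [0, 0, 7, 0, 0]})
print(mul_trick(df1, df2_other)["age"].tolist())  # [42, 52, 99, 24, 73]
print(df2_other.where(df2_other != 0, df1)["age"].tolist())  # [42, 52, 7, 24, 73]
```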

edited Aug 15, 2017 at 21:19

answered Aug 15, 2017 at 21:12

BENY

