1

I have two DataFrames, df1 and df2. df1 is shown here:

   age
0   42
1   52
2   36
3   24
4   73

df2 is shown here:

   age
0    0
1    0
2    1
3    0
4    0

I want to replace all the zeros in df2 with their corresponding entries in df1. More precisely, if the element at a given index in df2 is zero, it should be replaced by the entry at the same index in df1.

Hence, I want df2 to look like:

   age
0   42
1   52
2    1
3   24
4   73

I tried using the replace method but it is not working. Please help :) Thanks in advance.

3 Answers

9

You could use where:

In [19]: df2.where(df2 != 0, df1)
Out[19]: 
   age
0   42
1   52
2    1
3   24
4   73

Above, df2 != 0 is a boolean DataFrame.

In [16]: df2 != 0
Out[16]: 
     age
0  False
1  False
2   True
3  False
4  False

df2.where(df2 != 0, df1) returns a new DataFrame. Where df2 != 0 is True, the corresponding value of df2 is used. Where it is False, the corresponding value of df1 is used.
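For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. It assumes the two frames from the question are rebuilt by hand; the name result is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Keep df2's value where the condition is True; take df1's value elsewhere.
result = df2.where(df2 != 0, df1)
print(result)

# where() returns a new object, so df2 itself still contains its zeros.
print(df2)
```

Since where does not modify df2, assign the result back (df2 = df2.where(df2 != 0, df1)) if you want to overwrite it.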


Another alternative is to make an assignment with df.loc:

df2.loc[df2['age'] == 0, 'age'] = df1['age']

df.loc[mask, col] selects the rows of df where the boolean Series mask is True, restricted to the column labeled col.

In [17]: df2.loc[df2['age'] == 0, 'age']
Out[17]: 
0    0
1    0
3    0
4    0
Name: age, dtype: int64

When used in an assignment, such as df2.loc[df2['age'] == 0, 'age'] = df1['age'], Pandas performs automatic index label alignment. (Notice the index labels above are 0,1,3,4 -- with 2 being skipped). So the values in df2.loc[df2['age'] == 0, 'age'] are replaced by the corresponding values from df1['age']. Even though df1['age'] is a Series with index labels 0,1,2,3, and 4, the 2 is ignored because there is no corresponding index label on the left-hand side.

In other words,

df2.loc[df2['age'] == 0, 'age'] = df1.loc[df2['age'] == 0, 'age']

would work as well, but the added restriction on the right-hand side is unnecessary.
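To see the alignment at work, here is a small sketch that rebuilds the sample frames from the question. The names zero_rows and df2_b, and the reversed right-hand side, are my own additions for illustration. The second assignment passes df1['age'] in reverse row order; because matching is done by index label rather than by position, the outcome should be the same.

```python
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Boolean Series marking the rows of df2 that hold a zero.
zero_rows = df2["age"] == 0

# In-place assignment: only the labels selected on the left (0, 1, 3, 4) are filled.
df2.loc[zero_rows, "age"] = df1["age"]
print(df2)

# Same assignment with the right-hand side in reverse row order (labels 4, 3, 2, 1, 0).
df2_b = pd.DataFrame({"age": [0, 0, 1, 0, 0]})
df2_b.loc[df2_b["age"] == 0, "age"] = df1["age"].iloc[::-1]
print(df2_b)
```

Unlike where, this modifies df2 in place, so keep a copy if you still need the original.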


5 Comments

Thank you. However, when I try df2.where(df2['age'] != 0, df1) I get AttributeError: 'float' object has no attribute 'all'
I think you are experiencing this bug -- you can fix it by upgrading your version of pandas.
The pandas version cannot be changed because it's installed on a server and I can only use that one :( The pandas version I have is 0.15.1.
Then I believe you could use df2.loc[df2['age'] == 0, 'age'] = df1['age'] instead.
Thanks a lot, works great! :) Could you be kind enough to explain how this code works as well, and also how df2.where(df2 != 0, df1) works?
3
In [30]: df2.mask(df2==0).combine_first(df1)
Out[30]:
    age
0  42.0
1  52.0
2   1.0
3  24.0
4  73.0

Or, "negating" @unutbu's beautiful solution:

In [46]: df2.mask(df2==0, df1)
Out[46]:
   age
0   42
1   52
2    1
3   24
4   73
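Here is a runnable sketch of both variants side by side, assuming the sample frames from the question (the names masked, filled and direct are only for illustration). It also shows why the first output above prints 42.0 rather than 42: mask without a replacement puts NaN where the condition holds, which forces the column to float.

```python
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# Step 1: hide the zeros. Without a replacement value they become NaN.
masked = df2.mask(df2 == 0)
print(masked)

# Step 2: fill the NaN holes from df1, matching on index and column labels.
filled = masked.combine_first(df1)
print(filled)

# One-step variant: replace directly where the condition is True.
direct = df2.mask(df2 == 0, df1)
print(direct)
```

mask(cond, other) replaces values where cond is True, while where(cond, other) replaces them where cond is False, so df2.mask(df2 == 0, df1) and df2.where(df2 != 0, df1) describe the same operation.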

1 Comment

@Vaishali, yes, thank you! It's a "negation" of unutbu's beautiful solution :)
1

Or try mul (this relies on every nonzero entry in df2 being exactly 1, and on df1 containing no zeros):

df1.mul(np.where(df2==1,0,1)).replace({0:1})
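A quick sketch of this approach with the imports it needs, assuming the sample frames from the question. The second frame, df2_other, is a made-up example of mine to show where the trick stops matching where: as soon as df2 holds a nonzero value other than 1, that value is lost.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"age": [42, 52, 36, 24, 73]})
df2 = pd.DataFrame({"age": [0, 0, 1, 0, 0]})

# The multiplier is 0 where df2 == 1 and 1 elsewhere, so df1's value survives
# wherever df2 is 0; the remaining 0 products are then turned back into 1.
via_mul = df1.mul(np.where(df2 == 1, 0, 1)).replace({0: 1})
print(via_mul)

# Hypothetical data with a nonzero entry that is not 1.
df2_other = pd.DataFrame({"age": [0, 0, 5, 0, 0]})
via_mul_other = df1.mul(np.where(df2_other == 1, 0, 1)).replace({0: 1})
via_where_other = df2_other.where(df2_other != 0, df1)
print(via_mul_other)    # row 2 shows df1's 36, not df2_other's 5
print(via_where_other)  # row 2 keeps the 5
```

For general data, where, mask or the loc assignment from the other answers look like the safer choice.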
