I have two columns in a dataframe and I need to create a new one based on them. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'a': [1.0, 1.0, 2.0], 'b': [3.0, 3.0, 3.0]})
df.iloc[1, 0] = np.nan

     a    b
0  1.0  3.0
1  NaN  3.0
2  2.0  3.0

I need to add a column c that takes its value from a when a is not null, and from b otherwise, like this:

     a    b    c
0  1.0  3.0  1.0
1  NaN  3.0  3.0
2  2.0  3.0  2.0

Here is what I have tried:

def dist(df):
    if df['a']:
        return df.a
    else:
        return df.b
df['c']=df.apply(dist,axis=1)

but the result is not what I expected. Can anyone suggest what I should do? Thx!

1 Answer

>>> df['c'] = df.a.where(~np.isnan(df.a), df.b)
>>> df
    a  b  c
0   1  3  1
1 NaN  3  3
2   2  3  2

It is tempting to write the more compact:

df['c'] = df.a.where(df.a, df.b)

but this won't do the right thing when df.a[k] == 0, because 0 is also interpreted as False.
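
To illustrate the truthiness problem, here is a minimal sketch in plain Python. The helper names pick_truthy and pick_not_nan are made up for this example and are not part of pandas. It shows that 0.0 is falsy while NaN is truthy, which is likely also why the `if df['a']:` attempt in the question keeps NaN in the second row.

```python
import math

def pick_truthy(a, b):
    # Hypothetical helper mirroring the question's row-wise `if a: ... else: ...` logic.
    return a if a else b

def pick_not_nan(a, b):
    # Tests for NaN explicitly instead of relying on truthiness.
    return b if math.isnan(a) else a

nan = float("nan")

# Truthiness: NaN counts as True (so it is kept), 0.0 counts as False (so it is replaced).
truthy_results = [pick_truthy(1.0, 3.0), pick_truthy(nan, 3.0), pick_truthy(0.0, 3.0)]

# Explicit NaN test: only the NaN is replaced, and 0.0 survives.
explicit_results = [pick_not_nan(1.0, 3.0), pick_not_nan(nan, 3.0), pick_not_nan(0.0, 3.0)]
```

With the truthiness test, the NaN passes through and the legitimate 0.0 is overwritten by b. The explicit test does the opposite, which is the behaviour wanted here.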

Instead of isnan, you can rely on the fact that NaN is the only value that is not equal to itself, so the following also works:

df['c'] = df.a.where(df.a==df.a, df.b)
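
As a sketch that goes beyond what this answer states, pandas also offers notna, fillna and combine_first, which should give the same column here. The extra row with a == 0.0 is an assumption added to show that a zero is kept rather than replaced.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, plus one row with a == 0.0 (added for illustration).
df = pd.DataFrame({"a": [1.0, np.nan, 2.0, 0.0], "b": [3.0, 3.0, 3.0, 3.0]})

# Keep a where it is not NaN, otherwise take b.
c_where = df["a"].where(df["a"].notna(), df["b"])

# Fill the NaN entries of a with the index-aligned values of b.
c_fillna = df["a"].fillna(df["b"])

# Take a, falling back to b wherever a is missing.
c_combine = df["a"].combine_first(df["b"])

df["c"] = c_fillna
```

All three variants test for missing values explicitly, so they do not depend on how 0 or NaN convert to booleans.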

1 Comment

I tried the where expression and it worked nicely with my data. Thx!
