
I have a data frame that looks something like this: (there are about 100 more columns irrelevant to my conditional column calculation)

col1     col2     col3
a        NaN      NaN
b        NaN      NaN
NaN      a        NaN
NaN      b        NaN
NaN      NaN      a
NaN      NaN      b

I need to add a column to put those values together so that it looks like this:

col1     col2     col3     col4
a        NaN      NaN      a
b        NaN      NaN      b
NaN      a        NaN      a
NaN      b        NaN      b
NaN      NaN      a        a
NaN      NaN      b        b

I'm trying to use something like this (which has worked for other conditions, such as searching for specific strings):

df['col4'] = [x if (~pd.isnull(x)) else y if (~pd.isnull(y)) else z if (~pd.isnull(z)) else '' for x, y, z in zip(df['col1'], df['col2'], df['col3'])]

However, this only performs the first test condition and sets the rest as NaN, even if I set the else condition to set the rest as empty strings. It looks like this:

col1     col2     col3     col4
a        NaN      NaN      a
b        NaN      NaN      b
NaN      a        NaN      NaN
NaN      b        NaN      NaN
NaN      NaN      a        NaN
NaN      NaN      b        NaN

Could anyone help explain why this isn't working (and what these kinds of "functions" are called)?

Edit: to clarify, there are other columns as well, but I'm not concerned about their values in the calculation for 'col4'
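(Editor's note: the construct in the question is a list comprehension containing chained conditional expressions, also called ternary expressions. The failure comes from using `~`, the bitwise NOT operator, on plain Python bools; a minimal sketch reproducing it in isolation:)

```python
import pandas as pd

# `~` is bitwise NOT. Applied to a plain Python bool it operates on the
# underlying int, so the result is a nonzero int and therefore truthy:
print(~True)   # -2 (truthy)
print(~False)  # -1 (truthy)

# pd.isnull on a scalar returns a plain Python bool, so `~pd.isnull(x)`
# is always truthy and the first branch of the chained ternary always wins,
# which propagates col1's NaN into col4:
x = float("nan")
print(bool(~pd.isnull(x)))  # True, even though x IS null

# `not` is the correct logical negation for scalars:
print(not pd.isnull(x))     # False
```

(`~` does behave as elementwise logical NOT on boolean Series/arrays, which is why the same pattern appeared to work in vectorized code elsewhere.)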

2 Answers


Let us try bfill

df['col4']=df.bfill(1).iloc[:,0]
df
Out[107]: 
  col1 col2 col3 col4
0    a  NaN  NaN    a
1    b  NaN  NaN    b
2  NaN    a  NaN    a
3  NaN    b  NaN    b
4  NaN  NaN    a    a
5  NaN  NaN    b    b

1 Comment

This works as well, but what does the 1 argument in bfill(1) do, and why is the iloc needed?
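(Editor's note: the positional 1 is the axis argument. A sketch with a small toy frame, using the explicit keyword:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", "b", np.nan],
                   "col2": [np.nan, np.nan, "a"]})

# axis=1 back-fills across each row (pulling values leftward), so after
# bfill every row's FIRST column holds that row's first non-null value:
filled = df.bfill(axis=1)

# iloc[:, 0] then selects that first column, positionally:
df["col4"] = filled.iloc[:, 0]
print(df["col4"].tolist())  # ['a', 'b', 'a']
```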

stack and groupby with first

df.assign(col4=df.stack().groupby(level=0).first())

  col1 col2 col3 col4
0    a  NaN  NaN    a
1    b  NaN  NaN    b
2  NaN    a  NaN    a
3  NaN    b  NaN    b
4  NaN  NaN    a    a
5  NaN  NaN    b    b

argmin and lookup

a = df.isna().to_numpy()
j = a.argmin(axis=1)
df.assign(col4=df.lookup(df.index, df.columns[j]))

  col1 col2 col3 col4
0    a  NaN  NaN    a
1    b  NaN  NaN    b
2  NaN    a  NaN    a
3  NaN    b  NaN    b
4  NaN  NaN    a    a
5  NaN  NaN    b    b
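(Editor's note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. On current pandas, plain NumPy fancy indexing on the underlying array does the same row/column pick; a sketch:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", np.nan], "col2": [np.nan, "b"]})

a = df.isna().to_numpy()
j = a.argmin(axis=1)  # column index of the first non-null in each row

# Equivalent of the removed df.lookup(df.index, df.columns[j]):
# index the raw array with (row positions, column positions) pairs.
vals = df.to_numpy()[np.arange(len(df)), j]
df = df.assign(col4=vals)
print(df["col4"].tolist())  # ['a', 'b']
```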

numpy.select

conditions = df.notna().to_numpy().T
selections = [c.to_numpy() for _, c in df.iteritems()]
df.assign(col4=np.select(conditions, selections))

  col1 col2 col3 col4
0    a  NaN  NaN    a
1    b  NaN  NaN    b
2  NaN    a  NaN    a
3  NaN    b  NaN    b
4  NaN  NaN    a    a
5  NaN  NaN    b    b
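(Editor's note: iteritems was removed in pandas 2.0; on current pandas the same numpy.select approach can be written by iterating the column labels. A sketch:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", np.nan], "col2": [np.nan, "b"]})

# One boolean condition array per column (transposed so rows of
# `conditions` line up with columns of df), one choice array per column:
conditions = df.notna().to_numpy().T
selections = [df[c].to_numpy() for c in df.columns]

# np.select picks, per position, the choice of the first true condition:
out = df.assign(col4=np.select(conditions, selections))
print(out["col4"].tolist())  # ['a', 'b']
```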

4 Comments

How would these work if I have about 100 other columns and I want to choose the columns that are relevant, without having to find their indices? The last one seems way too complicated...
Create a new dataframe df_new = df[columns_I_care_about]. With the first concept: df.assign(col4=df[columns_I_care_about].stack().groupby(level=0).first())
Thanks! That worked! I want to understand more of how this works. Would you mind explaining why the stack() portion is used? The groupby, I assume, horizontally collapses the columns to non-nulls based on the relevant columns, and first chooses the first instance from left to right in case there are multiple columns with non-nulls? What if, say, all three columns had non-nulls and I wanted col2 to take precedence over the others?
The elimination of the nulls via stack is coincidental. first after a groupby would've picked the first non-null value anyway. I used stack because it was syntactically convenient to do a groupby(level=0) afterwards. Otherwise I'd have to do something obnoxious like df.assign(col4=df.groupby(lambda x: 0, axis=1).first())
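(Editor's note: for the precedence question above, reordering the columns before stacking changes which value first sees first. A sketch, where the order list is a hypothetical precedence chosen for illustration:)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["a", "x"],
                   "col2": [np.nan, "y"],
                   "col3": ["c", "z"]})

# stack() walks each row in the given column order and drops NaN, so
# putting col2 first makes it win whenever it is non-null:
order = ["col2", "col1", "col3"]  # hypothetical precedence
df = df.assign(col4=df[order].stack().groupby(level=0).first())
print(df["col4"].tolist())  # ['a', 'y']
```

(Row 0 falls back to col1's 'a' because its col2 is NaN; row 1 takes col2's 'y' even though col1 and col3 are also non-null.)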
