Create multiple new columns based on multiple conditions in Pandas

Question

I am trying to create new columns a and b from the following dataframe:

      a_x  b_x    a_y  b_y
0   13.67  0.0  13.67  0.0
1   13.42  0.0  13.42  0.0
2   13.52  1.0  13.17  1.0
3   13.61  1.0  13.11  1.0
4   12.68  1.0  13.06  1.0
5   12.70  1.0  12.93  1.0
6   13.60  1.0    NaN  NaN
7   12.89  1.0    NaN  NaN
8   11.68  1.0    NaN  NaN
9     NaN  NaN   8.87  0.0
10    NaN  NaN   8.77  0.0
11    NaN  NaN   7.97  0.0

If b_x or b_y is 0.0 (in this case they have the same value when both exist), then a_x and a_y share the same value, so I take either of them for the new columns a and b. If b_x or b_y is 1.0, a_x and a_y differ, so I calculate the mean of a_x and a_y as the value of a and take either b_x or b_y as b.

If only one of the pairs (a_x, b_x) or (a_y, b_y) is non-null, I take the existing values as a and b.

My expected result looks like this:

      a_x  b_x    a_y  b_y       a  b
0   13.67  0.0  13.67  0.0  13.670  0
1   13.42  0.0  13.42  0.0  13.420  0
2   13.52  1.0  13.17  1.0  13.345  1
3   13.61  1.0  13.11  1.0  13.360  1
4   12.68  1.0  13.06  1.0  12.870  1
5   12.70  1.0  12.93  1.0  12.815  1
6   13.60  1.0    NaN  NaN  13.600  1
7   12.89  1.0    NaN  NaN  12.890  1
8   11.68  1.0    NaN  NaN  11.680  1
9     NaN  NaN   8.87  0.0   8.870  0
10    NaN  NaN   8.77  0.0   8.770  0
11    NaN  NaN   7.97  0.0   7.970  0

How can I get the result above? Thank you.

jezrael · Accepted Answer · 2019-12-23 06:24:00Z

Use:

import numpy as np
import pandas as pd

# select all a and b columns
b = df.filter(like='b')
a = df.filter(like='a')
# test whether at least one b value in the row is 0 (m1) or 1 (m2)
m1 = b.eq(0).any(axis=1)
m2 = b.eq(1).any(axis=1)

# row-wise mean of the a columns (NaN values are skipped)
a1 = a.mean(axis=1)
# forward fill missing values along each row and select the last column
b1 = b.ffill(axis=1).iloc[:, -1]
a2 = a.ffill(axis=1).iloc[:, -1]

# new DataFrame built from the 2 conditions
df1 = pd.DataFrame(np.select([m1, m2], [[a2, b1], [a1, b1]]), index=['a','b']).T
# join to the original
df = df.join(df1)
print(df)
      a_x  b_x    a_y  b_y       a    b
0   13.67  0.0  13.67  0.0  13.670  0.0
1   13.42  0.0  13.42  0.0  13.420  0.0
2   13.52  1.0  13.17  1.0  13.345  1.0
3   13.61  1.0  13.11  1.0  13.360  1.0
4   12.68  1.0  13.06  1.0  12.870  1.0
5   12.70  1.0  12.93  1.0  12.815  1.0
6   13.60  1.0    NaN  NaN  13.600  1.0
7   12.89  1.0    NaN  NaN  12.890  1.0
8   11.68  1.0    NaN  NaN  11.680  1.0
9     NaN  NaN   8.87  0.0   8.870  0.0
10    NaN  NaN   8.77  0.0   8.770  0.0
11    NaN  NaN   7.97  0.0   7.970  0.0
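
The row-wise forward fill used above may be easier to follow on a tiny frame. This is only a minimal sketch: the frame `toy` and its values are made up for illustration and are not part of the question's data. It assumes every row has at least one non-null value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row has a value in at least one column
toy = pd.DataFrame({"b_x": [0.0, 1.0, np.nan],
                    "b_y": [0.0, np.nan, 1.0]})

# ffill(axis=1) copies values left to right across each row,
# so the last column ends up holding the last non-null value of the row
last_valid = toy.ffill(axis=1).iloc[:, -1]
print(last_valid.tolist())  # [0.0, 1.0, 1.0]
```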

However, I think the solution can be simplified, because the mean can be used for both conditions (the mean of identical values is the same as the first value):

b = df.filter(like='b')
a = df.filter(like='a')

# row-wise mean covers both conditions; NaN values are skipped
a1 = a.mean(axis=1)
# last non-null b value in each row
b1 = b.ffill(axis=1).iloc[:, -1]

df['a'] = a1
df['b'] = b1
print(df)
      a_x  b_x    a_y  b_y       a    b
0   13.67  0.0  13.67  0.0  13.670  0.0
1   13.42  0.0  13.42  0.0  13.420  0.0
2   13.52  1.0  13.17  1.0  13.345  1.0
3   13.61  1.0  13.11  1.0  13.360  1.0
4   12.68  1.0  13.06  1.0  12.870  1.0
5   12.70  1.0  12.93  1.0  12.815  1.0
6   13.60  1.0    NaN  NaN  13.600  1.0
7   12.89  1.0    NaN  NaN  12.890  1.0
8   11.68  1.0    NaN  NaN  11.680  1.0
9     NaN  NaN   8.87  0.0   8.870  0.0
10    NaN  NaN   8.77  0.0   8.770  0.0
11    NaN  NaN   7.97  0.0   7.970  0.0
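
For anyone who wants to try the simplified approach from scratch, here is a self-contained sketch that rebuilds the sample data from the question. Two things in it are my own assumptions rather than part of the answer above: the anchored regexes `^a_` and `^b_` (used so that rerunning the code does not pick up the newly added `a` and `b` columns), and the final `astype(int)`, which only works if every row has at least one non-null b value.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the question
df = pd.DataFrame({
    "a_x": [13.67, 13.42, 13.52, 13.61, 12.68, 12.70, 13.60, 12.89, 11.68, nan, nan, nan],
    "b_x": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, nan, nan, nan],
    "a_y": [13.67, 13.42, 13.17, 13.11, 13.06, 12.93, nan, nan, nan, 8.87, 8.77, 7.97],
    "b_y": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, nan, nan, nan, 0.0, 0.0, 0.0],
})

# Assumption: anchored regexes select only the original source columns
a = df.filter(regex=r"^a_")
b = df.filter(regex=r"^b_")

# Row-wise mean skips NaN, so a single available value is returned unchanged
df["a"] = a.mean(axis=1)
# Last non-null b per row; the cast assumes no row is entirely NaN
df["b"] = b.ffill(axis=1).iloc[:, -1].astype(int)
print(df)
```

The integer cast is optional; it is there only to match the `0`/`1` display of column b in the expected result of the question.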
