Add columns to pandas Dataframe on different conditions

Question

I have a pandas dataframe.

import pandas as pd
data = pd.DataFrame({
'a': [0,1,0,0,1,1,0,1], 
'b': [0,0,1,0,1,0,1,1], 
'c': [0,0,0,1,0,1,1,1],
'rate': [0,0.1,0.11,0.12,0.24,0.27,0.3,0.4]})

a,b,c are my channels and I am adding another column that shows the sum of row wise totals of these channels by writing:

data['total'] = data.a + data.b + data.c

data

    a   b   c   rate    total
1   1   0   0   0.10    1
2   0   1   0   0.11    1 
3   0   0   1   0.12    1
4   1   1   0   0.24    2
5   1   0   1   0.27    2
6   0   1   1   0.30    2
7   1   1   1   0.40    3

I would like to work on data where total = 1 and total = 2

reduced = data[(data.a == 1) & (data.total == 2)]
print(reduced)

     a  b  c  rate  total
  4  1  1  0  0.24      2
  5  1  0  1  0.27      2

I would like to add columns to this reduced dataframe that looks like this:

      a  b  c  rate  total  prob_a  prob_b  prob_c
   4  1  1  0  0.24      2     0.1    0.11    0
   5  1  0  1  0.27      2     0.1     0      0.12

In the first row of the reduced data-frame, prob_c is 0 since C is absent ( ABC => 110). In the second row of the reduced data-frame, prob_b is 0 since B is absent (ABC => 101)

where,

# Channel a alone occurs (ABC => 100)
prob_a = data['rate'][(data.a == 1) & (data.total == 1)]

# Channel b alone occurs (ABC => 010)
prob_b = data['rate'][(data.b == 1) & (data.total == 1)]

# Channel c alone occurs (ABC => 001)
prob_c = data['rate'][(data.c == 1) & (data.total == 1)]

I tried this:

reduced['prob_a'] = data['rate'][(data.a == 1) & (data.total == 1)]
reduced['prob_b'] = data['rate'][(data.b == 1) & (data.total == 1)]
reduced['prob_c'] = data['rate'][(data.c == 1) & (data.total == 1)]
print(reduced)

which results in this output:

   a  b  c  rate  total  prob_a  prob_b  prob_c
4  1  1  0  0.24      2     NaN     NaN     NaN
5  1  0  1  0.27      2     NaN     NaN     NaN

BENY · Accepted Answer · 2018-02-16 17:41:15Z

3

IIUC

reduced['prob_a'] = data.loc[(data.a == 1) & (data.total == 1),'rate'].values[0]
reduced['prob_b'] = data.loc[(data.b == 1) & (data.total == 1),'rate'].values[0]
reduced['prob_c'] = data.loc[(data.c == 1) & (data.total == 1),'rate'].values[0]

reduced[['prob_a','prob_b','prob_c']]=reduced[['prob_a','prob_b','prob_c']].mul(reduced[['a','b','c']].eq(1).values)
reduced
Out[698]: 
   a  b  c  rate  total  prob_a  prob_b  prob_c
4  1  1  0  0.24      2     0.1    0.11    0.00
5  1  0  1  0.27      2     0.1    0.00    0.12

answered Feb 16, 2018 at 17:41

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

piRSquared · Accepted Answer · 2018-02-16 19:54:18Z

2

Break it into two steps

One
Calculate Probabilities

probs = data.query('a + b + c == 1').pipe(
    lambda d: d.drop('rate', 1).T.dot(d.rate)
)

probs

a    0.10
b    0.11
c    0.12
dtype: float64

How it works
Get just the total equals to one rows:

data.query('a + b + c == 1')

   a  b  c  rate
1  1  0  0  0.10
2  0  1  0  0.11
3  0  0  1  0.12

pipe allows us to pipeline the results into function. From the result above, I want to dot product the columns a, b, c with the column rate. The lambda via the pipe allows me to do that.

More Generalized Approach

df = data.copy()
rate = df.pop('rate')
mask = df.sum(1) == 1
probs = df[mask].T.dot(rate[mask])

probs

Two
The Rest

data.join(
    data.drop('rate', 1).mul(probs).add_prefix('prob_')
).query('a + b + c == 2 and a == 1')

   a  b  c  rate  prob_a  prob_b  prob_c
4  1  1  0  0.24     0.1    0.11    0.00
5  1  0  1  0.27     0.1    0.00    0.12

edited Feb 16, 2018 at 19:54

answered Feb 16, 2018 at 18:22

piRSquared

296k68 gold badges509 silver badges654 bronze badges

3 Comments

Regressor Over a year ago

hey, can u please explain the first step - especially the lambda expression.

Regressor Over a year ago

The example in the question talks about 3 channels at a time. But in general, if I have "N" channels (a,b,c,d, ... z) , how do i write a function that gives me the probabilities for all possible combinations of pairs. For example, if I have 5 channels - A,B,C,D,E. I need 5c2 = 10 rows in my output where each row shows presence of only 2 channels at a time.

piRSquared Over a year ago

I added a more general approach for you

Collectives™ on Stack Overflow

Add columns to pandas Dataframe on different conditions

2 Answers 2

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related