Add a column to pandas dataframe with computation between rows value and condition

Question

I need to add a new column ("delta") to a dataframe. For each row, delta is the difference between that row's "value" and the "value" of the row with num = 5 that has the same group and the same color. The result should be as follows:

group   color   num value   delta

Group1  red     1   0.1     -0.4    (0.1 - 0.5 (0.5 is the value in Group1, color=red and num=5))
Group1  green   1   0.2     -0.4    (0.2 - 0.6 (0.6 is the value in Group1, color=green and num=5))
Group1  blue    1   0.3     -0.4    (0.3 - 0.7)
Group1  yellow  1   0.6     0.1     (0.6 - 0.5)
Group1  red     5   0.5     0
Group1  green   5   0.6     0
Group1  blue    5   0.7     0
Group1  yellow  5   0.5     0
Group1  red     7   0.8     0.3
Group1  green   7   0.9     0.3
Group1  blue    7   0.7     0
Group1  yellow  7   0.6     0.1

Group2  red     1   0.1     etc.

I tried to use pivot_table, which I suppose is a start, but I really can't see how to do this conditional computation.

Do you have any idea how this could be done?

Actual code:

import pandas as pd

d = {
    "group" : ["Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2","Group2", "Group2", "Group2", "Group2"],
    "color" : ["red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow"],
    "num" : [1, 1, 1, 1, 5, 5, 5, 5, 7, 7, 7, 7, 1, 1, 1, 1, 5, 5, 5, 5, 7, 7, 7, 7],
    "value" : [0.1, 0.2, 0.3, 0.6, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.7, 0.6, 0.1, 0.2, 0.3, 0.6, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.7, 0.6,]
    }

df = pd.DataFrame(d)

df_pivot = pd.pivot_table(df, values = ["value"], index = ["group", "color", "num"])

df_pivot["delta"] = df_pivot["value"] # what should I subtract, and how?

print(df_pivot)

Try df['delta'] = df['value'] - 0.5; print(df). There is no need to pivot. Is that what you expected?
– Beny Gj, Commented May 5, 2020 at 11:02
It's not a constant; the value to subtract varies depending on the condition.
– Yas, Commented May 5, 2020 at 11:06

Mayank Porwal · Accepted Answer · 2020-05-05 11:04:28Z

3

Like this:

In [1771]: m = df[df.num.eq(5)]
In [1774]: res = pd.merge(df, m, on=['group', 'color'])
In [1779]: res['delta'] = res['value_x'] - res['value_y']

In [1781]: res = res.drop(columns=['num_y', 'value_y']).rename(columns={'num_x': 'num', 'value_x': 'value'})

In [1782]: res
Out[1782]:
     group   color  num  value  delta
0   Group1     red    1    0.1   -0.4
1   Group1     red    5    0.5    0.0
2   Group1     red    7    0.8    0.3
3   Group1   green    1    0.2   -0.4
4   Group1   green    5    0.6    0.0
5   Group1   green    7    0.9    0.3
6   Group1    blue    1    0.3   -0.4
7   Group1    blue    5    0.7    0.0
8   Group1    blue    7    0.7    0.0
9   Group1  yellow    1    0.6    0.1
10  Group1  yellow    5    0.5    0.0
11  Group1  yellow    7    0.6    0.1

answered May 5, 2020 at 11:04

Mayank Porwal


5 Comments

Mayank Porwal Over a year ago

@Yas This solution would be robust. Will take care of all cases properly.

Yas Over a year ago

Your answer was the first and it's easy to understand. But I don't like the drop and rename at the end ;)

Mayank Porwal Over a year ago

pd.merge appends _x and _y to column names that appear in both dataframes. Hence I'm dropping and renaming. No way out of this.

Yas Over a year ago

Yes, I understood that. I find Andy L.'s answer with where and transform more concise.

Mayank Porwal Over a year ago

Great. It's your question and you have the right to choose the best answer for yourself.
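The drop-and-rename step discussed in these comments can be sidestepped by trimming and renaming the num == 5 rows before merging, so that the two frames only share the join keys. The sketch below is one way to do that and is not part of the original answer; the names ref and ref_value and the small sample frame are made up for illustration. It assumes at most one num == 5 row per (group, color) pair, since duplicates would multiply rows in the merge.

```python
import pandas as pd

# Small made-up frame with the same columns as the question's data.
df = pd.DataFrame({
    "group": ["Group1"] * 4 + ["Group2"] * 2,
    "color": ["red", "green", "red", "green", "red", "red"],
    "num":   [1, 1, 5, 5, 1, 5],
    "value": [0.1, 0.2, 0.5, 0.6, 0.3, 0.7],
})

# Keep only the join keys and the reference value from the num == 5 rows,
# renaming "value" up front so merge has no duplicate names to suffix.
ref = (
    df.loc[df["num"].eq(5), ["group", "color", "value"]]
      .rename(columns={"value": "ref_value"})
)

# how="left" keeps every row of df; a pair without a num == 5 row gets NaN.
res = df.merge(ref, on=["group", "color"], how="left")
res["delta"] = res["value"] - res["ref_value"]
res = res.drop(columns="ref_value")  # hypothetical helper column, no longer needed

print(res)
```

A left merge also keeps the rows in their original order, which the inner merge in the answer is not guaranteed to do across pandas versions.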

Andy L. · Accepted Answer · 2020-05-05 11:32:55Z

2

Use where and transform

df['delta'] = (df.value - 
               df.where(df.num.eq(5)).groupby([df.group,df.color])
                                     .value.transform('first'))

Out[16]:
     group   color  num  value  delta
0   Group1     red    1    0.1   -0.4
1   Group1   green    1    0.2   -0.4
2   Group1    blue    1    0.3   -0.4
3   Group1  yellow    1    0.6    0.1
4   Group1     red    5    0.5    0.0
5   Group1   green    5    0.6    0.0
6   Group1    blue    5    0.7    0.0
7   Group1  yellow    5    0.5    0.0
8   Group1     red    7    0.8    0.3
9   Group1   green    7    0.9    0.3
10  Group1    blue    7    0.7    0.0
11  Group1  yellow    7    0.6    0.1
12  Group2     red    1    0.1   -0.4
13  Group2   green    1    0.2   -0.4
14  Group2    blue    1    0.3   -0.4
15  Group2  yellow    1    0.6    0.1
16  Group2     red    5    0.5    0.0
17  Group2   green    5    0.6    0.0
18  Group2    blue    5    0.7    0.0
19  Group2  yellow    5    0.5    0.0
20  Group2     red    7    0.8    0.3
21  Group2   green    7    0.9    0.3
22  Group2    blue    7    0.7    0.0
23  Group2  yellow    7    0.6    0.1

answered May 5, 2020 at 11:32

Andy L.


2 Comments

Yas Over a year ago

I really like this one, I'm just trying to understand the part after groupby (value.transform('first')).

Andy L. Over a year ago

@Yas: it picks the first non-NaN value of each (group, color) pair and broadcasts that value to every row of the pair. The expression after the minus sign can also be written as df.value.where(df.num.eq(5)).groupby([df.group,df.color]).transform('first')
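To make the explanation above concrete, here is a minimal step-by-step sketch of the same where/transform idea on a cut-down frame. The intermediate names masked and reference are made up for illustration, and the sample data is a small subset of the question's values.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["Group1"] * 6,
    "color": ["red", "green"] * 3,
    "num":   [1, 1, 5, 5, 7, 7],
    "value": [0.1, 0.2, 0.5, 0.6, 0.8, 0.9],
})

# Step 1: keep "value" only where num == 5; every other row becomes NaN.
masked = df["value"].where(df["num"].eq(5))

# Step 2: within each (group, color) pair, "first" returns the first
# non-NaN entry, and transform broadcasts it back to every row of the pair.
reference = masked.groupby([df["group"], df["color"]]).transform("first")

# Step 3: subtract the broadcast reference from the original values.
df["delta"] = df["value"] - reference

print(pd.DataFrame({"masked": masked, "reference": reference}))
print(df)
```

Because transform returns a Series aligned with the original index, the subtraction needs no merge and leaves the row order untouched.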

Allen Qin · Accepted Answer · 2020-05-05 11:26:55Z

2

Create a dict keyed by (group, color) and look it up for each row.

d = (
    df.groupby(['group','color'])
    .apply(lambda x: x.loc[x.num.eq(5)].head(1)).value
    .reset_index(2,drop=True)
    .to_dict()
)

df['delta'] = (
    df.apply(lambda x: x.value - d.get((x.group,x.color), x.value), axis=1)
)

or

df['delta'] = (
    df.apply(lambda x: x.value - 
             df.loc[(df.group==x.group) & (df.color==x.color) & (df.num==5)].iloc[0].value, 
             axis=1)
)

edited May 5, 2020 at 11:26

answered May 5, 2020 at 11:08

Allen Qin


2 Comments

Yas Over a year ago

Oh this one works too. Dunno yet what would be the best to use ...

Allen Qin Over a year ago

My first solution should be more robust: it handles the case where a (group, color) pair has no row with num 5.
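The robustness point can be illustrated with a frame in which one pair has no num == 5 row. This is only a sketch under assumptions: the lookup is built with set_index rather than the groupby/apply chain from the answer, and the names ref, delta_zero and delta_nan are made up here. It shows two possible fallbacks for a missing reference; which one is right depends on what a missing reference should mean in your data.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["Group1", "Group1", "Group1"],
    "color": ["red", "red", "blue"],   # blue has no num == 5 row
    "num":   [1, 5, 1],
    "value": [0.1, 0.5, 0.3],
})

# (group, color) -> value of the first num == 5 row for that pair.
ref = (
    df[df["num"].eq(5)]
    .drop_duplicates(["group", "color"])
    .set_index(["group", "color"])["value"]
    .to_dict()
)

# Fallback used in the answer: a missing pair is compared with itself,
# so its delta comes out as 0.
delta_zero = df.apply(
    lambda row: row["value"] - ref.get((row["group"], row["color"]), row["value"]),
    axis=1,
)

# Alternative fallback: NaN marks the pairs that have no reference row.
delta_nan = df.apply(
    lambda row: row["value"] - ref.get((row["group"], row["color"]), float("nan")),
    axis=1,
)

print(pd.DataFrame({"delta_zero": delta_zero, "delta_nan": delta_nan}))
```

By contrast, the second snippet in the answer calls .iloc[0] on an empty selection for the blue row and would raise an IndexError on this frame.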
