
I need to add a new column ("delta") to a dataframe. For each row, delta is the difference between that row's "value" and the "value" of the row with num = 5 that has the same group and color. The result should look as follows:

group   color   num value   delta

Group1  red     1   0.1     -0.4    (0.1 - 0.5 (0.5 is the value in Group1, color=red and num=5))
Group1  green   1   0.2     -0.4    (0.2 - 0.6 (0.6 is the value in Group1, color=green and num=5))
Group1  blue    1   0.3     -0.4    (0.3 - 0.7)
Group1  yellow  1   0.6     0.1     (0.6 - 0.5)
Group1  red     5   0.5     0
Group1  green   5   0.6     0
Group1  blue    5   0.7     0
Group1  yellow  5   0.5     0
Group1  red     7   0.8     0.3
Group1  green   7   0.9     0.3
Group1  blue    7   0.7     0
Group1  yellow  7   0.6     0.1

Group2  red     1   0.1     etc.

I tried pivot_table, which I suppose is a start, but I really can't see how to do this conditional computation.

Do you have any idea how this could be done?

Actual code:

import plotly.graph_objs as go
import pandas as pd

d = {
    "group" : ["Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group1", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2", "Group2","Group2", "Group2", "Group2", "Group2"],
    "color" : ["red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow", "red", "green", "blue", "yellow"],
    "num" : [1, 1, 1, 1, 5, 5, 5, 5, 7, 7, 7, 7, 1, 1, 1, 1, 5, 5, 5, 5, 7, 7, 7, 7],
    "value" : [0.1, 0.2, 0.3, 0.6, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.7, 0.6, 0.1, 0.2, 0.3, 0.6, 0.5, 0.6, 0.7, 0.5, 0.8, 0.9, 0.7, 0.6,]
    }

df = pd.DataFrame(d)

df_pivot = pd.pivot_table(df, values = ["value"], index = ["group", "color", "num"])

df_pivot["delta"] = df_pivot["value"] # what should I subtract, and how?

print(df_pivot)
  • Can you please add the desired output? Commented May 5, 2020 at 10:57
  • I did, in the first part of the question. Commented May 5, 2020 at 10:59
  • Try df['delta'] = df['value'] - 0.5; print(df). No need to pivot. Is that what you expected? Commented May 5, 2020 at 11:02
  • It's not a constant; the value to subtract really varies depending on the condition. Commented May 5, 2020 at 11:06

3 Answers

Like this:

In [1771]: m = df[df.num.eq(5)]
In [1774]: res = pd.merge(df, m, on=['group', 'color'])
In [1779]: res['delta'] = res['value_x'] - res['value_y']

In [1781]: res = res.drop(columns=['num_y', 'value_y']).rename(columns={'num_x': 'num', 'value_x': 'value'})

In [1782]: res
Out[1782]: 
     group   color  num  value  delta
0   Group1     red    1    0.1   -0.4
1   Group1     red    5    0.5    0.0
2   Group1     red    7    0.8    0.3
3   Group1   green    1    0.2   -0.4
4   Group1   green    5    0.6    0.0
5   Group1   green    7    0.9    0.3
6   Group1    blue    1    0.3   -0.4
7   Group1    blue    5    0.7    0.0
8   Group1    blue    7    0.7    0.0
9   Group1  yellow    1    0.6    0.1
10  Group1  yellow    5    0.5    0.0
11  Group1  yellow    7    0.6    0.1

5 Comments

@Yas This solution should be robust and handle all cases properly.
Your answer was the first, and it's easy to understand. But I don't like the drop and rename at the end ;)
pd.merge appends _x and _y to column names that are duplicated across the two dataframes, which is why I'm dropping and renaming. I don't see a way around that with this merge.
Yes, I understood that. I find Andy L.'s answer with where and transform more concise.
Great. It's your question, and you have the right to choose the best answer for yourself.
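For what it's worth, the drop and rename can probably be avoided by trimming and renaming the num = 5 rows before merging, so that no column names collide. The following is a minimal sketch on a reduced copy of the question's data; the names baseline and merged are only illustrative, and it assumes there is at most one num = 5 row per (group, color) pair.

```python
import pandas as pd

# Reduced version of the question's data (two colors, one group).
df = pd.DataFrame({
    "group": ["Group1"] * 6,
    "color": ["red", "green"] * 3,
    "num":   [1, 1, 5, 5, 7, 7],
    "value": [0.1, 0.2, 0.5, 0.6, 0.8, 0.9],
})

# Keep only the merge keys plus the num == 5 value, renamed up front so the
# merge sees no duplicate column names and adds no _x/_y suffixes.
baseline = (
    df.loc[df["num"].eq(5), ["group", "color", "value"]]
      .rename(columns={"value": "baseline"})
)

# how="left" keeps every row of df in its original order; validate raises
# if a (group, color) pair has more than one num == 5 row.
merged = df.merge(baseline, on=["group", "color"], how="left",
                  validate="many_to_one")

# Subtract positionally so the result does not depend on df's index labels.
df["delta"] = merged["value"].to_numpy() - merged["baseline"].to_numpy()
print(df)
```

With a left merge, a (group, color) pair that has no num = 5 row gets NaN in delta instead of being dropped from the result.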

Use where and transform

df['delta'] = (df.value - 
               df.where(df.num.eq(5)).groupby([df.group,df.color])
                                     .value.transform('first'))

Out[16]:
     group   color  num  value  delta
0   Group1     red    1    0.1   -0.4
1   Group1   green    1    0.2   -0.4
2   Group1    blue    1    0.3   -0.4
3   Group1  yellow    1    0.6    0.1
4   Group1     red    5    0.5    0.0
5   Group1   green    5    0.6    0.0
6   Group1    blue    5    0.7    0.0
7   Group1  yellow    5    0.5    0.0
8   Group1     red    7    0.8    0.3
9   Group1   green    7    0.9    0.3
10  Group1    blue    7    0.7    0.0
11  Group1  yellow    7    0.6    0.1
12  Group2     red    1    0.1   -0.4
13  Group2   green    1    0.2   -0.4
14  Group2    blue    1    0.3   -0.4
15  Group2  yellow    1    0.6    0.1
16  Group2     red    5    0.5    0.0
17  Group2   green    5    0.6    0.0
18  Group2    blue    5    0.7    0.0
19  Group2  yellow    5    0.5    0.0
20  Group2     red    7    0.8    0.3
21  Group2   green    7    0.9    0.3
22  Group2    blue    7    0.7    0.0
23  Group2  yellow    7    0.6    0.1

2 Comments

I really like this one, I'm just trying to understand the part after groupby (value.transform('first')).
@Yas: it picks the first non-NaN value in each (group, color) group and broadcasts that value to every row of the group. The expression after the minus sign can also be written as df.value.where(df.num.eq(5)).groupby([df.group,df.color]).transform('first')
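To make that explanation concrete, here is a step-by-step sketch of the same idea on a reduced copy of the question's data. The intermediate names masked and baseline are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Reduced version of the question's data (two colors, one group).
df = pd.DataFrame({
    "group": ["Group1"] * 6,
    "color": ["red", "green"] * 3,
    "num":   [1, 1, 5, 5, 7, 7],
    "value": [0.1, 0.2, 0.5, 0.6, 0.8, 0.9],
})

# Step 1: keep value only where num == 5; every other row becomes NaN.
masked = df["value"].where(df["num"].eq(5))

# Step 2: within each (group, color), 'first' returns the first non-NaN
# entry, and transform broadcasts it back to every row of that group.
baseline = masked.groupby([df["group"], df["color"]]).transform("first")

# Step 3: subtract the per-row baseline from the original values.
df["delta"] = df["value"] - baseline
print(df.assign(masked=masked, baseline=baseline))
```

Because transform returns a Series aligned with the original index, the subtraction works row by row without any merge or reordering.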

Create a dict keyed by (group, color) and map it to each row.

d = (
    df.groupby(['group','color'])
    .apply(lambda x: x.loc[x.num.eq(5)].head(1)).value
    .reset_index(2,drop=True)
    .to_dict()
)

df['delta'] = (
    df.apply(lambda x: x.value - d.get((x.group,x.color), x.value), axis=1)
)

or

df['delta'] = (
    df.apply(lambda x: x.value - 
             df.loc[(df.group==x.group) & (df.color==x.color) & (df.num==5)].iloc[0].value, 
             axis=1)
)

2 Comments

Oh this one works too. Dunno yet what would be the best to use ...
My first solution should be more robust: it also handles cases where a (group, color) pair has no row with num 5.
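The robustness point can be checked on a small frame where one color has no num = 5 row. The sketch below is a variation on the dict approach rather than the answer's exact code: it builds the lookup with set_index instead of groupby/apply, and the name lookup is only illustrative. In this situation the second variant would fail, because .iloc[0] on an empty selection raises an IndexError.

```python
import pandas as pd

# "blue" deliberately has no row with num == 5.
df = pd.DataFrame({
    "group": ["Group1"] * 5,
    "color": ["red", "red", "red", "blue", "blue"],
    "num":   [1, 5, 7, 1, 7],
    "value": [0.1, 0.5, 0.8, 0.3, 0.7],
})

# Map each (group, color) pair to its num == 5 value; keep the first match
# if a pair happens to have several such rows.
lookup = (
    df.loc[df["num"].eq(5)]
      .drop_duplicates(["group", "color"])
      .set_index(["group", "color"])["value"]
      .to_dict()
)

# As in the answer's d.get(..., x.value), a missing pair falls back to the
# row's own value, so its delta is 0 rather than an error.
df["delta"] = [
    value - lookup.get((group, color), value)
    for group, color, value in zip(df["group"], df["color"], df["value"])
]
print(df)
```

Whether 0 is the right result for a missing baseline is a design choice; passing float("nan") as the default to lookup.get would mark those rows as undefined instead.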
