Python create a column based on the values of each row of another column

Question

I have a pandas dataframe as below:

import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A", "A", "B", "B","B"], 'GROUP': ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1", "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"], 'VAL':[1,3,8,5,8,10]})
df

    ORDER   GROUP            VAL
0    A      A_2018_1B1         1
1    A      A_2018_1B1H        3
2    A      A_2018_1M1         8
3    B      B_2018_I000_1C1    5
4    B      B_2018_I000_1B1    8
5    B      B_2018_I000_1C1H   10

I want to create a column "CAL" holding the sum of 'VAL' over all rows whose GROUP name is the same except for an H character at the end. For example, 'VAL' for the first two rows is added together because the only difference between their 'GROUP' values is that the second one ends in H. Row 3 stays as it is, rows 4 and 6 are added together, and row 5 stays as it is.

My expected output

    ORDER   GROUP            VAL    CAL
0    A      A_2018_1B1         1    4
1    A      A_2018_1B1H        3    4
2    A      A_2018_1M1         8    8
3    B      B_2018_I000_1C1    5    15
4    B      B_2018_I000_1B1    8    8
5    B      B_2018_I000_1C1H   10   15

BENY · Accepted Answer · 2020-07-09 15:15:24Z

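Strip the 'H' from GROUP with str.replace, group by the result, then use transform('sum') so that every row gets the total of its group: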

df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
0     4
1     4
2     8
3    15
4     8
5    15
Name: VAL, dtype: int64

df['CAL'] = df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
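
One caveat: str.replace('H','') removes every 'H' in the name, not only one at the end. For the sample data the result is the same, but if a group name could contain an 'H' somewhere else, anchoring the pattern to the end of the string may be safer. A minimal sketch of that variant follows; the variable name key is only for illustration, and regex=True is passed explicitly because the default for that parameter differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'ORDER': ["A", "A", "A", "B", "B", "B"],
    'GROUP': ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1",
              "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"],
    'VAL': [1, 3, 8, 5, 8, 10],
})

# Strip only a trailing 'H'; an 'H' elsewhere in the name is left alone.
key = df['GROUP'].str.replace(r'H$', '', regex=True)

# transform('sum') returns one value per original row, aligned to df's index,
# so it can be assigned straight back as a new column.
df['CAL'] = df.groupby(key)['VAL'].transform('sum')
print(df)
```

Grouping by a separate Series rather than overwriting GROUP keeps the original names in the dataframe, as in the expected output.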
