I have a pandas dataframe as below:

import pandas as pd
df = pd.DataFrame({'ORDER':["A", "A", "A", "B", "B","B"], 'GROUP': ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1", "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"], 'VAL':[1,3,8,5,8,10]})
df

    ORDER   GROUP            VAL
0    A      A_2018_1B1         1
1    A      A_2018_1B1H        3
2    A      A_2018_1M1         8
3    B      B_2018_I000_1C1    5
4    B      B_2018_I000_1B1    8
5    B      B_2018_I000_1C1H   10

I want to create a column "CAL" holding the sum of 'VAL' over all rows whose 'GROUP' name is the same except for an H character at the end. For example, the 'VAL' values of the first two rows are added because the only difference between their 'GROUP' names is that the second row ends in H. Row 3 stays as it is, rows 4 and 6 are added together, and row 5 stays the same.

My expected output

    ORDER   GROUP            VAL    CAL
0    A      A_2018_1B1         1    4
1    A      A_2018_1B1H        3    4
2    A      A_2018_1M1         8    8
3    B      B_2018_I000_1C1    5    15
4    B      B_2018_I000_1B1    8    8
5    B      B_2018_I000_1C1H   10   15

1 Answer

Use str.replace to strip the H from GROUP, group by the result, and then call transform('sum'):

df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
0     4
1     4
2     8
3    15
4     8
5    15
Name: VAL, dtype: int64

df['CAL'] = df.groupby(df.GROUP.str.replace('H','')).VAL.transform('sum')
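
Note that replace('H','') removes every H in the GROUP string, not only one at the end. That works for the sample data, but if an H could appear elsewhere in a name, a variant that strips only a trailing H may be safer. A minimal sketch, assuming the regex pattern H$ and the intermediate name key (neither comes from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'ORDER': ["A", "A", "A", "B", "B", "B"],
    'GROUP': ["A_2018_1B1", "A_2018_1B1H", "A_2018_1M1",
              "B_2018_I000_1C1", "B_2018_I000_1B1", "B_2018_I000_1C1H"],
    'VAL': [1, 3, 8, 5, 8, 10],
})

# Build the grouping key by removing a single trailing 'H' only.
# 'H$' is a regex anchored at the end of the string, so an 'H'
# elsewhere in the name is left untouched.
key = df['GROUP'].str.replace(r'H$', '', regex=True)

# transform('sum') returns one value per original row, so the result
# aligns with df and can be assigned directly as a new column.
df['CAL'] = df.groupby(key)['VAL'].transform('sum')
print(df)
```

Passing regex=True explicitly keeps the behavior the same across pandas versions, since the default for that parameter has changed over time.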