Python DataFrame sum values in columnA based on conditions in columnsN

Question

I have a group of accounts of different types, with different options, and I am trying to calculate each user's savings for every month in 2016 compared with their average amount used in 2014 and 2015. My DataFrame looks like this:

key amount  id  month   opt type    year
0   100     5   1       M   E       2014
1   200     5   1       M   G       2014
2   300     5   1       R   E       2014
3   400     5   1       R   G       2014
4   105     5   1       M   E       2015
5   205     5   1       M   G       2015
6   305     5   1       R   G       2015
7   405     5   1       R   E       2015
8   90      5   1       M   E       2016
9   180     5   1       M   G       2016
10  310     5   1       R   G       2016
11  350     5   1       R   E       2016

Based on the above, I would expect that user 5 has saved 12.5 in month 1 of 2016 for 'type' 'E' with 'opt' 'M', compared to their average 'amount' of 102.5 in 2014 and 2015.

The complete answers I would expect for the various opt|type combinations in month 1 of 2016 are as follows (2016 amount minus the 2014-2015 average, so a negative value is a saving):

M|E -12.5
M|G -22.5
R|E  -2.5
R|G -42.5

I thought that a groupby() function might work for this, but the formula I've developed is not giving me the correct answers.

df_savings = df.groupby(['id','year','month','type','opt'], group_keys=False).apply(
         lambda s: float(s['amount'][s.year < 2016].sum()/float(2)) - float(s['amount'][s.year == 2016].sum()))

Any help would be greatly appreciated. Here is code used for the sample df above:

import pandas as pd

df = pd.DataFrame({'id': [5,5,5,5,5,5,5,5,5,5,5,5],
                   'type': ['E','G','E','G','E','G','G','E','E','G','G','E'],
                   'opt': ['M','M','R','R','M','M','R','R','M','M','R','R'],
                   'year': [2014,2014,2014,2014,2015,2015,2015,2015,2016,2016,2016,2016],
                   'month': [1,1,1,1,1,1,1,1,1,1,1,1],
                   'amount': [100,200,300,400,105,205,305,405,90,180,310,350]
                   })

al0 · Accepted Answer · 2017-02-03 02:12:58Z

You could split it up into two pieces, 2016 and 2014-15, then group each piece by the same keys (leaving 'year' out). That gives two Series with matching indexes, which you can subtract:

(df[df.year == 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].sum()
 - df[df.year < 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].mean())
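
As a self-contained sketch of the same idea on the sample data from the question (the names keys, usage_2016, baseline and diff are only illustrative, not part of the original code), something like this should work:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'id': [5] * 12,
    'type': ['E', 'G', 'E', 'G', 'E', 'G', 'G', 'E', 'E', 'G', 'G', 'E'],
    'opt': ['M', 'M', 'R', 'R', 'M', 'M', 'R', 'R', 'M', 'M', 'R', 'R'],
    'year': [2014] * 4 + [2015] * 4 + [2016] * 4,
    'month': [1] * 12,
    'amount': [100, 200, 300, 400, 105, 205, 305, 405, 90, 180, 310, 350],
})

# Grouping keys; 'year' is deliberately left out so that all years
# for the same id/month/opt/type end up under the same index label.
keys = ['id', 'month', 'opt', 'type']

# 2016 usage per group, and the 2014-2015 baseline per group.
usage_2016 = df[df.year == 2016].groupby(keys)['amount'].sum()
baseline = df[df.year < 2016].groupby(keys)['amount'].mean()

# Subtraction aligns on the shared index; a negative value means
# less was used in 2016 than the 2014-2015 average.
diff = usage_2016 - baseline
print(diff)
```

For the sample data this gives -12.5, -22.5, -2.5 and -42.5 for M|E, M|G, R|E and R|G, which matches the expected values in the question. Note that .mean() averages the pre-2016 rows directly, so it equals the per-year average only when each group has exactly one row per year, as in the sample; if that doesn't hold for the real data, you would probably want to sum per year first. The attempt in the question most likely fails because 'year' is one of the groupby keys, so each group contains a single year and the 2016 rows never share a group with the 2014-2015 rows.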

edited Feb 3, 2017 at 2:12

answered Feb 3, 2017 at 2:07
