I have a group of accounts of different types, with different options, and I am trying to calculate each user's savings for every month in 2016 compared with their average amount used in 2014 and 2015. My DataFrame looks like this:
key  amount  id  month  opt  type  year
0    100     5   1      M    E     2014
1    200     5   1      M    G     2014
2    300     5   1      R    E     2014
3    400     5   1      R    G     2014
4    105     5   1      M    E     2015
5    205     5   1      M    G     2015
6    305     5   1      R    G     2015
7    405     5   1      R    E     2015
8    90      5   1      M    E     2016
9    180     5   1      M    G     2016
10   310     5   1      R    G     2016
11   350     5   1      R    E     2016
Based on the above, I would expect that user 5 saved 12.5 in month 1 of 2016 for 'type' 'E' with 'opt' 'M', compared to their average 'amount' of 102.5 across 2014 and 2015 (102.5 - 90 = 12.5).
The complete answers I would expect for each opt|type combination in month 1 of 2016 are as follows (2016 amount minus the 2014-2015 average, so a negative value is a saving):
M|E -12.5
M|G -22.5
R|E -2.5
R|G -42.5
I thought that a groupby() function might work for this, but the formula I've developed is not giving me the correct answers.
df_savings = df.groupby(['id','year','month','type','opt'], group_keys=False).apply(
    lambda s: float(s['amount'][s.year < 2016].sum()/float(2)) - float(s['amount'][s.year == 2016].sum()))
Any help would be greatly appreciated. Here is the code used to create the sample df above:
import pandas as pd

df = pd.DataFrame({'id':[5,5,5,5,5,5,5,5,5,5,5,5],
                   'type':['E','G','E','G','E','G','G','E','E','G','G','E'],
                   'opt':['M','M','R','R','M','M','R','R','M','M','R','R'],
                   'year':[2014,2014,2014,2014,2015,2015,2015,2015,2016,2016,2016,2016],
                   'month':[1,1,1,1,1,1,1,1,1,1,1,1],
                   'amount':[100,200,300,400,105,205,305,405,90,180,310,350]
                   })
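One likely reason the groupby() attempt above gives unexpected numbers is that 'year' is one of the grouping keys. Each group then holds rows from a single year, so the `s.year < 2016` and `s.year == 2016` masks can never both match within the same group. The sketch below is one possible way to get the expected values, not the only one: it leaves 'year' out of the keys, computes the 2014-2015 baseline and the 2016 amounts separately, and subtracts them. The names `keys`, `baseline` and `current` are made up for illustration. It also assumes every id/month/opt/type combination has rows for both baseline years, because `mean()` averages only the rows that are present.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'id': [5] * 12,
                   'type': ['E', 'G', 'E', 'G', 'E', 'G', 'G', 'E', 'E', 'G', 'G', 'E'],
                   'opt': ['M', 'M', 'R', 'R', 'M', 'M', 'R', 'R', 'M', 'M', 'R', 'R'],
                   'year': [2014] * 4 + [2015] * 4 + [2016] * 4,
                   'month': [1] * 12,
                   'amount': [100, 200, 300, 400, 105, 205, 305, 405, 90, 180, 310, 350]})

# Grouping keys deliberately exclude 'year' (hypothetical helper name).
keys = ['id', 'month', 'opt', 'type']

# Baseline: mean amount over 2014 and 2015 for each key combination.
# Assumption: both baseline years are present for every combination.
baseline = df[df['year'].isin([2014, 2015])].groupby(keys)['amount'].mean()

# 2016 amounts, indexed by the same keys so the subtraction aligns on them.
current = df[df['year'] == 2016].groupby(keys)['amount'].sum()

# 2016 minus baseline: a negative value means less was used in 2016 (a saving).
df_savings = (current - baseline).rename('savings').reset_index()
print(df_savings)
```

On the sample data this gives -12.5, -22.5, -2.5 and -42.5 for M|E, M|G, R|E and R|G respectively, matching the expected answers listed above. Swapping the operands (`baseline - current`) would report savings as positive numbers instead.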