I have a group of accounts of different types, with different options, and I am trying to calculate each user's savings for every month in 2016 compared with their average amount used in 2014 and 2015. My DataFrame looks like this:

key  amount  id  month  opt  type  year
0    100     5   1      M    E     2014
1    200     5   1      M    G     2014
2    300     5   1      R    E     2014
3    400     5   1      R    G     2014
4    105     5   1      M    E     2015
5    205     5   1      M    G     2015
6    305     5   1      R    G     2015
7    405     5   1      R    E     2015
8    90      5   1      M    E     2016
9    180     5   1      M    G     2016
10   310     5   1      R    G     2016
11   350     5   1      R    E     2016

Based on the above, I would expect that user 5 saved 12.5 in month 1 of 2016 for 'type' 'E' with 'opt' 'M', compared to their average 'amount' of 102.5 in 2014 and 2015.

The complete answers I would expect for the various opt|type combinations in month 1 of 2016 are as follows (2016 amount minus the 2014-15 average, so a negative value is a saving):

M|E -12.5
M|G -22.5
R|E  -2.5
R|G -42.5

I thought that a groupby() function might work for this, but the formula I've developed is not giving me the correct answers.

df_savings = df.groupby(['id','year','month','type','opt'], group_keys=False).apply(
         lambda s: float(s['amount'][s.year < 2016].sum()/float(2)) - float(s['amount'][s.year == 2016].sum()))

Any help would be greatly appreciated. Here is the code used to build the sample df above:

import pandas as pd

df = pd.DataFrame({'id': [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
                   'type': ['E', 'G', 'E', 'G', 'E', 'G', 'G', 'E', 'E', 'G', 'G', 'E'],
                   'opt': ['M', 'M', 'R', 'R', 'M', 'M', 'R', 'R', 'M', 'M', 'R', 'R'],
                   'year': [2014, 2014, 2014, 2014, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016],
                   'month': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                   'amount': [100, 200, 300, 400, 105, 205, 305, 405, 90, 180, 310, 350]
                   })

1 Answer

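Grouping by 'year' puts each year in its own group, so no group ever contains both the 2016 rows and the 2014-15 rows. You could instead split the data into two pieces, 2016 and 2014-15, then group each piece by the same keys (leaving out 'year'). That gives two Series with matching indexes that you can subtract: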

(df[df.year == 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].sum()
 - df[df.year < 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].mean())
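
For reference, here is a minimal end-to-end sketch of the same idea run against the sample data from the question. The names `keys`, `baseline`, `current` and `savings` are only illustrative and do not come from the question. It also assumes each id/month/opt/type combination has exactly one row per baseline year, so the mean over 2014-15 rows equals the per-year average. If a group can have several rows per year, you would probably want to sum per year first and then average.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [5] * 12,
    'type': ['E', 'G', 'E', 'G', 'E', 'G', 'G', 'E', 'E', 'G', 'G', 'E'],
    'opt': ['M', 'M', 'R', 'R', 'M', 'M', 'R', 'R', 'M', 'M', 'R', 'R'],
    'year': [2014] * 4 + [2015] * 4 + [2016] * 4,
    'month': [1] * 12,
    'amount': [100, 200, 300, 400, 105, 205, 305, 405, 90, 180, 310, 350],
})

# Group on everything except 'year' so both pieces share the same index.
keys = ['id', 'month', 'opt', 'type']

# Average amount over the baseline years (2014 and 2015).
baseline = df[df.year < 2016].groupby(keys)['amount'].mean()

# Amount used in 2016.
current = df[df.year == 2016].groupby(keys)['amount'].sum()

# Series subtraction aligns on the shared MultiIndex; negative means a saving.
savings = (current - baseline).rename('savings').reset_index()
print(savings)
```

On the sample data this should reproduce the four values listed in the question, one row per opt/type combination. Calling `reset_index()` just turns the grouping keys back into ordinary columns, which tends to be easier to merge or export later.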