3

I have this dataframe in pandas:

   day customer  amount
0    1    cust1     500
1    2    cust2     100
2    1    cust1      50
3    2    cust1     100
4    2    cust2     250
5    6    cust1      20

For convenience:

df = pd.DataFrame({'day': [1, 2, 1, 2, 2, 6],
                   'customer': ['cust1', 'cust2', 'cust1', 'cust1', 'cust2', 'cust1'],
                   'amount': [500, 100, 50, 100, 250, 20]})

I would like to create a new column 'amount2days' so as to aggragate amounts per customer for the last two days, to get the following dataframe:

   day customer  amount    amount2days   ----------------------------
0    1    cust1     500    500           (no past transactions)
1    2    cust2     100    100           (no past transactions)
2    1    cust1      50    550           (500 + 50 = rows 0,2 
3    2    cust1     100    650           (500 + 50 + 100, rows 0,2,3)
4    2    cust2     250    350           (100 + 250, rows 1,4) 
5    6    cust1      20    20            (notice day is 6, and no day=5 for cust1)

i.e. I would like to perform the following (pseudo) code:

df['amount2days'] = df_of_past_2_days['amount'].sum()

for each row. What is the most convenient way to do so?

The summing I wish to peroform is over the day, but days does not necessarily have to increment in each new row, as shown in the example. Still I want to sum amounts over the past 2 days.

1

2 Answers 2

2

Use groupby with Series.rolling and sum

Notice: Here is necessary add DataFrame.reset_index for avoid wrong alignement of data:

df['amount2days'] = (df.groupby('customer')['amount']
                       .rolling(2, min_periods=0)
                       .sum()
                       .reset_index(level=0, drop=True))
print (df)
   day customer  amount  amount2days
1    1    cust1     500        500.0
2    2    cust1     100        600.0
3    3    cust1     250        350.0

Why not use .to_numpy here? Because if not default index then output should be assigned wrongly - check sample below:

df = pd.DataFrame({'day': {0: 1, 2: 2, 5: 3, 1: 1, 6: 2, 4: 3}, 'customer': {0: 'cust2', 2: 'cust2', 5: 'cust2', 1: 'cust1', 6: 'cust1', 4: 'cust1'}, 'amount': {0: 5000, 2: 1000, 5: 2500, 1: 500, 6: 100, 4: 250}})
print (df)
   day customer  amount
0    1    cust2    5000
2    2    cust2    1000
5    3    cust2    2500
1    1    cust1     500
6    2    cust1     100
4    3    cust1     250

df['amount2days'] = (df.groupby('customer', sort=False).amount
                       .rolling(2, min_periods=0)
                       .sum()
                       .to_numpy())

df['amount2days1'] = (df.groupby('customer')['amount']
                       .rolling(2, min_periods=0)
                       .sum()
                       .reset_index(level=0, drop=True))
print (df)
   day customer  amount  amount2days  amount2days1
0    1    cust2    5000        500.0        5000.0
2    2    cust2    1000        600.0        6000.0
5    3    cust2    2500        350.0        3500.0
1    1    cust1     500       5000.0         500.0
6    2    cust1     100       6000.0         600.0
4    3    cust1     250       3500.0         350.0

EDIT: General solution:

def f(x):
    N = 1
    for i in pd.unique(x['day']):
        y = x[x['day'].between(i - N, i)]
        x.loc[y.index[-1], 'amountNdays'] = y['amount'].sum()
    
    return x

df = df.groupby('customer').apply(f)
df['amountNdays'] = df['amountNdays'].fillna(df['amount'])
print (df)
   day customer  amount  amountNdays
0    1    cust1     500        500.0
1    2    cust2     100        100.0
2    1    cust1      50        550.0
3    2    cust1     100        650.0
4    2    cust2     250        350.0
5    6    cust1      20         20.0
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks. I've updated my question to be more clear. My data does not necessarily increments 'day' in each new row, still I want to sum over 2 days backwards. Does simple rolling will work in that case?
still not working in the case that there is nothing to sum for backwards rows, I edited my example once again (should be just 20, but this method gives 120 there). Thank you.
@jezarel with "there" I mean the 'amount2days' of the last row.
and maybe there is a way to generalize this to amount_x_days easily?
@user112112 - added general solution
1

You can use panda's rolling for moving window operations (depending on the version of pandas, reset_index as in jezrael's answer will be safer):

df['amount2days'] = (df.groupby('customer', sort=False).amount
                       .rolling(2, min_periods=0)
                       .sum()
                       .to_numpy()) 

print(df)
   day customer  amount  amount2days
1    1    cust1     500        500.0
2    2    cust1     100        600.0
3    3    cust1     250        350.0

1 Comment

Comments are not for extended discussion; this conversation has been moved to chat.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.