Given the following df:
datetimeindex store sale category weekday
2018-10-13 09:27:01 gbn01 59.99 sporting 1
2018-10-13 09:27:01 gbn02 19.99 sporting 1
2018-10-13 09:27:02 gbn03 15.99 hygine 1
2018-10-13 09:27:03 gbn05 39.99 camping 1
....
2018-10-16 11:59:01 gbn01 19.99 other 0
2018-10-16 11:59:01 gbn02 49.99 sporting 0
2018-10-16 11:59:02 gbn03 10.00 food 0
2018-10-16 11:59:03 gbn05 89.99 electro 0
2018-10-16 12:30:03 gbn01 52.99
....
2018-10-16 21:05:03 gbn03 25.00 alcohol 0
2018-10-16 22:43:03 gbn01 10.05 health 0
Update
After re-reading the requirements, it looks like mean_sales should be calculated at each specific timestamp for that store, as the running mean of the sales so far during that period (08:00 to 18:00 or 12:00 to 13:00). My current thinking is to implement the pseudocode below, but it would currently only work if the data were ordered by datetimeindex, store:
# Lunch_Time_Mean
count = 0
Lunch_Sum_Previous = 0
for r in df:
    if LunchHours and WeekDay:
        count += 1
        if count == 1:
            r.Lunch_Mean = r.sale
            Lunch_Sum_Previous = r.sale
        elif count > 1:
            # parentheses needed: divide the running total, not just r.sale
            r.Lunch_Mean = (Lunch_Sum_Previous + r.sale) / count
            Lunch_Sum_Previous += r.sale
    else:
        # outside the window: default value, reset the running totals
        r.Lunch_Mean = 1
        count = 0
        Lunch_Sum_Previous = 0
Above Logic mapped to a table:
datetimeindex store IsWorkingHour count sales working_hour_sum working_hour_cumsum working_hour_mean_sales
13/10/2018 07:27 gbn01 0 0 39.18 0 0 1
13/10/2018 08:27 gbn01 1 1 31.69 31.69 31.69 1
13/10/2018 09:27 gbn01 1 2 99.19 99.19 130.88 1
13/10/2018 10:27 gbn01 1 3 25.89 25.89 156.77 1
13/10/2018 11:27 gbn01 1 4 19.10 19.10 175.87 1
13/10/2018 12:27 gbn01 1 5 82.51 82.51 258.38 1
13/10/2018 13:27 gbn01 1 6 10.82 10.82 269.2 1
13/10/2018 14:27 gbn01 1 7 10.43 10.43 279.63 1
13/10/2018 15:27 gbn01 1 8 15.83 15.83 295.46 1
13/10/2018 16:27 gbn01 1 9 12.53 12.53 307.99 1
13/10/2018 17:27 gbn01 1 10 10.03 10.03 318.02 1
13/10/2018 18:27 gbn01 0 0 54.14 0 0 1
13/10/2018 19:27 gbn01 0 0 20.04 0 0 1
#Above entries have working_hour_mean_sales of 1 because 13/10/2018 is on a weekend.
16/10/2018 07:27 gbn01 0 0 13.34 0 0 1
16/10/2018 08:27 gbn01 1 1 15.84 15.84 15.84 15.84
16/10/2018 09:27 gbn01 1 2 19.14 19.14 34.98 17.49
16/10/2018 10:27 gbn01 1 3 11.64 11.64 46.62 15.54
16/10/2018 11:27 gbn01 1 4 17.54 17.54 64.16 16.04
16/10/2018 12:27 gbn01 1 5 20.84 20.84 85 17
16/10/2018 13:27 gbn01 1 6 50.05 50.05 135.05 22.51
16/10/2018 14:27 gbn01 1 7 10.05 10.05 145.1 20.73
16/10/2018 15:27 gbn01 1 8 13.35 13.35 158.45 19.81
16/10/2018 16:27 gbn01 1 9 32.55 32.55 191 21.22
16/10/2018 17:27 gbn01 1 10 13.36 13.36 204.36 20.44
16/10/2018 18:27 gbn01 0 0 10.86 0 0 1
16/10/2018 19:27 gbn01 0 0 20.06 0 0 1
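For reference, here is a minimal sketch of how the running-mean logic above might look without an explicit loop. It is only an illustration, not a confirmed solution: it rebuilds the gbn01 rows for 16/10/2018 from the table, assumes the weekday column uses 0 for Mon-Fri, treats 08:00 as inclusive and 18:00 as exclusive, and uses a helper column named day that is not part of the original data. Grouping by store and day is what restarts the count and cumulative sum, so the stores can be interleaved as long as the rows are in time order.

```python
import pandas as pd

# Sample rebuilt from the table above: store gbn01 on Tuesday 16/10/2018.
sales = [13.34, 15.84, 19.14, 11.64, 17.54, 20.84, 50.05,
         10.05, 13.35, 32.55, 13.36, 10.86, 20.06]
idx = pd.date_range("2018-10-16 07:27", periods=len(sales),
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"store": "gbn01", "sale": sales, "weekday": 0}, index=idx)
df = df.sort_index()  # the running mean relies on time order

# Working hours: weekday == 0 (Mon-Fri), 08:00 <= time < 18:00 (assumed bounds).
mask = ((df["weekday"] == 0)
        & (df.index.hour >= 8)
        & (df.index.hour < 18)).to_numpy()

# "day" is a hypothetical helper column so the running totals restart each day.
window = df.loc[mask].assign(day=lambda d: d.index.normalize())
grouped = window.groupby(["store", "day"])["sale"]

# Running mean = cumulative sum / number of rows seen so far in the group.
running_mean = grouped.cumsum() / (grouped.cumcount() + 1)

# Default of 1 outside the window, as in the pseudocode.
df["working_hour_mean_sales"] = 1.0
df.loc[mask, "working_hour_mean_sales"] = running_mean.round(2).to_numpy()

print(df[["store", "working_hour_mean_sales"]])
```

The same pattern should carry over to the lunch peak by swapping in a different mask.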
Desired Output
I'm attempting to use the above to generate a new df that looks like the below:
#I've simplified it to a single condition and store
datetimeindex store working_hour_mean_sales
13/10/2018 07:27 gbn01 1
13/10/2018 08:27 gbn01 1
13/10/2018 09:27 gbn01 1
13/10/2018 10:27 gbn01 1
13/10/2018 11:27 gbn01 1
13/10/2018 12:27 gbn01 1
13/10/2018 13:27 gbn01 1
13/10/2018 14:27 gbn01 1
13/10/2018 15:27 gbn01 1
13/10/2018 16:27 gbn01 1
13/10/2018 17:27 gbn01 1
13/10/2018 18:27 gbn01 1
13/10/2018 19:27 gbn01 1
#Above working_hour_mean_sales=1 because 13/10/2018 was a weekend
16/10/2018 07:27 gbn01 1
16/10/2018 08:27 gbn01 15.84
16/10/2018 09:27 gbn01 17.49
16/10/2018 10:27 gbn01 15.54
16/10/2018 11:27 gbn01 16.04
16/10/2018 12:27 gbn01 17
16/10/2018 13:27 gbn01 22.51
16/10/2018 14:27 gbn01 20.73
16/10/2018 15:27 gbn01 19.81
16/10/2018 16:27 gbn01 21.22
16/10/2018 17:27 gbn01 20.44
16/10/2018 18:27 gbn01 1
16/10/2018 19:27 gbn01 1
Where "working hours" are 08:00-18:00 Mon-Fri and "weekday lunch peak" is 12:00-13:30.
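To make those two windows concrete, here is a rough sketch of how the flags could be derived from the index. The timestamps are made up to sit on either side of each boundary, the end of each window is assumed to be exclusive (the requirement doesn't say), and IsLunchPeak is a hypothetical column name; IsWorkingHour and weekday follow the names used in this post.

```python
import pandas as pd

# Hypothetical timestamps chosen to sit on either side of each boundary.
idx = pd.to_datetime([
    "2018-10-13 12:15",  # Saturday, inside the lunch hours
    "2018-10-16 07:59",  # Tuesday, before working hours
    "2018-10-16 08:00",  # Tuesday, start of working hours
    "2018-10-16 12:00",  # Tuesday, start of lunch peak
    "2018-10-16 13:29",  # Tuesday, still in lunch peak
    "2018-10-16 13:30",  # Tuesday, lunch peak over (end assumed exclusive)
    "2018-10-16 18:00",  # Tuesday, working hours over (end assumed exclusive)
])
df = pd.DataFrame({"sale": 10.0}, index=idx)

# Same convention as the weekday column in this post: 0 = Mon-Fri, 1 = Sat/Sun.
df["weekday"] = (df.index.dayofweek >= 5).astype(int)

# Minutes since midnight make the half-hour boundary (13:30) easy to express.
minutes = df.index.hour * 60 + df.index.minute
is_weekday = df["weekday"].to_numpy() == 0

df["IsWorkingHour"] = (is_weekday
                       & (minutes >= 8 * 60)
                       & (minutes < 18 * 60)).astype(int)
df["IsLunchPeak"] = (is_weekday
                     & (minutes >= 12 * 60)
                     & (minutes < 13 * 60 + 30)).astype(int)

print(df)
```

Working in minutes since midnight avoids comparing hours alone, which cannot represent the 13:30 cut-off.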
(N.B. The counter-intuitive (at least to me) decision that weekday=0 means Mon-Fri wasn't mine.)
Any suggestions on how to implement this in pandas would be greatly appreciated!