I have a pandas DataFrame with sales data and columns for year, ISO week, price, quantity, and organic (boolean). Because each row represents a different location, dates are repeated. I would like to combine rows with matching year, ISO week, and organic values. Ideally, the combined row would have the average price and the total quantity. Any help is much appreciated!
1 Answer
I believe what you need is groupby combined with agg from pandas. Group on the columns that must match, then pass agg a dictionary that maps each remaining column to the aggregation you want for it:
import pandas as pd
df = pd.DataFrame({'year': ['2017', '2018', '2019', '2019'],
                   'ISO Week': [1, 2, 3, 3],
                   'Price': [5, 10, 15, 20],
                   'quantity': [1, 2, 3, 4],
                   'organic': [True, False, True, True]})
   ISO Week  Price  organic  quantity  year
0         1      5     True         1  2017
1         2     10    False         2  2018
2         3     15     True         3  2019  #<------ combine
3         3     20     True         4  2019  #<------ combine
df.groupby(['year','ISO Week','organic'], as_index=False).agg({'Price':'mean', 'quantity':'sum'})
   year  ISO Week  organic  Price  quantity
0  2017         1     True    5.0         1
1  2018         2    False   10.0         2
2  2019         3     True   17.5         7
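As a variation on the dictionary form above, here is a minimal sketch using named aggregation, which lets you choose the output column names. It assumes pandas 0.25 or later; the names avg_price and total_quantity are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "year": ["2017", "2018", "2019", "2019"],
    "ISO Week": [1, 2, 3, 3],
    "Price": [5, 10, 15, 20],
    "quantity": [1, 2, 3, 4],
    "organic": [True, False, True, True],
})

# Named aggregation: output_name=(input_column, aggregation).
# avg_price and total_quantity are illustrative names only.
combined = (
    df.groupby(["year", "ISO Week", "organic"], as_index=False)
      .agg(avg_price=("Price", "mean"), total_quantity=("quantity", "sum"))
)
print(combined)
```

The grouping keys stay as ordinary columns because of as_index=False, so the result has the same shape as the dictionary version, just with renamed value columns.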
When I add new data with the same year as a row already in the DataFrame, the code doesn't work properly. For instance, when I add another row for year '2018', the result is:
   year  ISO Week  organic  Price  quantity
0  2017         1     True      5         1
1  2018         2    False     10         2
2  2018         4     True     17         5
3  2019         3    False     20         4
4  2019         3     True     15         3
– Fatemeh Asgarinejad, Jun 7, 2019 at 20:46
@Fatemehhh, I don't quite understand what you are saying. The formatting in comments isn't that nice. – MattR, Jun 7, 2019 at 20:54
I'm so sorry for the formatting. I couldn't fix it. Just add another row like '2018', 1, 20, 5, False to your DataFrame. Then, in the result, the DataFrame is not grouped by year. – Fatemeh Asgarinejad, Jun 7, 2019 at 23:15
Right, because it's being grouped by more than just year, like the OP asked. If you want to group by just year, only use year in the groupby :) – MattR, Jun 7, 2019 at 23:16
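To make the point in this exchange concrete, here is a small sketch that adds the extra row from the comments and compares the two groupings. It assumes the values '2018', 1, 20, 5, False map to year, ISO Week, Price, quantity, organic in that order; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the answer plus the extra 2018 row from the comments
# (assumed order: year, ISO Week, Price, quantity, organic).
df = pd.DataFrame({
    "year": ["2017", "2018", "2019", "2019", "2018"],
    "ISO Week": [1, 2, 3, 3, 1],
    "Price": [5, 10, 15, 20, 20],
    "quantity": [1, 2, 3, 4, 5],
    "organic": [True, False, True, True, False],
})

aggregations = {"Price": "mean", "quantity": "sum"}

# Grouping on all three keys: the two 2018 rows have different ISO weeks,
# so they land in different groups and stay as separate rows.
by_all_keys = df.groupby(["year", "ISO Week", "organic"], as_index=False).agg(aggregations)

# Grouping on year alone: every row with the same year is combined.
by_year = df.groupby("year", as_index=False).agg(aggregations)

print(by_all_keys)
print(by_year)
```

With all three keys, 2018 appears twice because the rows differ in ISO week. With year alone, each year appears exactly once.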