What is the best way to aggregate values over a particular `over (partition by ...)` window?
SQL:

```sql
select
    a.*,
    b.vol1 / sum(b.vol1) over (
        partition by a.sale, a.d_id, a.month, a.p_id
    ) vol_r,
    a.vol2 * b.vol1 / sum(b.vol1) over (
        partition by a.sale, a.d_id, a.month, a.p_id
    ) vol_t
from
    sales1 a
    left join sales2 b on a.sale = b.sale
    and a.d_id = b.d_id
    and a.month = b.month
    and a.p_id = b.p_id
```
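In pandas, the closest analogue to a windowed aggregate such as `sum(b.vol1) over (partition by ...)` is `groupby(...).transform("sum")`. It computes the aggregate per group but returns a Series aligned to the original rows, so it can be used directly in row-level arithmetic. A minimal sketch on a hypothetical toy frame (`df`, `k` and `v` are illustrative names, not from the question):

```python
import pandas as pd

# Hypothetical toy frame with two partitions keyed by "k"
df = pd.DataFrame({"k": ["x", "x", "y"], "v": [1.0, 3.0, 5.0]})

# SQL: sum(v) over (partition by k)
# transform("sum") returns one value per row (aligned to df.index),
# unlike agg("sum"), which collapses each group to a single row.
df["v_sum"] = df.groupby("k")["v"].transform("sum")

# SQL: v / sum(v) over (partition by k)
df["v_share"] = df["v"] / df["v_sum"]

print(df)
```

Here `v_sum` is `[4.0, 4.0, 5.0]` and `v_share` is `[0.25, 0.75, 1.0]`: every row keeps its identity, which is what the SQL window gives you.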
What would be the equivalent of this in Python pandas?
Input:

sales1:

| sale | d_id | month | p_id | vol2 |
|---|---|---|---|---|
| 2 | 580 | 4 | 9 | 11 |
| 2 | 580 | 4 | 9 | 11.314 |
| 2 | 580 | 4 | 9 | 20.065 |
sales2:

| sale | d_id | month | p_id | vol1 |
|---|---|---|---|---|
| 2 | 580 | 4 | 9 | 11 |
| 2 | 580 | 4 | 9 | 11.314 |
| 2 | 580 | 4 | 9 | 21 |
Output:

| sale | d_id | month | p_id | vol1 | vol2 | vol_r | vol_t |
|---|---|---|---|---|---|---|---|
| 2 | 580 | 4 | 9 | 11 | 11 | 1 | 11 |
| 2 | 580 | 4 | 9 | 11.314 | 11.314 | 1 | 11.314 |
| 2 | 580 | 4 | 9 | 21 | 20.065 | 1 | 20.065 |
`df.to_dict('records')` for both DataFrames
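A minimal pandas sketch of the whole query, assuming the sample rows above written as lists of dicts (the shape `to_dict('records')` produces). `merge(how="left")` stands in for the `left join` on the four keys, and `groupby(keys)["vol1"].transform("sum")` for `sum(b.vol1) over (partition by ...)`. One caveat: in the sample data all four key columns are identical, so both the SQL and this sketch match every sales1 row with all three sales2 rows, giving 9 rows whose `vol_r` values are not 1. The expected-output table appears to assume a one-to-one match between the two inputs.

```python
import pandas as pd

# Sample rows from the question, in the list-of-dicts shape that
# df.to_dict("records") produces.
sales1_records = [
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol2": 11.0},
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol2": 11.314},
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol2": 20.065},
]
sales2_records = [
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol1": 11.0},
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol1": 11.314},
    {"sale": 2, "d_id": 580, "month": 4, "p_id": 9, "vol1": 21.0},
]
sales1 = pd.DataFrame(sales1_records)
sales2 = pd.DataFrame(sales2_records)

keys = ["sale", "d_id", "month", "p_id"]

# from sales1 a left join sales2 b on all four keys
out = sales1.merge(sales2, on=keys, how="left")

# sum(b.vol1) over (partition by a.sale, a.d_id, a.month, a.p_id):
# transform broadcasts each partition's sum back onto every row.
part_sum = out.groupby(keys)["vol1"].transform("sum")

# The two window-based columns from the select list
out["vol_r"] = out["vol1"] / part_sum
out["vol_t"] = out["vol2"] * out["vol1"] / part_sum

print(out)
```

Because `transform` keeps the original row index, the window result lines up with `out` without a second merge; using `agg` instead would collapse each partition to one row and require joining it back.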