In a number of aggregation functions, such as rolling_mean, rolling_max, rolling_min, etc., the input argument window_size is expected to be a single int.
I am wondering how to compute the results efficiently when I have a list of window_size values.
Consider the following dataframe:
import polars as pl
pl.Config(tbl_rows=-1)
df = pl.DataFrame(
    {
        "symbol": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
        "price": [100, 110, 105, 103, 107, 200, 190, 180, 185],
    }
)
shape: (9, 2)
┌────────┬───────┐
│ symbol ┆ price │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 100   │
│ A      ┆ 110   │
│ A      ┆ 105   │
│ A      ┆ 103   │
│ A      ┆ 107   │
│ B      ┆ 200   │
│ B      ┆ 190   │
│ B      ┆ 180   │
│ B      ┆ 185   │
└────────┴───────┘
Let's say I have a list with n elements, such as periods = [2, 3]. I am looking for a way to compute the rolling means for all periods, grouped by symbol, in parallel. Speed and memory efficiency are of the essence.
The result should be a tidy/long dataframe like this:
shape: (18, 4)
┌────────┬───────┬─────────────┬──────────────┐
│ symbol ┆ price ┆ mean_period ┆ rolling_mean │
│ ---    ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64   ┆ u8          ┆ f64          │
╞════════╪═══════╪═════════════╪══════════════╡
│ A      ┆ 100   ┆ 2           ┆ null         │
│ A      ┆ 110   ┆ 2           ┆ 105.0        │
│ A      ┆ 105   ┆ 2           ┆ 107.5        │
│ A      ┆ 103   ┆ 2           ┆ 104.0        │
│ A      ┆ 107   ┆ 2           ┆ 105.0        │
│ B      ┆ 200   ┆ 2           ┆ null         │
│ B      ┆ 190   ┆ 2           ┆ 195.0        │
│ B      ┆ 180   ┆ 2           ┆ 185.0        │
│ B      ┆ 185   ┆ 2           ┆ 182.5        │
│ A      ┆ 100   ┆ 3           ┆ null         │
│ A      ┆ 110   ┆ 3           ┆ null         │
│ A      ┆ 105   ┆ 3           ┆ 105.0        │
│ A      ┆ 103   ┆ 3           ┆ 106.0        │
│ A      ┆ 107   ┆ 3           ┆ 105.0        │
│ B      ┆ 200   ┆ 3           ┆ null         │
│ B      ┆ 190   ┆ 3           ┆ null         │
│ B      ┆ 180   ┆ 3           ┆ 190.0        │
│ B      ┆ 185   ┆ 3           ┆ 185.0        │
└────────┴───────┴─────────────┴──────────────┘
mean_period is always numeric. The unpivot approach introduces a String column, which then needs to be cast back to a numeric type.