Apply multiple window sizes to rolling aggregation functions in polars dataframe

Question

Several rolling aggregation functions, such as rolling_mean, rolling_max, and rolling_min, take a window_size argument of type int.

I am wondering how to compute the results efficiently for a list of window sizes.

Consider the following dataframe:

import polars as pl

pl.Config(tbl_rows=-1)

df = pl.DataFrame(
    {
        "symbol": ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
        "price": [100, 110, 105, 103, 107, 200, 190, 180, 185],
    }
)

shape: (9, 2)
┌────────┬───────┐
│ symbol ┆ price │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 100   │
│ A      ┆ 110   │
│ A      ┆ 105   │
│ A      ┆ 103   │
│ A      ┆ 107   │
│ B      ┆ 200   │
│ B      ┆ 190   │
│ B      ┆ 180   │
│ B      ┆ 185   │
└────────┴───────┘

Let's say I have a list with n elements, such as periods = [2, 3]. I am looking for a solution that computes the rolling means for all periods, grouped by symbol, in parallel. Speed and memory efficiency are of the essence.

The result should be a tidy/long dataframe like this:

shape: (18, 4)
┌────────┬───────┬─────────────┬──────────────┐
│ symbol ┆ price ┆ mean_period ┆ rolling_mean │
│ ---    ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64   ┆ u8          ┆ f64          │
╞════════╪═══════╪═════════════╪══════════════╡
│ A      ┆ 100   ┆ 2           ┆ null         │
│ A      ┆ 110   ┆ 2           ┆ 105.0        │
│ A      ┆ 105   ┆ 2           ┆ 107.5        │
│ A      ┆ 103   ┆ 2           ┆ 104.0        │
│ A      ┆ 107   ┆ 2           ┆ 105.0        │
│ B      ┆ 200   ┆ 2           ┆ null         │
│ B      ┆ 190   ┆ 2           ┆ 195.0        │
│ B      ┆ 180   ┆ 2           ┆ 185.0        │
│ B      ┆ 185   ┆ 2           ┆ 182.5        │
│ A      ┆ 100   ┆ 3           ┆ null         │
│ A      ┆ 110   ┆ 3           ┆ null         │
│ A      ┆ 105   ┆ 3           ┆ 105.0        │
│ A      ┆ 103   ┆ 3           ┆ 106.0        │
│ A      ┆ 107   ┆ 3           ┆ 105.0        │
│ B      ┆ 200   ┆ 3           ┆ null         │
│ B      ┆ 190   ┆ 3           ┆ null         │
│ B      ┆ 180   ┆ 3           ┆ 190.0        │
│ B      ┆ 185   ┆ 3           ┆ 185.0        │
└────────┴───────┴─────────────┴──────────────┘

I think @RomanPekar's answer may be more memory-efficient, as mean_period is always numeric. The unpivot approach introduces a String column which needs casting.
– jqurious, Commented Sep 9, 2024 at 9:21

roman · Accepted Answer · 2024-09-06 12:05:58Z

You can use a comprehension to build one DataFrame per value in the periods list, and then concat() them into a single long DataFrame:

periods = [2, 3]

pl.concat(
    # One copy of df per window size, tagged with that window size
    df.with_columns(
        mean_period=pl.lit(p),
        rolling_mean=pl.col.price.rolling_mean(p).over("symbol"),
    )
    for p in periods
)

┌────────┬───────┬─────────────┬──────────────┐
│ symbol ┆ price ┆ mean_period ┆ rolling_mean │
│ ---    ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64   ┆ i32         ┆ f64          │
╞════════╪═══════╪═════════════╪══════════════╡
│ A      ┆ 100   ┆ 2           ┆ null         │
│ A      ┆ 110   ┆ 2           ┆ 105.0        │
│ A      ┆ 105   ┆ 2           ┆ 107.5        │
│ A      ┆ 103   ┆ 2           ┆ 104.0        │
│ A      ┆ 107   ┆ 2           ┆ 105.0        │
│ B      ┆ 200   ┆ 2           ┆ null         │
│ B      ┆ 190   ┆ 2           ┆ 195.0        │
│ B      ┆ 180   ┆ 2           ┆ 185.0        │
│ B      ┆ 185   ┆ 2           ┆ 182.5        │
│ A      ┆ 100   ┆ 3           ┆ null         │
│ A      ┆ 110   ┆ 3           ┆ null         │
│ A      ┆ 105   ┆ 3           ┆ 105.0        │
│ A      ┆ 103   ┆ 3           ┆ 106.0        │
│ A      ┆ 107   ┆ 3           ┆ 105.0        │
│ B      ┆ 200   ┆ 3           ┆ null         │
│ B      ┆ 190   ┆ 3           ┆ null         │
│ B      ┆ 180   ┆ 3           ┆ 190.0        │
│ B      ┆ 185   ┆ 3           ┆ 185.0        │
└────────┴───────┴─────────────┴──────────────┘

jqurious · Accepted Answer · 2024-09-06 12:07:10Z

You can generate multiple expressions with a comprehension:

df.with_columns(
    pl.col("price").rolling_mean(p).over("symbol").alias(f"{p}") 
    for p in periods
)

shape: (9, 4)
┌────────┬───────┬───────┬───────┐
│ symbol ┆ price ┆ 2     ┆ 3     │
│ ---    ┆ ---   ┆ ---   ┆ ---   │
│ str    ┆ i64   ┆ f64   ┆ f64   │
╞════════╪═══════╪═══════╪═══════╡
│ A      ┆ 100   ┆ null  ┆ null  │
│ A      ┆ 110   ┆ 105.0 ┆ null  │
│ A      ┆ 105   ┆ 107.5 ┆ 105.0 │
│ A      ┆ 103   ┆ 104.0 ┆ 106.0 │
│ A      ┆ 107   ┆ 105.0 ┆ 105.0 │
│ B      ┆ 200   ┆ null  ┆ null  │
│ B      ┆ 190   ┆ 195.0 ┆ null  │
│ B      ┆ 180   ┆ 185.0 ┆ 190.0 │
│ B      ┆ 185   ┆ 182.5 ┆ 185.0 │
└────────┴───────┴───────┴───────┘

You can then reshape with .unpivot() and name/cast the columns.

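(
    # One wide column per window size, named after the period
    df.with_columns(
        pl.col("price").rolling_mean(p).over("symbol").alias(f"{p}")
        for p in periods
    )
    # Wide to long: the column names become the values of mean_period
    .unpivot(
        index=["symbol", "price"],
        variable_name="mean_period",
        value_name="rolling_mean",
    )
    # unpivot yields a String column, so cast it back to an integer type
    .with_columns(pl.col("mean_period").cast(pl.UInt8))
)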

shape: (18, 4)
┌────────┬───────┬─────────────┬──────────────┐
│ symbol ┆ price ┆ mean_period ┆ rolling_mean │
│ ---    ┆ ---   ┆ ---         ┆ ---          │
│ str    ┆ i64   ┆ u8          ┆ f64          │
╞════════╪═══════╪═════════════╪══════════════╡
│ A      ┆ 100   ┆ 2           ┆ null         │
│ A      ┆ 110   ┆ 2           ┆ 105.0        │
│ A      ┆ 105   ┆ 2           ┆ 107.5        │
│ A      ┆ 103   ┆ 2           ┆ 104.0        │
│ A      ┆ 107   ┆ 2           ┆ 105.0        │
│ …      ┆ …     ┆ …           ┆ …            │
│ A      ┆ 107   ┆ 3           ┆ 105.0        │
│ B      ┆ 200   ┆ 3           ┆ null         │
│ B      ┆ 190   ┆ 3           ┆ null         │
│ B      ┆ 180   ┆ 3           ┆ 190.0        │
│ B      ┆ 185   ┆ 3           ┆ 185.0        │
└────────┴───────┴─────────────┴──────────────┘
