
Python Pandas rolling aggregate a column of lists

I have a DataFrame with a column of lists:

import pandas as pd
import numpy as np
# Get some time series data
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/timeseries.csv")
input_cols = ['A', 'B']
df['single_input_vector'] = df[input_cols].apply(tuple, axis=1).apply(list)
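If the CSV URL is not reachable, the minimal offline sketch below may help. It builds a small DataFrame from the four A/B pairs shown in the desired output below (assumed to be the first four rows of the file). It also checks that `df[input_cols].values.tolist()` yields the same per-row lists as the `apply(tuple, axis=1).apply(list)` chain.

```python
import pandas as pd

# Toy stand-in for timeseries.csv: the A/B pairs from the desired output
# (assumed to be the first four rows of the real file)
df = pd.DataFrame({
    'A': [24.68, 24.18, 23.99, 24.14],
    'B': [164.93, 164.89, 164.63, 163.92],
})
input_cols = ['A', 'B']

# The question's construction: one [A, B] list per row
via_apply = df[input_cols].apply(tuple, axis=1).apply(list)

# Equivalent construction: convert the two-column block to a list of row lists
via_tolist = pd.Series(df[input_cols].values.tolist(), index=df.index)

print(via_apply.tolist() == via_tolist.tolist())  # True
```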

Is there a way to create a rolling aggregate of the 'single_input_vector' column over a given window? I looked at the following SO link, but it does not provide a way to include a window. In my case, the desired output column for a window of 3 would be:

Row1: [[24.68, 164.93]] 
Row2: [[24.68, 164.93], [24.18, 164.89]]
Row3: [[24.68, 164.93], [24.18, 164.89], [23.99, 164.63]] 
Row4: [[24.18, 164.89], [23.99, 164.63], [24.14, 163.92]]

and so on.
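The desired output is a trailing window that is truncated at the start of the data: row i collects rows max(0, i - window + 1) through i. A plain-Python sketch of that rule follows. The helper name `trailing_windows` is hypothetical, and the sample rows are the four pairs listed above.

```python
def trailing_windows(items, window):
    """For each position i, return items[max(0, i - window + 1)] through items[i] inclusive."""
    return [items[max(0, i - window + 1):i + 1] for i in range(len(items))]


rows = [[24.68, 164.93], [24.18, 164.89], [23.99, 164.63], [24.14, 163.92]]
for n, w in enumerate(trailing_windows(rows, 3), start=1):
    print(f"Row{n}: {w}")
# Row1: [[24.68, 164.93]]
# Row2: [[24.68, 164.93], [24.18, 164.89]]
# Row3: [[24.68, 164.93], [24.18, 164.89], [23.99, 164.63]]
# Row4: [[24.18, 164.89], [23.99, 164.63], [24.14, 163.92]]
```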

1 Answer

I can't think of a more efficient way to do this. It works, but it may be slow on very large data sets, because the lambda runs once per row in Python.

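We are basically building a start:stop pair of slicing indices for each row: the window ends at the current row and starts at most window - 1 rows earlier (a rolling count gives the same window lengths).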
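The slice bounds can be computed for all rows at once. This is a small sketch on six hypothetical rows. It also checks that the window lengths (stop - start) match a rolling count. `min_periods=1` is set explicitly so that the partial windows at the start are counted too.

```python
import numpy as np
import pandas as pd

n_rows, window = 6, 3
positions = np.arange(n_rows)

# Slice bounds for row i: start = max(0, i - (window - 1)), stop = i + 1 (exclusive)
starts = np.maximum(0, positions - (window - 1))
stops = positions + 1
print(list(zip(starts.tolist(), stops.tolist())))
# [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6)]

# Window lengths equal a rolling count over any column with no missing values
lengths = pd.Series(positions).rolling(window=window, min_periods=1).count()
print((stops - starts).tolist() == lengths.astype(int).tolist())  # True
```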

import pandas as pd
import numpy as np
# Get some time series data
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/timeseries.csv")
input_cols = ['A', 'B']
df['single_input_vector'] = df[input_cols].apply(tuple, axis=1).apply(list)


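window = 3

# Rolling count of values in each window of 'A' (the slice below does not depend on it)
df['len'] = df['A'].rolling(window=window).count()

# For row i, take rows max(0, i - (window - 1)) through i (inclusive) of 'single_input_vector'.
# x.name is the row label, which equals the row position under the default RangeIndex.
df['vector_list'] = df.apply(
    lambda x: df['single_input_vector'].iloc[max(0, int(x.name) - (window - 1)):int(x.name) + 1].values,
    axis=1,
)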
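Since performance may be a concern, one possible alternative (a sketch, not benchmarked here) extracts the vectors once and slices a plain Python list. This avoids a row-wise `df.apply` that builds a Series slice for every row, and it yields plain lists of lists rather than NumPy object arrays. The toy DataFrame is an assumed stand-in for the CSV, using the four A/B pairs from the question.

```python
import pandas as pd

# Toy stand-in for timeseries.csv: the four A/B pairs from the question's desired output
df = pd.DataFrame({
    'A': [24.68, 24.18, 23.99, 24.14],
    'B': [164.93, 164.89, 164.63, 163.92],
})
window = 3

# Pull the row vectors out once as a plain Python list of [A, B] lists
vectors = df[['A', 'B']].values.tolist()

# Slice by position (0..n-1), so the result does not depend on the index labels
df['vector_list'] = pd.Series(
    [vectors[max(0, i - (window - 1)):i + 1] for i in range(len(vectors))],
    index=df.index,
    dtype=object,
)

for n, value in enumerate(df['vector_list'], start=1):
    print(f"Row{n}: {value}")
```

Each cell of `vector_list` is then an ordinary list of `[A, B]` lists, matching the format in the question.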