7

Consider a pandas DataFrame which looks like the one below

      A     B     C
0  0.63  1.12  1.73
1  2.20 -2.16 -0.13
2  0.97 -0.68  1.09
3 -0.78 -1.22  0.96
4 -0.06 -0.02  2.18

I would like to use the function .rolling() to perform the following calculation for t = 0,1,2:

  • Select the rows from t to t+2
  • Take the 9 values contained in those 3 rows, from all the columns. Call this set S
  • Compute the 75th percentile of S (or other summary statistics about S)


For instance, for t = 1 we have S = { 2.2 , -2.16, -0.13, 0.97, -0.68, 1.09, -0.78, -1.22, 0.96 } and the 75th percentile is 0.97.

I couldn't find a way to make it work with .rolling(), since it apparently takes each column separately. I'm now relying on a for loop, but it is really slow.

Do you have any suggestion for a more efficient approach?

5
  • what about this Commented Jul 31, 2018 at 14:34
  • 5
    You have enough rep know that not post the data as picture Commented Jul 31, 2018 at 14:35
  • @RushabhMehta: that's exactly the function I am trying to use, but I don't understand how pull the data from the 3 columns A,B and C together to compute the percentile. Commented Jul 31, 2018 at 14:39
  • 1
    Just paste it as text and format it as code. Read this: stackoverflow.com/questions/20109391/… Commented Jul 31, 2018 at 14:45
  • I don't like the answers involving reshaping things. I used scipy.ndimage.generic_filter to do this properly, though it needed some trial and error to get the same effect as Pandas' rolling's center=False effect. Commented Apr 19, 2020 at 19:25

2 Answers 2

4

One solution is to stack the data and then multiply your window size by the number of columns and slice the result by the number of columns. Also, since you want a forward looking window, reverse the order of the stacked DataFrame

wsize = 3
cols = len(df.columns)

df.stack(dropna=False)[::-1].rolling(window=wsize*cols).quantile(0.75)[cols-1::cols].reset_index(-1, drop=True).sort_index()

Output:

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
dtype: float64

In the case of many columns and a small window:

import pandas as pd
import numpy as np

wsize = 3
df2 = pd.concat([df.shift(-x) for x in range(wsize)], 1)
s_quant = df2.quantile(0.75, 1)

# Only necessary if you need to enforce sufficient data. 
s_quant[df2.isnull().any(1)] = np.NaN

Output: s_quant

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
Name: 0.75, dtype: float64
Sign up to request clarification or add additional context in comments.

6 Comments

This is not equivalent, as it would compute the statistics many more times. This is because the window would move every 1 entry and not every 3. In my real dataset I have many more columns than 3, so this actually makes a huge difference computationally speaking.
@Abramodj I find it takes 8x longer for 1000x the number of columns. So as long as you have a reasonable number of columns, the excess probably wont kill you. It wasn't clear that you needed an extremely performant solution over just a solution to your problem. Let me see what I can do.
Actually, I have more than 10k columns
About one thousand, so the for loop is actually feasible in this case. I just wanted to understand if this can be done, as it could be very useful in other circumstances.
@ALollz I think you should specify dropna=True in stack, otherwise the window size would be consistent
|
0

You can use numpy ravel. Still you may have to use for loops.

for i in range(0,3):
    print(df.iloc[i:i+3].values.ravel())

If your t steps in 3s, you can use numpy reshape function to create a n*9 dataframe.

2 Comments

This is the solution I am using at the moment, which requires a for loop. It is not as fast as rolling because it is not vectorized
@Abramodj Have you tried this: x = pd.concat([df, df.shift(-1),df.shift(-2)], axis=1) then x['m'] = x.mean(axis=1). For quartile you can use x['q3'] = x.quantile(0.75, axis=1).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.