I have this data in pandas:

data.tail(15)
                       open    high     low   close        vwap
date                                                           
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114
2018-11-20 18:51:00  176.31  176.43  176.20  176.20  176.562641
2018-11-20 18:52:00  176.22  176.25  176.15  176.18  176.544664
2018-11-20 18:53:00  176.19  176.19  175.97  176.00  176.506937
2018-11-20 18:54:00  176.00  176.30  175.97  176.30  176.493768
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

I need to get sub-dataframes of 5 consecutive rows each, like this:

1: 
2018-11-20 18:45:00  176.73  176.95  176.54  176.89  176.582983
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624

2: 
2018-11-20 18:46:00  176.89  177.02  176.81  176.81  176.603020
2018-11-20 18:47:00  176.80  176.80  176.43  176.43  176.612706
2018-11-20 18:48:00  176.45  176.46  176.21  176.21  176.599967
2018-11-20 18:49:00  176.22  176.32  176.14  176.26  176.586624
2018-11-20 18:50:00  176.26  176.38  176.23  176.28  176.577114

Each window is shifted by 1 minute relative to the previous one.

n: 
2018-11-20 18:55:00  176.29  176.92  176.11  176.91  176.518353
2018-11-20 18:56:00  176.92  177.03  176.67  176.76  176.554964
2018-11-20 18:57:00  176.78  176.89  176.74  176.76  176.566201
2018-11-20 18:58:00  176.77  176.87  176.56  176.65  176.571326
2018-11-20 18:59:00  176.65  177.17  176.59  176.94  176.681413

How can I do this? I tried rolling and groupby without success.

pandas 0.23.4
Python 3.6.3

Thanks

  • Are the time differences guaranteed to be one minute? What would you like to do with the subsets of the dataframe? Commented Nov 22, 2018 at 14:43

1 Answer

The following results in the requested output (pandas 0.22.0, python 3.6.7):

import pandas as pd
from datetime import timedelta

# Width of the time window: 5min
dt = timedelta(minutes=5)
# Step of the sliding window: 1min
step = timedelta(minutes=1)

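start = df.index[0]
stop = df.index[-1]
# Keep sliding until the window would extend past the last timestamp
while start <= (stop-dt+step):
    # Boolean mask for the rows in the half-open interval [start, start+dt)
    idx = (start <= df.index) & (df.index < start+dt)
    start += step
    print(df[idx])
    print()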

One can specify two parameters: the width dt of the time window and the step by which to move forward the "sliding window".

An advantage of this approach is that each window is described by an index mask, so the overlapping rows are only materialized when a window is actually selected (though I bet that python/pandas do a good job of avoiding needless copies anyway, in case someone finds an alternative way to accomplish the job).
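
If the windows are needed for further processing rather than printing, the same loop can be wrapped in a generator. This is only a sketch: the name sliding_windows and the small demo frame are made up for illustration, and it assumes the dataframe has a sorted DatetimeIndex.

```python
import pandas as pd


def sliding_windows(df, width, step):
    """Yield the rows of `df` in [start, start + width), advancing by `step`.

    Hypothetical helper; assumes `df` has a sorted DatetimeIndex.
    """
    start = df.index[0]
    last_start = df.index[-1] - width + step
    while start <= last_start:
        # Boolean mask for the half-open interval [start, start + width)
        mask = (df.index >= start) & (df.index < start + width)
        yield df[mask]
        start += step


# Made-up frame with one row per minute, for illustration only
index = pd.date_range("2018-11-20 18:45:00", periods=8, freq="min")
demo = pd.DataFrame({"close": range(8)}, index=index)

windows = list(
    sliding_windows(demo, pd.Timedelta(minutes=5), pd.Timedelta(minutes=1))
)
```

Each element of windows is an ordinary dataframe, so it can be passed to whatever per-window computation is needed.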

I tested with the following dataframe:

df = pd.DataFrame([["2018-11-20 18:45:00",  176.73,  176.95,  176.54,  176.89,  176.582983],
                   ["2018-11-20 18:46:00",  176.89,  177.02,  176.81,  176.81,  176.603020],
                   ["2018-11-20 18:47:00",  176.80,  176.80,  176.43,  176.43,  176.612706],
                   ["2018-11-20 18:48:00",  176.45,  176.46,  176.21,  176.21,  176.599967],
                   ["2018-11-20 18:49:00",  176.22,  176.32,  176.14,  176.26,  176.586624],
                   ["2018-11-20 18:50:00",  176.26,  176.38,  176.23,  176.28,  176.577114],
                   ["2018-11-20 18:51:00",  176.31,  176.43,  176.20,  176.20,  176.562641],
                   ["2018-11-20 18:52:00",  176.22,  176.25,  176.15,  176.18,  176.544664],
                   ["2018-11-20 18:53:00",  176.19,  176.19,  175.97,  176.00,  176.506937],
                   ["2018-11-20 18:54:00",  176.00,  176.30,  175.97,  176.30,  176.493768],
                   ["2018-11-20 18:55:00",  176.29,  176.92,  176.11,  176.91,  176.518353],
                   ["2018-11-20 18:56:00",  176.92,  177.03,  176.67,  176.76,  176.554964],
                   ["2018-11-20 18:57:00",  176.78,  176.89,  176.74,  176.76,  176.566201],
                   ["2018-11-20 18:58:00",  176.77,  176.87,  176.56,  176.65,  176.571326],
                   ["2018-11-20 18:59:00",  176.65,  177.17,  176.59,  176.94,  176.681413],],
                  columns=["date", "open", "high", "low", "close", "vwap"])
df = df.set_index("date")
df.index = pd.to_datetime(df.index)
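
As a possible alternative, if the rows are guaranteed to be exactly one minute apart with no gaps (an assumption the question does not confirm), the windows can also be taken by position instead of by timestamp. A minimal sketch with a made-up frame of the same shape as the test data:

```python
import pandas as pd

# Made-up frame: 15 rows, one per minute, no gaps (this is the assumption)
index = pd.date_range("2018-11-20 18:45:00", periods=15, freq="min")
demo = pd.DataFrame({"close": range(15)}, index=index)

n = 5  # number of rows per window

# One window per starting position; consecutive windows overlap by n - 1 rows
windows = [demo.iloc[i:i + n] for i in range(len(demo) - n + 1)]
```

With missing minutes this would silently produce windows spanning more than 5 minutes, which is where the timestamp-based loop above is the safer choice.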