I have a pandas DataFrame indexed by datetimes that looks something like this:

TimeSys_Index
2014-08-29 00:00:18    0
2014-08-29 00:00:19    0
2014-08-29 00:00:20    1
2014-08-29 00:00:21    1
2014-08-29 00:00:22    0
2014-08-29 00:00:23    0
2014-08-29 00:00:24    0
2014-08-29 00:00:25    0
2014-08-29 00:00:26    0
2014-08-29 00:00:27    1
2014-08-29 00:00:28    1
2014-08-29 00:00:29    1
2014-08-29 00:00:30    1
2014-08-29 00:00:31    0
2014-08-29 00:00:32    0
2014-08-29 00:00:33    0
...

I want to find the index (time) for every occurrence of the pattern [0, 0, 1, 1]. Using the above sequence I'd like it to return ['2014-08-29 00:00:18', '2014-08-29 00:00:25']. The kicker is this needs to be vectorized or at least very quick.

I was thinking of running a correlation of the full vector with the pattern vector and finding the indices where the resulting vector equals 4, but there's got to be a simpler way.
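
A rough sketch of that correlation idea, for reference. It assumes the 0/1 values are first mapped to -1/+1, since with raw 0/1 values the score could never reach 4 and any window ending in 1, 1 would score the same. The variable names here are only illustrative.

```python
import numpy as np

# Values copied from the sample above (00:00:18 through 00:00:33).
values = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
pattern = np.array([0, 0, 1, 1])

# Map 0 -> -1 and 1 -> +1 so that mismatches subtract from the score.
signed_values = 2 * values - 1
signed_pattern = 2 * pattern - 1

# np.correlate slides the pattern along the values without flipping it;
# mode="valid" keeps only windows that fit entirely inside the data.
score = np.correlate(signed_values, signed_pattern, mode="valid")

# A window matches exactly when every one of the len(pattern) products is +1.
hits = np.flatnonzero(score == len(pattern))
print(hits)  # positions 0 and 7, i.e. 00:00:18 and 00:00:25
```

This works, but it needs the re-encoding step and an integer-position-to-timestamp lookup afterwards, which is why I suspect there is something simpler.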

1 Answer

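You can compare the column against shifted copies of itself. shift(-k) moves the value k rows ahead onto the current row, so each row can test itself and the next three values at once: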

>>> df.head()
                     val
TimeSys_Index           
2014-08-29 00:00:18    0
2014-08-29 00:00:19    0
2014-08-29 00:00:20    1
2014-08-29 00:00:21    1
2014-08-29 00:00:22    0
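>>> # rows where the current value and the next value are both 0 ...
>>> i = (df['val'] == 0) & (df['val'].shift(-1) == 0)
>>> # ... and the two values after that are both 1
>>> i &= (df['val'].shift(-2) == 1) & (df['val'].shift(-3) == 1)
>>> # index labels at which the pattern [0, 0, 1, 1] starts
>>> df.index[i]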
<class 'pandas.tseries.index.DatetimeIndex'>
[2014-08-29 00:00:18, 2014-08-29 00:00:25]
Length: 2, Freq: None, Timezone: None
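
If the pattern is likely to change, the same shift comparison can be wrapped in a small helper that loops over the pattern. This is only a sketch: find_pattern_starts is a hypothetical name, not a pandas function, and the sample frame is rebuilt here from the values shown in the question.

```python
import pandas as pd


def find_pattern_starts(series, pattern):
    """Return the index labels at which `pattern` begins in `series`."""
    mask = pd.Series(True, index=series.index)
    for offset, value in enumerate(pattern):
        # shift(-offset) brings the value `offset` rows ahead onto this row.
        # Rows near the end get NaN, which never equals `value`.
        mask &= series.shift(-offset) == value
    return series.index[mask.to_numpy()]


# Sample data rebuilt from the question (16 one-second steps).
index = pd.date_range(
    "2014-08-29 00:00:18", periods=16, freq=pd.Timedelta(seconds=1)
)
df = pd.DataFrame(
    {"val": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]}, index=index
)
df.index.name = "TimeSys_Index"

starts = find_pattern_starts(df["val"], [0, 0, 1, 1])
print(starts)
```

The loop runs once per pattern element, not once per row, so each step is still a vectorized comparison over the whole column.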

1 Comment

I'm trying to use this solution for a non-datetime indexed dataset. When I run this as you've posted I get " 'RangeIndex' object is not callable." - Any suggestions? Thank you!
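
The commenter's code isn't shown, so this is a guess: that message is what Python raises when the index is called with parentheses, as in df.index(i), instead of being indexed with square brackets, as in df.index[i]. The shift comparison itself does not depend on the index type. A minimal sketch on a frame with the default RangeIndex, using the question's values:

```python
import pandas as pd

# No index supplied, so pandas assigns a default RangeIndex (0, 1, 2, ...).
df = pd.DataFrame({"val": [0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]})

i = (df["val"] == 0) & (df["val"].shift(-1) == 0)
i &= (df["val"].shift(-2) == 1) & (df["val"].shift(-3) == 1)

# Square brackets select the labels where the mask is True.
positions = df.index[i]
print(list(positions))  # [0, 7]

# Parentheses would try to call the index object and fail:
# df.index(i)  ->  TypeError: 'RangeIndex' object is not callable
```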
