Numpy: split array into parts according to sequence of values

Question

I have a large one-dimensional NumPy array of np.int16 data and a boolean array that records whether each sample of the data (a sample is samplesize elements long) meets some criterion (is valid) or not (is not valid). In other words, I have something like this:

import numpy as np

samplesize = 5
data = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=np.int16)
membership = np.array([False, True, False], dtype=bool)

Here membership[0] identifies whether data[ 0*samplesize : 1*samplesize ] is valid.

I want to split the data array into chunks according to runs of True values in the membership array. For example, if membership contains three or more successive True values, the corresponding stretch of data is treated as a meaningful sample.

Example

True, True, True, True - valid sequence
True, True, False, True, True - invalid sequence

Assuming the i-th valid sequence starts at start[i] and ends at end[i], I want to split the data array into pieces that run from start[i] * samplesize to end[i] * samplesize.

How can I accomplish this?

What have you already tried with np.split, and where is the problem?
– sebix, Commented Nov 30, 2014 at 10:16
I could not use np.split because it can only split at a list of known indices. I need to find the split points by analyzing the membership array, and that is the question: how to find the start and end indices of runs of successive True values.
– xolodec, Commented Nov 30, 2014 at 10:28
I also could not use conditional splitting. I thought about itertools.groupby, but I am curious whether there is a more efficient solution.
– xolodec, Commented Nov 30, 2014 at 10:34

HYRY · Accepted Answer · 2014-11-30 11:22:31Z


I'm not sure I understand the question. Do you want the start and end indices of the runs of three or more successive True values in membership?

Here is code that does that. The basic idea is to take np.diff of membership, padded with a zero on each side, and read off the indices of the rising and falling edges:

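import numpy as np

# Random 0/1 integers stand in for the boolean membership array
membership = np.random.randint(0, 2, 100)
# Pad with a zero on each side so runs touching either end still produce an edge
d = np.diff(np.r_[0, membership, 0])
start = np.where(d == 1)[0]   # rising edges: first index of each run
end = np.where(d == -1)[0]    # falling edges: one past the last index of each run
# Keep only runs of three or more consecutive ones
mask = (end - start) >= 3
start = start[mask]
end = end[mask]

for s, e in zip(start, end):
    print(s, e, membership[s:e])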
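
To get from these indices back to the chunks of data the question asks for, each start and end can be multiplied by samplesize and used as a slice bound. The following is only a sketch under a few assumptions: the helper name split_valid_runs, its min_run parameter, and the example arrays are made up for illustration and are not part of the original answer.

```python
import numpy as np

def split_valid_runs(data, membership, samplesize, min_run=3):
    """Return the slices of data covered by runs of >= min_run True samples.

    Hypothetical helper: the name and the min_run parameter are illustrative.
    """
    # Cast to int so the edges come out as +1 / -1
    padded = np.r_[0, np.asarray(membership).astype(int), 0]
    d = np.diff(padded)
    start = np.where(d == 1)[0]   # first sample index of each run
    end = np.where(d == -1)[0]    # one past the last sample index of each run
    mask = (end - start) >= min_run
    # Scale sample indices to element indices in data
    return [data[s * samplesize:e * samplesize]
            for s, e in zip(start[mask], end[mask])]

# Made-up example: 8 samples of 5 values each
samplesize = 5
data = np.arange(40, dtype=np.int16)
membership = np.array([True, True, True, False, True, True, True, True])

chunks = split_valid_runs(data, membership, samplesize)
for chunk in chunks:
    print(chunk)
```

Each returned chunk is a view into data rather than a copy, so writing to a chunk modifies the original array; call .copy() on a chunk if independent arrays are needed.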


1 Comment

xolodec Over a year ago

Thank you. I didn't know that mask = (end - start) >= 3 is possible. Thanks a lot. I was looking exactly for such a vectorizing calculation approach.
