
What I have is a large one-dimensional NumPy array of np.int16 data and a boolean array that records whether each sample of the data (each sample is samplesize elements long) meets some criterion (is valid) or does not (is not valid). That is, I have something like this:

import numpy as np

samplesize = 5
data = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 2, 4, 5, 2, 1, 1], dtype=np.int16)
membership = np.array([False, True, False], dtype=bool)

Here membership[0] identifies whether data[ 0*samplesize : 1*samplesize ] is valid.

What I want is to split the data array into chunks according to runs of True values in the membership array. For example, if membership contains three or more successive True values, that run is treated as a meaningful sample of data.

Example

True, True, True, True - valid sequence
True, True, False, True, True - invalid sequence

Assuming we have identified the start of the i-th valid sequence as start[i] and its end as end[i], I want to split the data array into pieces that each run from start[i] * samplesize to end[i] * samplesize.

How can I accomplish this?

  • What have you already tried with np.split and where's your problem? Commented Nov 30, 2014 at 10:16
  • I could not use np.split because it could only split by the list of known indexes. I need to find edges for splitting by analyzing the membership array, and that is the question - how to find these start and end indexes of successive True statements. Commented Nov 30, 2014 at 10:28
  • I also could not use condition-based splitting. I thought about itertools.groupby, but I am curious whether there are more efficient solutions. Commented Nov 30, 2014 at 10:34
  • What about applying np.diff to membership? Commented Nov 30, 2014 at 11:29
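For comparison, the itertools.groupby idea mentioned in the comments could look roughly like the sketch below. The names true_runs and min_len are hypothetical, chosen only for this illustration, and the input is a plain list of booleans rather than the real membership array.

```python
from itertools import groupby

def true_runs(flags, min_len=3):
    """Return (start, end) pairs for runs of True at least min_len long.

    end is exclusive, so flags[start:end] is the run itself.
    """
    runs = []
    pos = 0  # index of the first element of the current group
    for value, group in groupby(flags):
        length = sum(1 for _ in group)
        if value and length >= min_len:
            runs.append((pos, pos + length))
        pos += length
    return runs

# One run of three (kept) and one run of two (dropped)
runs = true_runs([False, True, True, True, False, True, True, False])
print(runs)  # [(1, 4)]
```

This walks the flags in a Python-level loop, so on a large array it is likely to be slower than a vectorized approach, but it can be easier to read and check.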

1 Answer


I'm not sure I understand your question. Do you want the start and end indices of the runs of three or more successive True values in membership?

Here is code that does that. The basic idea is to apply diff to membership and take the indices of the rising and falling edges:

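import numpy as np

# Random 0/1 flags standing in for the real membership array
membership = np.random.randint(0, 2, 100)
# Pad with zeros so runs touching either end still produce an edge
d = np.diff(np.r_[0, membership, 0])
start = np.where(d == 1)[0]   # rising edges: first index of each run of ones
end = np.where(d == -1)[0]    # falling edges: one past the last index of each run
mask = (end - start) >= 3     # keep only runs of three or more
start = start[mask]
end = end[mask]

for s, e in zip(start, end):
    print(s, e, membership[s:e])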
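
To get from these indices to the pieces of data described in the question, one possible sketch follows. It assumes membership is a boolean array with one flag per sample and that data holds len(membership) * samplesize values. The example arrays and the name chunks are made up for illustration.

```python
import numpy as np

samplesize = 5
# Illustrative data: 8 samples of 5 values each (values 0..39)
data = np.arange(8 * samplesize, dtype=np.int16)
membership = np.array([False, True, True, True, False, True, True, False])

# Cast to int so the padded array diffs to +1 / -1 at the run edges
d = np.diff(np.r_[0, membership.astype(int), 0])
start = np.where(d == 1)[0]
end = np.where(d == -1)[0]

# Keep only runs of three or more valid samples
mask = (end - start) >= 3
start, end = start[mask], end[mask]

# Scale sample indices by samplesize to slice the underlying data
chunks = [data[s * samplesize:e * samplesize] for s, e in zip(start, end)]

for chunk in chunks:
    print(chunk)  # prints the values 5 through 19 as one chunk
```

Because end is one past the last valid sample, end * samplesize is already the exclusive slice bound, so no extra offset is needed.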

1 Comment

Thank you. I didn't know that mask = (end - start) >= 3 was possible. Thanks a lot. I was looking for exactly this kind of vectorized approach.
