
Please excuse me if this (or something similar) has already been asked.

I've got a NumPy structured array with more than 1E7 entries. One of the columns of the array is the timestamp of a specific event. I'd like to filter the array based on these timestamps: keep the N'th row if the timestamp of row N+1 is larger than the timestamp of row N by more than T. Is there an efficient way to do this in NumPy? I've been going about it in the following way, but it's too slow to be useful (y is the structured array filled with all of our data, x is the filtered array):

   T=250
   x=np.ndarray(len(y),dtype=y.dtype)
   for i in range(len(y['timestamp'])-1):
       if y['timestamp'][i+1]-y['timestamp'][i]>T:
           x[i]=y[i]

1 Answer

This is a good use case for boolean-mask indexing, a form of advanced indexing in NumPy:

   this_row = y['timestamp'][:-1]
   next_row = y['timestamp'][1:]
   selection = next_row - this_row > T
   result = y[:-1][selection]

The y[:-1] in the last line is needed because selection has length len(y) - 1, and your loop never keeps the last row anyway. Since y[:-1] is a basic slice, it is a view and does not copy the data. Alternatively, you could append a trailing False to selection and index y directly, but that may be slower because it copies the values of selection. If performance really matters, benchmark both options.
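To make the two options concrete, here is a minimal sketch on a tiny array. The sample data and the 'value' field are made up for illustration; only the 'timestamp' field and T come from the question. It also uses np.diff, which computes the same next-minus-current differences as the two slices above.

```python
import numpy as np

# Hypothetical sample data: only the 'timestamp' field comes from the question,
# the 'value' field and all numbers are invented for this sketch.
y = np.array(
    [(0, 1.0), (100, 2.0), (400, 3.0), (450, 4.0), (900, 5.0), (1000, 6.0)],
    dtype=[("timestamp", "i8"), ("value", "f8")],
)
T = 250

# np.diff(a)[i] == a[i + 1] - a[i], i.e. the same as next_row - this_row.
selection = np.diff(y["timestamp"]) > T

# Option 1: drop the last row so that y and the mask have the same length.
result = y[:-1][selection]

# Option 2: pad the mask with a trailing False and index y directly.
# np.append allocates a new mask array of length len(y).
padded = np.append(selection, False)
result_padded = y[padded]

print(result["timestamp"])         # timestamps of the kept rows
print(result_padded["timestamp"])  # same rows via the padded mask
```

Unlike the loop in the question, which leaves the unselected rows of x uninitialized, both options return only the selected rows.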


1 Comment

Thanks! I just went ahead and dropped the last element like you have in the above code block. Much faster and more straightforward.
