
Is there a vectorized way to change all concurrent 1s that are within offset of the first 1 into 0s (transform A into B)? I'm currently trying to do this on a numpy array with over 1 million items where speed is critical.

The 1s represent a signal trigger and the 0s represent no trigger. For example: Given an offset of 5, whenever there is a 1, the following 5 items must be 0 (to remove signal concurrency).

Example 1:

offset = 3
A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
B = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0])

Example 2:

offset = 2
A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
B = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0])
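For concreteness, the rule can be stated as a plain sequential reference loop (a sketch only, not vectorized; the name `suppress_concurrent` is illustrative, not from the question):

```python
import numpy as np

def suppress_concurrent(a, offset):
    """Return a copy of `a` where, after each kept 1, the next
    `offset` positions are forced to 0."""
    b = a.copy()
    i = 0
    while i < len(b):
        if b[i] == 1:
            b[i + 1:i + 1 + offset] = 0  # zero the next `offset` slots
            i += offset + 1              # skip past the zeroed window
        else:
            i += 1
    return b

A = np.array([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0])
print(suppress_concurrent(A, 3))
```

Running this with `offset=3` and `offset=2` reproduces B from Examples 1 and 2. Note that each newly created 0 changes which later 1s survive, which is the sequential dependency that makes this hard to vectorize.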
  • I'm sorry, I don't understand exactly how you're going from A to B. Commented Dec 9, 2021 at 22:22
  • Just updated the question. Lmk if that clears things up. Commented Dec 9, 2021 at 22:32
  • The position of the first 1 after a 0 is np.argwhere(np.diff(np.pad(A, 1)) == 1).squeeze(). But that doesn't really help for the question at hand, as you also want to count the 1s after newly created 0s. Commented Dec 9, 2021 at 22:57
  • "Vectorization" is easiest when the action is 'parallel': the same action on all elements regardless of order. numpy methods don't implement sequential actions except in limited cases like cumsum and similar ufunc.accumulate methods. It's not that sequential actions can't be coded, but that it's harder to create building blocks that can be applied one after the other. At some point you may need to resort to a compiling tool like numba. Commented Dec 9, 2021 at 23:14
  • I'd start with a list-based solution. With a better understanding of the task, you might see pattern(s) that let you express it in a 'multidimensional' way. But with sequences that differ in length and spacing, that may be difficult. It may be easier to speed up with numba than to try to squeeze out some sort of whole-array numpy solution. Commented Dec 10, 2021 at 2:02

1 Answer

From the comments, it seems the question is not just about using NumPy; the main objective is to speed up the code. Since you are using the partial solution mentioned by JohanC (which needs much more work for this question), I suggest the following methods:

import numpy as np

def com_():
    # Works on the global NumPy array A in place (`offset` is a global too).
    n = 1
    while n <= len(A):
        if A[n - 1] == 1:
            A[n:n + offset] = 0   # zero the next `offset` entries after a kept 1
            n += offset + 1       # skip past the zeroed window
        else:
            n += 1


import numba as nb

@nb.jit(forceobj=True)    # object mode: numba accelerates the plain-Python list loop
def com_fast():
    B = A.tolist()        # list scalar indexing is faster than NumPy scalar indexing
    n = 1
    while n < len(B):
        if B[n - 1] == 1:
            for i in range(offset):
                if n + i < len(B):   # guard against running past the end
                    B[n + i] = 0
            n += offset + 1
        else:
            n += 1
    return B

The first method uses A as a NumPy array with a plain loop. The second converts the input to a list, loops over it, and is accelerated by numba, as hpaulj mentioned in the comments.
Using the same input (1,000,000 elements) for both methods, running on Google Colab TPU:

1000 loops, best of 5: 153 ms per loop         # for com_()
1000 loops, best of 5: 10.2 ms per loop        # for com_fast()

I think these times are acceptable for data of that size. I don't believe this question can be solved by NumPy alone; or if it can, it will be very difficult and need a lot of thought, because each newly created 0 changes which later 1s are kept (I tried and achieved partial results, but in the end still needed loops). My guess is that numba and libraries like it will give similar runtimes, so a pure-NumPy solution is not necessary.
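As a further sketch, a nopython-mode variant that works on the array directly (skipping the list conversion) can avoid object mode entirely. The function name `com_njit` and the import fallback are illustrative assumptions, not from the answer:

```python
import numpy as np

try:
    from numba import njit   # use numba's nopython JIT when available
except ImportError:          # fall back to plain Python if numba is absent
    def njit(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func

@njit(cache=True)
def com_njit(a, offset):
    # Same sequential rule as com_/com_fast, but on a NumPy array copy,
    # compiled in nopython mode (no Python object overhead).
    b = a.copy()
    n = 0
    while n < len(b):
        if b[n] == 1:
            end = min(n + 1 + offset, len(b))
            for i in range(n + 1, end):  # zero the next `offset` slots
                b[i] = 0
            n += offset + 1
        else:
            n += 1
    return b
```

Passing the array and offset as arguments (rather than globals) lets numba specialize on the dtypes and usually compiles tighter than `forceobj=True`.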
