Vectorized Counting in 2D array python

Question

I am trying to build a function that counts up zeros in a list until nonzero entry appears, and start counting from 0 again. For example,

>>> a
array([[ 0,  0,  1,  0,  2],
       [ 0,  0,  0,  1,  1],
       [ 0,  1,  0,  0,  0],
       [ 0,  0, 10,  2,  2],
       [ 2,  0,  0,  0,  0]])

In this case, my desired output will be

array([[1, 1, 0, 1, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

I tried making this with two for-loops, but it is very slow on a very large data set. I was hoping I could find a way to vectorize this operation so that the time is O(n) instead of O(n^2). Any help will be appreciated!

Vectorization doesn't really affect the runtime complexity. And a simple for-loop should be O(n) in the size of the array. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Sep 26, 2017 at 22:53
if the operation goes for i in range(n): for j in range(m): then it is O(n^2), isn't it? — anyhoon
– anyhoon, Commented Sep 26, 2017 at 22:59
Vectorization only improves the constant factor, not the asymptotic complexity. — user2357112
– user2357112, Commented Sep 26, 2017 at 22:59
@anyhoon it depends on what n is. Typically, we speak in terms of the total size of the array. So, in that case, no it is O(n). if n is the length of the first dimension then yes, that algorithm would be O(n^2), but typically, you are interested in the big-oh complexity in terms of the total size of the array. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Sep 26, 2017 at 23:07

tvt173 · Accepted Answer · 2017-09-26 23:31:16Z

1

Something like this might make it a little faster:

a = np.array([[ 0,  0,  1,  0,  2],
              [ 0,  0,  0,  1,  1],
              [ 0,  1,  0,  0,  0],
              [ 0,  0, 10,  2,  2],
              [ 2,  0,  0,  0,  0]])

b = (a == 0)
c = np.zeros_like(a)
c[0, :] += b[0, :]
for i in range(1, c.shape[1]):
    c[i, :] = b[i, :] * (1 + c[i-1, :])

Array 'c' gives the desired result.

Or optimizing a little further...

a = ...
b = (a == 0) * 1
for i in range(1, b.shape[1]):
    b[i, :] *= (1 + b[i-1, :])

Now 'b' is your result, and you've got one array less to deal with.

You'll notice that this algorithm still has the same time complexity as the 'two for loop' solution, but now one of those loops is being internalized by numpy, so I would expect a speedup on a large array.

edited Sep 26, 2017 at 23:31

answered Sep 26, 2017 at 23:15

tvt173

1,90621 silver badges18 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Devin Cairns · Accepted Answer · 2017-09-27 22:55:03Z

What about something like this, which is vectorized and has no for loops:

def moving_count(a, value, axis=0):
    """Return sequential counts of a given value along an axis"""
    if np.all(a == value):
        # Fill the output with counts along the desired axis
        return np.rollaxis(np.arange(1, a.size + 1).reshape(a.shape), axis)

    # Allocate output with a cumulative count of value along flattened axis
    output = np.cumsum(np.rollaxis(a, axis) == value)

    # Find locations of breakpoints
    breakpoints = (np.roll(output, 1) - output) == 0

    # Since breakpoints is boolean, argmax returns the location of the first breakpoint
    threshold = np.argmax(breakpoints)

    # Repeat the cumulative value along labels and subtract to reset the count at each breakpoint
    output[threshold:] -= output[breakpoints][np.cumsum(breakpoints) - 1][threshold:]

    # Reshape and return axis to match input array
    return np.rollaxis(output.reshape(a.shape), axis)

Applied to your problem:

In[3]: a = np.array([[ 0,  0,  1,  0,  2],
                     [ 0,  0,  0,  1,  1],
                     [ 0,  1,  0,  0,  0],
                     [ 0,  0, 10,  2,  2],
                     [ 2,  0,  0,  0,  0]])
In[4]: moving_count(a, 0, 1)
Out[4]: 

array([[1, 1, 0, 2, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

Collectives™ on Stack Overflow

Vectorized Counting in 2D array python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related