0

I am trying to build a function that counts up zeros in a list until nonzero entry appears, and start counting from 0 again. For example,

>>> a
array([[ 0,  0,  1,  0,  2],
       [ 0,  0,  0,  1,  1],
       [ 0,  1,  0,  0,  0],
       [ 0,  0, 10,  2,  2],
       [ 2,  0,  0,  0,  0]])

In this case, my desired output will be

array([[1, 1, 0, 1, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

I tried making this with two for-loops, but it is very slow on a very large data set. I was hoping I could find a way to vectorize this operation so that the time is O(n) instead of O(n^2). Any help will be appreciated!

8
  • 3
    Vectorization doesn't really affect the runtime complexity. And a simple for-loop should be O(n) in the size of the array. Commented Sep 26, 2017 at 22:53
  • if the operation goes for i in range(n): for j in range(m): then it is O(n^2), isn't it? Commented Sep 26, 2017 at 22:59
  • Vectorization only improves the constant factor, not the asymptotic complexity. Commented Sep 26, 2017 at 22:59
  • So it has to be O(n^2) no matter what? Commented Sep 26, 2017 at 23:01
  • 2
    @anyhoon it depends on what n is. Typically, we speak in terms of the total size of the array. So, in that case, no it is O(n). if n is the length of the first dimension then yes, that algorithm would be O(n^2), but typically, you are interested in the big-oh complexity in terms of the total size of the array. Commented Sep 26, 2017 at 23:07

2 Answers 2

1

Something like this might make it a little faster:

a = np.array([[ 0,  0,  1,  0,  2],
              [ 0,  0,  0,  1,  1],
              [ 0,  1,  0,  0,  0],
              [ 0,  0, 10,  2,  2],
              [ 2,  0,  0,  0,  0]])

b = (a == 0)
c = np.zeros_like(a)
c[0, :] += b[0, :]
for i in range(1, c.shape[1]):
    c[i, :] = b[i, :] * (1 + c[i-1, :])

Array 'c' gives the desired result.

Or optimizing a little further...

a = ...
b = (a == 0) * 1
for i in range(1, b.shape[1]):
    b[i, :] *= (1 + b[i-1, :])

Now 'b' is your result, and you've got one array less to deal with.

You'll notice that this algorithm still has the same time complexity as the 'two for loop' solution, but now one of those loops is being internalized by numpy, so I would expect a speedup on a large array.

Sign up to request clarification or add additional context in comments.

Comments

0

What about something like this, which is vectorized and has no for loops:

def moving_count(a, value, axis=0):
    """Return sequential counts of a given value along an axis"""
    if np.all(a == value):
        # Fill the output with counts along the desired axis
        return np.rollaxis(np.arange(1, a.size + 1).reshape(a.shape), axis)

    # Allocate output with a cumulative count of value along flattened axis
    output = np.cumsum(np.rollaxis(a, axis) == value)

    # Find locations of breakpoints
    breakpoints = (np.roll(output, 1) - output) == 0

    # Since breakpoints is boolean, argmax returns the location of the first breakpoint
    threshold = np.argmax(breakpoints)

    # Repeat the cumulative value along labels and subtract to reset the count at each breakpoint
    output[threshold:] -= output[breakpoints][np.cumsum(breakpoints) - 1][threshold:]

    # Reshape and return axis to match input array
    return np.rollaxis(output.reshape(a.shape), axis)

Applied to your problem:

In[3]: a = np.array([[ 0,  0,  1,  0,  2],
                     [ 0,  0,  0,  1,  1],
                     [ 0,  1,  0,  0,  0],
                     [ 0,  0, 10,  2,  2],
                     [ 2,  0,  0,  0,  0]])
In[4]: moving_count(a, 0, 1)
Out[4]: 

array([[1, 1, 0, 2, 0],
       [2, 2, 1, 0, 0],
       [3, 0, 2, 1, 1],
       [4, 1, 0, 0, 0],
       [0, 2, 1, 1, 1]])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.