4

I have a numpy binary array like this:

   Array A = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

I would like to count how many 0s are there at the left of each 1, and return it in an other array that would look like this for this for this example:

nb_1s = [0, 0, 1, 2, 2, 5]

There are no 0s at the left for the two first 1s so the the first two numbers of the array are 0 etc...

I know that first I have to initiate an array with number of 1s in my array:

def give_zeros(binary_array):
    binary_array = np.asarray(binary_array)
    nb_zeros = np.zeros(binary_array.sum())


    return nb_zeros

But I'm not sure on how to count the number of zeros. Should I iterate in a for loop with 'nditer'? It doesn't seem efficient as i will have to run this function on very large arrays.

Do you have any ideas? Thank you.

3
  • Shouldn't that be : [0, 0, 1, 1, 0, 3]? Commented Oct 19, 2017 at 8:59
  • Is the count cummulative? Commented Oct 19, 2017 at 9:00
  • Yes the count is cumulative, the last 1 has 3 zeros at its left adjacent plus the 2 zeros after the 1s Commented Oct 19, 2017 at 9:05

4 Answers 4

4

Code

You could use:

(A == 0).cumsum()[A > 0]
# array([0, 0, 1, 2, 2, 5])

or:

(~A).cumsum()[A]
# array([0, 0, 1, 2, 2, 5])

if A is a bool array.

Explanation

A == 0 is a boolean array which is True for each 0:

>>> import numpy as np
>>> A = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
>>> A == 0
array([False, False,  True, False,  True, False, False,  True,  True,
        True, False,  True,  True,  True,  True], dtype=bool)

You can use cumsum() to count the number of Trues:

>>> (A == 0).cumsum()
array([0, 0, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9])

You only need the values where A > 0:

>>> (A == 0).cumsum()[A > 0]
array([0, 0, 1, 2, 2, 5])

Done!

Sign up to request clarification or add additional context in comments.

4 Comments

This seems like the most pythonic, if not the most efficient solution
@DanielF: Thanks. Indeed, the accepted answer is 20% faster than my code.
As usual, there's the wrong way, the right way, and the @Divakar way - which is sort of like the right way but with more code but somehow faster.
@EricDuminil Added timings on a bigger dataset in my post. Using boolean array conversion to feed into flatnonzero for much faster performance. The speedup over yours seems to be around 3x now.
3

Here's a vectorized way with differentiation of range array from the indices of 1s -

def leftzeros_count(a):
    idx = np.flatnonzero(a!=0)
    return idx - np.arange(len(idx))

Sample runs -

In [298]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [299]: leftzeros_count(a)
Out[299]: array([0, 0, 1, 2, 2, 5])

In [300]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [301]: leftzeros_count(a)
Out[301]: array([1, 1, 2, 3, 3, 6])

In [302]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])

In [303]: leftzeros_count(a)
Out[303]: array([ 1,  1,  2,  3,  3,  6, 10])

Runtime test

For the timings, let's tile the given sample a large number of times and time the vectorized approaches -

In [7]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [8]: a = np.tile(a,100000)

# @Eric Duminil's soln
In [9]: %timeit (a == 0).cumsum()[a > 0]
100 loops, best of 3: 10.9 ms per loop

# Proposed in this post
In [10]: %timeit leftzeros_count(a)
100 loops, best of 3: 3.71 ms per loop

3 Comments

Thank you for your answer, I tried this array: [1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0] but the function returned: [ 2, 4, 4, 4, 8, 10], it should be: [0, 2, 2, 2, 6, 8]
@user2505650 Check out the edits please. I have updated with a new method.
It's better to use nonzero directly without flatnonzero calling ravel before nonzero.
2

In the non-vectorized manner:

>>> x = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
>>> c, y = 0, []
>>> for i in x:
...     if i == 1:
...         y.append(c)
...     else:
...         c += 1
... 
>>> y
[0, 0, 1, 2, 2, 5]

For vectorized solution, see @Divakar's answer:

In numpy, first find the non-zero indices, with np.nonzero():

>>> np.nonzero(x)[0]
array([ 0,  1,  3,  5,  6, 10])

Then subtract that with the range array of length of indices:

>>> idx = np.nonzero(x)[0]
>>> np.arange(len(idx))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(len(idx))
array([0, 0, 1, 2, 2, 5])

>>> np.arange(x.count(1))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(x.count(1))
array([0, 0, 1, 2, 2, 5])

5 Comments

Yes, not numpy though.
How is this different from my solution?
@Divakar, not much difference, just that flatnonzero has a superfluous ravel() step ;P
@alvas That's negligible difference in terms of memory or performance. What's the big difference?
No difference... Just that I thought it's nice to juxtapose the answers between vectorized and non-vectorized one... I'll add the edit comments... Still typing...
1

If the count is cumulative (as per your example) then you can do this easily in O(n). Simply have a counter that increases by one every time you find a zero and then append the value of the counter variable to another array for every one you hit in your initial array.

2 Comments

@EricDuminil Granted you have provided a concise solution, I am not in favor of providing straight up code solutions to simple problems like this one. It furthers copy&paste programming, which pollutes code bases all around the world. Just my 2 cents (and clearly opinion based).
Thanks for the comment. I understand your point of view. I really appreciate numpy concise syntax and couldn't resist writing something.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.