How to count number of zeros at the left of each one in an Numpy array

Question

I have a numpy binary array like this:

   Array A = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

I would like to count how many 0s are there at the left of each 1, and return it in an other array that would look like this for this for this example:

nb_1s = [0, 0, 1, 2, 2, 5]

There are no 0s at the left for the two first 1s so the the first two numbers of the array are 0 etc...

I know that first I have to initiate an array with number of 1s in my array:

def give_zeros(binary_array):
    binary_array = np.asarray(binary_array)
    nb_zeros = np.zeros(binary_array.sum())


    return nb_zeros

But I'm not sure on how to count the number of zeros. Should I iterate in a for loop with 'nditer'? It doesn't seem efficient as i will have to run this function on very large arrays.

Do you have any ideas? Thank you.

Yes the count is cumulative, the last 1 has 3 zeros at its left adjacent plus the 2 zeros after the 1s — user2505650
– user2505650, Commented Oct 19, 2017 at 9:05

Eric Duminil · Accepted Answer · 2017-10-19 09:36:50Z

4

Code

You could use:

(A == 0).cumsum()[A > 0]
# array([0, 0, 1, 2, 2, 5])

or:

(~A).cumsum()[A]
# array([0, 0, 1, 2, 2, 5])

if A is a bool array.

Explanation

A == 0 is a boolean array which is True for each 0:

>>> import numpy as np
>>> A = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])
>>> A == 0
array([False, False,  True, False,  True, False, False,  True,  True,
        True, False,  True,  True,  True,  True], dtype=bool)

You can use cumsum() to count the number of Trues:

>>> (A == 0).cumsum()
array([0, 0, 1, 1, 2, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9])

You only need the values where A > 0:

>>> (A == 0).cumsum()[A > 0]
array([0, 0, 1, 2, 2, 5])

Done!

edited Oct 19, 2017 at 9:36

answered Oct 19, 2017 at 9:25

Eric Duminil

54.6k10 gold badges80 silver badges134 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Daniel F Over a year ago

This seems like the most pythonic, if not the most efficient solution

Eric Duminil Over a year ago

@DanielF: Thanks. Indeed, the accepted answer is 20% faster than my code.

Daniel F Over a year ago

As usual, there's the wrong way, the right way, and the @Divakar way - which is sort of like the right way but with more code but somehow faster.

Divakar Over a year ago

@EricDuminil Added timings on a bigger dataset in my post. Using boolean array conversion to feed into flatnonzero for much faster performance. The speedup over yours seems to be around 3x now.

Divakar · Accepted Answer · 2017-10-20 05:01:54Z

3

Here's a vectorized way with differentiation of range array from the indices of 1s -

def leftzeros_count(a):
    idx = np.flatnonzero(a!=0)
    return idx - np.arange(len(idx))

Sample runs -

In [298]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [299]: leftzeros_count(a)
Out[299]: array([0, 0, 1, 2, 2, 5])

In [300]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [301]: leftzeros_count(a)
Out[301]: array([1, 1, 2, 3, 3, 6])

In [302]: a = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])

In [303]: leftzeros_count(a)
Out[303]: array([ 1,  1,  2,  3,  3,  6, 10])

Runtime test

For the timings, let's tile the given sample a large number of times and time the vectorized approaches -

In [7]: a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0])

In [8]: a = np.tile(a,100000)

# @Eric Duminil's soln
In [9]: %timeit (a == 0).cumsum()[a > 0]
100 loops, best of 3: 10.9 ms per loop

# Proposed in this post
In [10]: %timeit leftzeros_count(a)
100 loops, best of 3: 3.71 ms per loop

edited Oct 20, 2017 at 5:01

answered Oct 19, 2017 at 9:03

Divakar

222k19 gold badges273 silver badges374 bronze badges

3 Comments

user2505650 Over a year ago

Thank you for your answer, I tried this array: [1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0] but the function returned: [ 2, 4, 4, 4, 8, 10], it should be: [0, 2, 2, 2, 6, 8]

Divakar Over a year ago

@user2505650 Check out the edits please. I have updated with a new method.

alvas Over a year ago

It's better to use nonzero directly without flatnonzero calling ravel before nonzero.

alvas · Accepted Answer · 2017-10-19 09:24:08Z

2

In the non-vectorized manner:

>>> x = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
>>> c, y = 0, []
>>> for i in x:
...     if i == 1:
...         y.append(c)
...     else:
...         c += 1
... 
>>> y
[0, 0, 1, 2, 2, 5]

For vectorized solution, see @Divakar's answer:

In numpy, first find the non-zero indices, with np.nonzero():

>>> np.nonzero(x)[0]
array([ 0,  1,  3,  5,  6, 10])

Then subtract that with the range array of length of indices:

>>> idx = np.nonzero(x)[0]
>>> np.arange(len(idx))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(len(idx))
array([0, 0, 1, 2, 2, 5])

>>> np.arange(x.count(1))
array([0, 1, 2, 3, 4, 5])
>>> np.nonzero(x)[0] - np.arange(x.count(1))
array([0, 0, 1, 2, 2, 5])

edited Oct 19, 2017 at 9:24

answered Oct 19, 2017 at 9:03

alvas

123k118 gold badges503 silver badges807 bronze badges

5 Comments

cs95 Over a year ago

Yes, not numpy though.

Divakar Over a year ago

How is this different from my solution?

alvas Over a year ago

@Divakar, not much difference, just that flatnonzero has a superfluous ravel() step ;P

Divakar Over a year ago

@alvas That's negligible difference in terms of memory or performance. What's the big difference?

alvas Over a year ago

No difference... Just that I thought it's nice to juxtapose the answers between vectorized and non-vectorized one... I'll add the edit comments... Still typing...

Sebastian Fleck · Accepted Answer · 2017-10-19 10:24:57Z

1

If the count is cumulative (as per your example) then you can do this easily in O(n). Simply have a counter that increases by one every time you find a zero and then append the value of the counter variable to another array for every one you hit in your initial array.

edited Oct 19, 2017 at 10:24

answered Oct 19, 2017 at 9:05

Sebastian Fleck

1011 silver badge4 bronze badges

2 Comments

Sebastian Fleck Over a year ago

@EricDuminil Granted you have provided a concise solution, I am not in favor of providing straight up code solutions to simple problems like this one. It furthers copy&paste programming, which pollutes code bases all around the world. Just my 2 cents (and clearly opinion based).

Eric Duminil Over a year ago

Thanks for the comment. I understand your point of view. I really appreciate numpy concise syntax and couldn't resist writing something.

Collectives™ on Stack Overflow

How to count number of zeros at the left of each one in an Numpy array

4 Answers 4

Code

Explanation

4 Comments

3 Comments

5 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Code

Explanation

4 Comments

3 Comments

5 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related