Removing leading zeros of a numpy array without using a for loop

Question

How can I remove only the leading zeros from a numpy array without using a for loop?

import numpy as np

x = np.array([0,0,1,1,1,1,0,1,0,0])

# Desired output
array([1, 1, 1, 1, 0, 1, 0, 0])

I have written the following code:

x[min(min(np.where(x>=1))):]

I was wondering if there is a more efficient solution.
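For reference, here is a sketch that unpacks what the one-liner does. np.where with a single argument returns a tuple containing one index array, so the nested min() calls end up returning the first matching index. The second half assumes the intent is "not zero" rather than ">= 1"; the two conditions differ once negative or fractional values appear.

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 0])

# np.where(cond) with one argument returns a 1-tuple of index arrays
(idx,) = np.where(x >= 1)
first = idx.min()           # indices come back sorted, so this equals idx[0] (here 2)
trimmed = x[first:]         # array([1, 1, 1, 1, 0, 1, 0, 0])

# x >= 1 skips negative and fractional values; x != 0 tests for "not zero"
y = np.array([0, -2, 3])
only_ge_one = y[np.where(y >= 1)[0].min():]   # array([3]): the -2 is dropped as well
nonzero = y[np.where(y != 0)[0].min():]       # array([-2, 3])
```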

np.trim_zeros(x, 'f'): internally this is a for loop, but in many cases it will probably be the most efficient way.
– Alex Riley, commented May 4, 2018 at 15:44
What are typical values for the length of x and the number of leading zeros?
– Warren Weckesser, commented May 4, 2018 at 15:55
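To make the "internally this is a for loop" remark concrete, here is a sketch of a pure-Python front trim in the spirit of the older np.trim_zeros source. It is a simplified reconstruction (the helper name trim_front_loop is hypothetical), not the exact library code, and implementation details differ between NumPy versions. It short-circuits at the first nonzero value, but every leading zero costs one interpreter-level iteration.

```python
import numpy as np

def trim_front_loop(a):
    """Hypothetical helper: drop leading zeros with an explicit Python loop."""
    first = 0
    for value in a:          # one interpreter-level iteration per element
        if value != 0:
            break            # stop at the first nonzero element (short-circuit)
        first += 1
    return a[first:]         # an all-zero input yields an empty slice

x = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 0])
result = trim_front_loop(x)  # same values as np.trim_zeros(x, 'f')
```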

Aechlys · Accepted Answer · 2018-05-04 15:44:25Z

7

You can use np.trim_zeros(x, 'f').

The 'f' means to trim the zeros from the front. Option 'b' trims the zeros from the back. The default option, 'fb', trims them from both sides.
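The three modes side by side on the example array:

```python
import numpy as np

x = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 0])

front = np.trim_zeros(x, 'f')   # [1 1 1 1 0 1 0 0]
back = np.trim_zeros(x, 'b')    # [0 0 1 1 1 1 0 1]
both = np.trim_zeros(x)         # default 'fb': [1 1 1 1 0 1]
```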

x = np.array([0,0,1,1,1,1,0,1,0,0])
# [0 0 1 1 1 1 0 1 0 0]
np.trim_zeros(x, 'f')
# [1 1 1 1 0 1 0 0]

answered May 4, 2018 at 15:44

Aechlys



1 Comment

GeekInDisguise Over a year ago

As said in other answers and comments, the trim_zeros() implementation seems to use for loops internally, so should we look for alternative solutions? Otherwise, if the OP was just looking for a library function to avoid writing loops manually, this answer is good enough, but the question should be made clearer.

fferri · Accepted Answer · 2018-05-04 16:06:00Z

4

Since np.trim_zeros uses a for loop, here's a truly vectorized solution:

x = x[np.where(x != 0)[0][0]:]

However, I'm not sure at what point it becomes more efficient than np.trim_zeros. It should be more efficient in the worst case for np.trim_zeros (i.e. an array consisting mostly of leading zeros), where the Python-level loop has to step through every leading zero.

Anyway, it can be a useful learning example.
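Which one wins depends on the data: np.where(x != 0) always scans the whole array, while the loop in np.trim_zeros stops at the first nonzero element but pays interpreter overhead for each leading zero. A minimal timeit sketch for checking this on your own machine and NumPy version; the array sizes are arbitrary assumptions, and no timings are claimed here.

```python
import timeit
import numpy as np

def np_where(a):
    return a[np.where(a != 0)[0][0]:]

def np_trim_zeros(a):
    return np.trim_zeros(a, 'f')

n = 100_000
few_zeros = np.ones(n)
few_zeros[:10] = 0            # long array, only 10 leading zeros
many_zeros = np.zeros(n)
many_zeros[-10:] = 1          # long array, almost entirely leading zeros

for label, a in [("few zeros", few_zeros), ("many zeros", many_zeros)]:
    assert np.array_equal(np_where(a), np_trim_zeros(a))  # same result either way
    for f in (np_where, np_trim_zeros):
        seconds = timeit.timeit(lambda: f(a), number=10)
        print(f"{label:>10}  {f.__name__:<14} {seconds:.4f} s")
```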

Double-sided trim:

>>> idx = np.where(x != 0)[0]
>>> x = x[idx[0]:1+idx[-1]]
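Both snippets raise an IndexError when x contains no nonzero element, because np.where(...)[0] is then empty. A sketch of a guarded double-sided version (the helper name trim_both is hypothetical), checked against np.trim_zeros with its default 'fb' mode:

```python
import numpy as np

def trim_both(a):
    """Hypothetical helper: vectorized equivalent of np.trim_zeros(a, 'fb') for 1-D input."""
    idx = np.flatnonzero(a)          # positions of all nonzero elements
    if idx.size == 0:                # all zeros (or empty): nothing to keep
        return a[:0]
    return a[idx[0]:idx[-1] + 1]     # first through last nonzero, inclusive

x = np.array([0, 0, 1, 1, 1, 1, 0, 1, 0, 0])
assert np.array_equal(trim_both(x), np.trim_zeros(x))
assert trim_both(np.zeros(4)).size == 0
```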

edited May 4, 2018 at 16:06

answered May 4, 2018 at 16:00

fferri


3 Comments

Aechlys Over a year ago

Your answer has intrigued me, so I've done a little research. Isn't slicing internally also treated as a for loop? Referring to: this post.

fferri Over a year ago

That article is about lists. Are you sure numpy does the same? Slicing is just a call to the __getitem__ method with a slice object, e.g. x[2:4] is the same as x.__getitem__(slice(2,4)). It depends on how numpy implements that method.
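For NumPy specifically, basic slicing with start:stop returns a view: a new array object with its own offset, shape and strides that shares the original buffer, so creating it does not copy or loop over the elements. A small check:

```python
import numpy as np

x = np.arange(10)
s = x[2:4]
t = x.__getitem__(slice(2, 4))   # the call that x[2:4] is shorthand for

same_values = np.array_equal(s, t)   # True
is_view = np.shares_memory(s, x)     # True: no elements were copied
s[0] = 99                            # writing through the view...
changed = x[2]                       # ...modifies x itself: 99
```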

Aechlys Over a year ago

I am not sure, hence the question. I was curious about the loop-free implementation and wanted to see it. However, I am unable to retrace the steps in that post for a numpy.ndarray object. I've also tried to dig through the numpy sources, but to no avail. Anyway, I guess we should not be discussing it here.

Paul Panzer · Accepted Answer · 2018-05-04 19:54:34Z

Here is a numpy approach that short-circuits. It relies on the fact that, for any (?) dtype, 0 is represented by all-zero bytes.

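import numpy as np

# Check the assumption that zero is stored as all-zero bytes
# (for example, 0.0f as 00 00 00 00). np.sctypes is not available in
# NumPy 2.x, so the candidate scalar types are listed explicitly.
candidate_types = [np.bool_, np.int8, np.int16, np.int32, np.int64,
                   np.uint8, np.uint16, np.uint32, np.uint64,
                   np.float16, np.float32, np.float64, np.longdouble,
                   np.complex64, np.complex128, np.clongdouble]
allowed_dtypes = set()
for dt in map(np.dtype, candidate_types):
    if not np.any(np.zeros((1,), dtype=dt).view(bool)):
        allowed_dtypes.add(dt)

def trim_fast(a):
    """Return a without its leading zeros (1-D, C-contiguous input)."""
    assert a.ndim == 1 and a.dtype in allowed_dtypes
    flags = a.view(bool)         # one flag per byte: True where the byte is nonzero
    i = flags.argmax()           # index of the first nonzero byte (0 if there is none)
    if not flags[i]:             # no nonzero byte at all: every element is zero
        return a[:0]
    cut = i // a.dtype.itemsize  # element that contains that byte
    if a[cut] == 0:              # nonzero bytes that still equal 0 (e.g. -0.0):
        return np.trim_zeros(a[cut:], 'f')  # fall back to the exact check
    return a[cut:]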
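The byte-level assumption behind trim_fast can be inspected directly with tobytes(). One caveat: a floating-point negative zero compares equal to 0 but has its sign bit set, so it is not all-zero bytes, and scanning the bytes lands on it rather than on the first real nonzero value. The byte values in the comments assume a little-endian machine.

```python
import numpy as np

pos_zero = np.zeros(1, dtype=np.float32)
neg_zero = np.array([-0.0], dtype=np.float32)

pos_bytes = pos_zero.tobytes()     # b'\x00\x00\x00\x00'
neg_bytes = neg_zero.tobytes()     # b'\x00\x00\x00\x80' (sign bit set)
equal_to_zero = neg_zero[0] == 0   # True, despite the nonzero byte

# view(bool) gives one flag per byte; this is what trim_fast scans with argmax
flags = np.array([0.0, -0.0, 2.0], dtype=np.float32).view(bool)
first_flag = flags.argmax()                              # 7: the sign byte of -0.0
element = first_flag // np.dtype(np.float32).itemsize    # 1: the -0.0, not the 2.0
```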

Comparison with other methods:

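Code to generate the plot:

def np_where(a):
    return a[np.where(a != 0)[0][0]:]

def np_trim_zeros(a):
    return np.trim_zeros(a, 'f')

import perfplot

# Wrap each function so that one kernel call trims all 100 rows;
# reusing the names keeps them as the labels in the plot legend.
tf, nt, nw = trim_fast, np_trim_zeros, np_where
def trim_fast(A): return [tf(a) for a in A]
def np_trim_zeros(A): return [nt(a) for a in A]
def np_where(A): return [nw(a) for a in A]

perfplot.save('tz.png',
    # 100 rows of 20*n values; about 1 in n+1 values is nonzero,
    # so each row has roughly n zeros per nonzero
    setup=lambda n: np.clip(np.random.uniform(-n, 1, (100, 20*n)), 0, None),
    n_range=[2**k for k in range(2, 11)],
    kernels=[
        trim_fast,
        np_where,
        np_trim_zeros
        ],
    logx=True,
    logy=True,
    xlabel='zeros per nonzero',
    equality_check=None  # the kernels return lists of arrays, so skip the check
    )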
