20

Assume we have a numpy.ndarray of data, say with shape (100, 200), and a list of row indices that we want to exclude from the data. How would you do that? Something like this:

import numpy

a = numpy.random.rand(100,200)
indices = numpy.random.randint(100,size=20)
b = a[-indices,:] # imaginary code, what to replace here?

Thanks.

4 Answers

18

You can use b = numpy.delete(a, indices, axis=0)

Source: NumPy docs.
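
As a minimal sketch of how this behaves (the small array and the index values are made up for illustration), numpy.delete returns a new array and leaves a untouched, and a repeated index removes its row only once:

```python
import numpy

# Small illustrative array: 5 rows, 3 columns (values chosen for readability).
a = numpy.arange(15).reshape(5, 3)
indices = [1, 3, 3]  # duplicates are allowed; each listed row is removed once

# axis=0 removes whole rows; without axis, the array would be flattened first.
b = numpy.delete(a, indices, axis=0)

print(b.shape)  # (3, 3)
print(b[:, 0])  # first column of the remaining rows 0, 2 and 4: [ 0  6 12]
print(a.shape)  # (5, 3), the original is unchanged
```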


3 Comments

For a numeric list of indices, np.delete uses the mask solution that you earlier rejected as taking up too much memory.
@hpaulj the documentation for delete says: "out : ndarray A copy of arr with the elements specified by obj removed." Do you mean that it uses a numpy.ma masked array? It does not sound like it to me.
No, not masked array; mask as in boolean index.
6

You could try:

import numpy as np

a = np.random.rand(100,200)
indices = np.random.randint(100,size=20)
b = a[np.setdiff1d(np.arange(100),indices),:]

This avoids creating a mask array of the same size as your data, as the answer at https://stackoverflow.com/a/21022753/865169 does. Note that this example produces a 2D array b, whereas that answer produces a flattened array.
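
A small sketch to check this, using the same shapes as the question (the seed is an arbitrary choice, only there for reproducibility): indexing with the setdiff1d result keeps the array 2D and selects the same rows as numpy.delete, and the only helper arrays are 1D with at most 100 entries.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = rng.random((100, 200))
indices = rng.integers(100, size=20)  # may contain repeated row numbers

# Sorted, unique row numbers that are NOT in indices (a 1D integer array).
keep = np.setdiff1d(np.arange(a.shape[0]), indices)
b = a[keep, :]

# Same rows as numpy.delete, and still two-dimensional.
assert b.ndim == 2
assert b.shape == (a.shape[0] - np.unique(indices).size, a.shape[1])
assert np.array_equal(b, np.delete(a, indices, axis=0))
```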

A crude comparison of the runtime and memory cost of this approach with the one in https://stackoverflow.com/a/30273446/865169 suggests that delete is faster, while indexing with setdiff1d is much lighter on memory:

In [75]: %timeit b = np.delete(a, indices, axis=0)
The slowest run took 7.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 24.7 µs per loop

In [76]: %timeit c = a[np.setdiff1d(np.arange(100),indices),:]
10000 loops, best of 3: 48.4 µs per loop

In [77]: %memit b = np.delete(a, indices, axis=0)
peak memory: 52.27 MiB, increment: 0.85 MiB

In [78]: %memit c = a[np.setdiff1d(np.arange(100),indices),:]
peak memory: 52.39 MiB, increment: 0.12 MiB
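
The %timeit and %memit lines above are IPython magics (%memit is provided by the memory_profiler extension). Outside IPython, a rough equivalent of the timing comparison could look like the sketch below. The function names and the repeat counts are arbitrary choices for illustration, and the numbers will differ by machine and NumPy version, so no expected output is shown.

```python
import timeit

import numpy as np

a = np.random.rand(100, 200)
indices = np.random.randint(100, size=20)


def with_delete():
    return np.delete(a, indices, axis=0)


def with_setdiff1d():
    return a[np.setdiff1d(np.arange(a.shape[0]), indices), :]


# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(with_delete(), with_setdiff1d())

number = 1000  # arbitrary number of calls per measurement
results = {}
for func in (with_delete, with_setdiff1d):
    # Keep the best of 3 measurements, similar to what %timeit reports above.
    best = min(timeit.repeat(func, number=number, repeat=3))
    results[func.__name__] = best / number
    print(f"{func.__name__}: {results[func.__name__] * 1e6:.1f} µs per call")
```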


3

It's ugly but works:

b = np.array([a[i] for i in range(a.shape[0]) if i not in indices])


1

You could try something like this:

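import numpy

a = numpy.random.rand(100,200)
indices = numpy.random.randint(100,size=20)
mask = numpy.ones(a.shape, dtype=bool)  # one boolean per element of a
mask[indices,:] = False                 # mark the unwanted rows
b = a[mask]                             # note: the result is flattened to 1D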

2 Comments

This solution needs an array of the exact same size as my original data, which in my case is gigantic. The time and space complexity of this solution is O(n^2), which is not really practical for my data.
This is essentially the method that np.delete uses. Look where it constructs keep = ones(N, dtype=bool); keep[obj,] = False.
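
The difference between the two kinds of mask discussed in these comments can be sketched as follows (small made-up shapes for readability; this is an illustration, not the actual np.delete source). A mask with the full shape of a flattens the result, while a 1D mask with one entry per row, like the keep array quoted above, needs only a.shape[0] booleans and keeps the result 2D:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
indices = [0, 2]

# Full-size mask, as in the answer: one boolean per element of a.
full_mask = np.ones(a.shape, dtype=bool)
full_mask[indices, :] = False
flat = a[full_mask]  # 1D result

# Row mask: one boolean per row.
keep = np.ones(a.shape[0], dtype=bool)
keep[indices] = False
b = a[keep]  # 2D result with the unwanted rows dropped

print(flat.shape)  # (6,)
print(b.shape)     # (2, 3)
```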
