7

How do I mask an array based on the actual index values?

That is, if I have a 10 x 10 x 30 matrix and I want to mask the array when the first and second index equal each other.

For example, [1, 1 , :] should be masked because 1 and 1 equal each other but [1, 2, :] should not because they do not.

I'm only asking this with the third dimension because it resembles my current problem and may complicate things. But my main question is, how to mask arrays based on the value of the indices?

2
  • 6
    Off topic, your avatar matches your question topic nicely :P Commented Sep 17, 2013 at 22:13
  • 3
    Yep, just trying to draw Z's all over my code. Thanks for helping me with that. Commented Sep 17, 2013 at 23:56

2 Answers 2

7

In general, to access the value of the indices, you can use np.meshgrid:

i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
m.mask = (i == j)

The advantage of this method is that it works for arbitrary boolean functions on i, j, and k. It is a bit slower than the use of the identity special case.

In [56]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij')
   ....: i == j
10000 loops, best of 3: 96.8 µs per loop

As @Jaime points out, meshgrid supports a sparse option, which doesn't do so much duplication, but requires a bit more care in some cases because they don't broadcast. It will save memory and speed things up a little. For example,

In [77]: x = np.arange(5)

In [78]: np.meshgrid(x, x)
Out[78]: 
[array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]]),
 array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])]

In [79]: np.meshgrid(x, x, sparse=True)
Out[79]: 
[array([[0, 1, 2, 3, 4]]),
 array([[0],
       [1],
       [2],
       [3],
       [4]])]

So, you can use the sparse version as he says, but you must force the broadcasting as such:

i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
m.mask = np.repeat(i==j, k.size, axis=2)

And the speedup:

In [84]: %%timeit
   ....: i, j, k = np.meshgrid(*map(np.arange, m.shape), indexing='ij', sparse=True)
   ....: np.repeat(i==j, k.size, axis=2)
10000 loops, best of 3: 73.9 µs per loop
Sign up to request clarification or add additional context in comments.

6 Comments

Why not use np.arange instead of range, should save time.
Nice use of map... But you are creating 3 full masks of (10, 10, 30)... If you set sparse=True in the call to np.meshgrid, you reduce the memory footprint substantially, but you need to include all dimensions in the creation of the mask, e.g. (i == j) & (k >= 0) runs twice as fast as your solution.
Thanks @Jaime, I was trying to figure out how to do it with the sparser versions but couldn't!
It's a pity that boolean mask indexing does not support broadcasting. There may be some good reason for it, but it would make some operations ridiculously easy.
sparse=True seems like a lot of trouble for 8us, at least in this case :).
|
0

In your special case of wanting to mask the diagonals, you can use the np.identity() function which returns ones along the diagonal. Since you have the third dimension, we have to add that third dimension to the the identity matrix:

m.mask = np.identity(10)[...,None]*np.ones((1,1,30))

There might be a better way of constructing that array, but it is basically stacking 30 of the np.identity(10) array. For example, this is equivalent:

np.dstack((np.identity(10),)*30)

but slower:

In [30]: timeit np.identity(10)[...,None]*np.ones((1,1,30))
10000 loops, best of 3: 40.7 µs per loop

In [31]: timeit np.dstack((np.identity(10),)*30)
1000 loops, best of 3: 219 µs per loop

And @Ophion's suggestions

In [33]: timeit np.tile(np.identity(10)[...,None], 30)
10000 loops, best of 3: 63.2 µs per loop

In [71]: timeit np.repeat(np.identity(10)[...,None], 30)
10000 loops, best of 3: 45.3 µs per loop

4 Comments

np.tile(np.identity(10)[...,None], 30)
Well thats interesting. tile is basically a wrapper on repeat. So if you run np.repeat(np.identity(10)[...,None], 30, axis=-1) you skip some extra if statements, but np.identity(10)[...,None]*np.ones((1,1,30)) is universally faster for any array size in 1 dimension. Good to know.
@Ophion Cool, I'd never considered that along dimensions of length 1, tile and repeat are equivalent.
You do have to specify an axis as np.repeat is odd, instead of the default axis=-1 as with most function it is axis=None. If you do not explicitly pass the axis keyword you get a 1D array out and will have to reshape it manually.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.