2d array as index of a 3d array

Question

I had a 2D array (C) with 8000x64 elements, an 1D array (s) with 8000x1 elements and another 1D array (d) with 1x64 elements. Every row of index i, where s[i] is True, shall be added by vector d. This works quite well:

C[s == True] += d

Now I have added one dimension to C, s, and d and the logic above shall be applied to every element of the additional dimension.

The following code does what I want, but it's very slow.

for i in range(I):
        C_this = C[:,:,i]
        s_this = s[:,i]
        d_this = d[:,i]

        C_this[s_this == True] += d_this
        C[:,:,i] = C_this

Is there a numpy way to do this without a for loop?

Is s already a boolean? If so, use it as a direct index. == True is not necessary. — Mad Physicist
– Mad Physicist, Commented Feb 17, 2020 at 16:27

hpaulj · Accepted Answer · 2020-02-17 23:18:33Z

2

It's easier with the extra dimension at the beginning:

In [376]: C = np.zeros((4,2,3),int)                                                            
In [377]: s = np.array([[0,0],[0,1],[1,0],[1,1]],bool)                                         
In [378]: d = np.arange(1,13).reshape(4,3)                                                     
In [379]: C.shape, s.shape, d.shape                                                            
Out[379]: ((4, 2, 3), (4, 2), (4, 3))
In [380]: I,J = np.nonzero(s)                                                                  
In [381]: I,J                                                                                  
Out[381]: (array([1, 2, 3, 3]), array([1, 0, 0, 1]))

In [383]: C[I,J]=d[I]                                                                          
In [384]: C                                                                                    
Out[384]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])

Your way:

In [385]: C = np.zeros((4,2,3),int)                                                            
In [386]: for i in range(4): 
     ...:     C[i,:,:][s[i,:]] += d[i,:] 
     ...:                                                                                      
In [387]: C                                                                                    
Out[387]: 
array([[[ 0,  0,  0],
        [ 0,  0,  0]],

       [[ 0,  0,  0],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [ 0,  0,  0]],

       [[10, 11, 12],
        [10, 11, 12]]])

edited Feb 17, 2020 at 23:18

answered Feb 17, 2020 at 18:39

hpaulj

233k14 gold badges260 silver badges392 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Mad Physicist Over a year ago

You can avoid the loop if you use repeat. Not sure that's an improvement (pretty sure it isn't), but hey, fully vectorized.

bburks832 Over a year ago

+1 from me. I think you threw off @MadPhysicist with the "Your way" section, the first section works on it's own. I made few tweaks myself, and ran a profiler on it. I will create a separate answer with my code, but I would say this one should be marked as "Correct".

Mad Physicist Over a year ago

You're right that I didn't read carefully. This is infinitely more elegant than the crap I wrote. @bburks832. Thanks for bringing this back to my attention and making me read.

Mad Physicist · Accepted Answer · 2020-02-17 22:25:21Z

Due to how numpy indexing works, s selects the relevant rows of C in the first example. To do the same thing in the 3D case, you would have to reshape C into something that is (8000*3, 64) and s into (8000*3, 1). The only problem now is getting d to account for the different number of rows in each third dimension, which can be done with np.repeat.

The first part is

C2 = np.swapaxes(C, -1, 1).reshape(-1, 64)

This is extremely inefficient because it copies your entire array. A better arrangement would be if C had shape (3, 8000, 64) to begin with. Then you would only need to ravel the first two axes to get the proper shape and memory layout, without copying data.

repeats = np.count_nonzero(s, axis=0)
C.reshape(-1, 64)[s.ravel()] += np.repeat(d, repeats, axis=0)

Since the reshape operation returns a view in this case, the indexing should work properly to increment in-place. I don't think this approach is necessarily very good though, since it copies each row of d as many times as s is non-zero in the corresponding element of the new dimension.

bburks832 · Accepted Answer · 2020-02-22 19:16:28Z

Here is my implementation of the method @hpaulj proposed. Note that I don't want to take the credit from him, so please mark his answer, not mine, as correct. Just wanted to share what I did.

import numpy as np
import numpy.random as npr

C = np.zeros((100, 8000, 64), dtype=int)
s = np.zeros((100, 8000), dtype=bool)
d = np.zeros((100, 64), dtype=int)

C[:,:,:] = npr.randint(50, size=C.shape)
s[:,:] = npr.randint(3, size=s.shape)
d[:,:] = npr.randint(10, size=d.shape)

I, J = np.nonzero(s)
C[I, J] += d[I]

I then profiled the program I made, and it runs on my machine in less than 450 milliseconds (the last two lines take less than 300 ms). Note that the calls to "randint" were just to set up the array values, so those lines wouldn't apply to your use case.

Collectives™ on Stack Overflow

2d array as index of a 3d array

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related