
I am trying to get a fast vectorized version of the following loop:

for i in xrange(N1):
    A[y[i]] -= B[i,:]

Here A.shape = (N2,N3), y.shape = (N1,) with y taking values in [0,N2), and B.shape = (N1,N3). You can think of the entries of y as indices into the rows of A. N1 is large, N2 is fairly small, and N3 is smallish.

I thought simply doing

A[y] -= B

would work, but the issue is that there are repeated entries in y and this does not do the right thing (i.e., if y=[1,1] then A[1] is only updated once, not twice). It also does not seem to be any faster than the unvectorized for loop.
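A tiny demonstration (my own, not from the question) of the repeated-index behavior: the fancy-indexed in-place operation is buffered, so each repeated index receives only a single update.

```python
import numpy as np

A = np.zeros((2, 1))
B = np.ones((2, 1))
y = np.array([1, 1])

# A[y] -= B is evaluated as A[y] = A[y] - B: the right-hand side is
# computed once, so the two writes to A[1] both store -1 and the
# second update is lost.
A[y] -= B
print(A.ravel())   # [ 0. -1.]  -- the explicit loop would give [ 0. -2.]
```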

Is there a better way of doing this?

EDIT: YXD linked to this answer in the comments, which at first seems to fit the bill. It would seem you can do exactly what I want with

np.subtract.at(A, y, B)

and it does work; however, when I run it, it is significantly slower than the unvectorized version. So the question remains: is there a more performant way of doing this?
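For contrast with the buffered fancy-indexing case above, a small sanity check (my own sketch) that np.subtract.at really does accumulate over repeated indices:

```python
import numpy as np

A = np.zeros((2, 1))
B = np.ones((2, 1))
y = np.array([1, 1])

# Unbuffered ufunc application: each occurrence of an index in y
# triggers its own subtraction, so A[1] is decremented twice.
np.subtract.at(A, y, B)
print(A.ravel())   # [ 0. -2.]
```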

EDIT2: An example, to make things concrete:

n1,n2,n3 = 10000, 10, 500
A = np.random.rand(n2,n3)
y = np.random.randint(n2, size=n1)
B = np.random.rand(n1,n3)

The for loop, when run using %timeit in IPython, gives on my machine:

10 loops, best of 3: 19.4 ms per loop

The subtract.at version produces the same value for A in the end, but is much slower:

  1 loops, best of 3: 444 ms per loop
4 Comments

  • possible duplicate of numpy.array.__iadd__ and repeated indices and stackoverflow.com/q/15973827/553404 Commented Aug 21, 2015 at 14:25
  • @YXD, thanks for the suggestion; it looked pretty good, but it seems to make things slower in fact. I edited the question to address this. Commented Aug 21, 2015 at 14:36
  • Is A initialized to zeros, i.e. A = np.zeros((N2,N3)), before going into the loop? Commented Aug 21, 2015 at 14:45
  • @Divakar, it is not. I will add a concrete example to the question. Commented Aug 21, 2015 at 14:51

1 Answer

The code for the original for-loop based approach would look something like this -

def for_loop(A):
    # B and y are taken from the enclosing scope
    N1 = B.shape[0]
    for i in xrange(N1):
        A[y[i]] -= B[i,:]
    return A

Case #1

If n2 >> n3, I would suggest this vectorized approach -

def bincount_vectorized(A):
    n3 = A.shape[1]
    nrows = y.max()+1
    # Offset each column's row indices so a single flat bincount
    # accumulates all (row, column) sums at once
    id = y[:,None] + nrows*np.arange(n3)
    A[:nrows] -= np.bincount(id.ravel(), B.ravel()).reshape(n3,nrows).T
    return A
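To see that the bincount route really computes the same thing as the plain loop, here is a quick equivalence check (my own sketch, inlining both updates on small random data):

```python
import numpy as np

rng = np.random.RandomState(0)
n1, n2, n3 = 100, 8, 3
A = rng.rand(n2, n3)
y = rng.randint(n2, size=n1)
B = rng.rand(n1, n3)

# Reference: the plain loop
A1 = A.copy()
for i in range(n1):
    A1[y[i]] -= B[i, :]

# bincount-based update: offset each column's indices, then one
# weighted bincount accumulates all per-row, per-column sums
A2 = A.copy()
nrows = y.max() + 1
idx = y[:, None] + nrows * np.arange(n3)
A2[:nrows] -= np.bincount(idx.ravel(), B.ravel()).reshape(n3, nrows).T

print(np.allclose(A1, A2))   # True
```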

Runtime tests -

In [203]: n1,n2,n3 = 10000, 500, 10
     ...: A = np.random.rand(n2,n3)
     ...: y = np.random.randint(n2, size=n1)
     ...: B = np.random.rand(n1,n3)
     ...: 
     ...: # Make copies
     ...: Acopy1 = A.copy()
     ...: Acopy2 = A.copy()
     ...: 

In [204]: %timeit for_loop(Acopy1)
10 loops, best of 3: 19 ms per loop

In [205]: %timeit bincount_vectorized(Acopy2)
1000 loops, best of 3: 779 µs per loop

Case #2

If n2 << n3, a modified for-loop approach that iterates over the rows of A instead (far fewer iterations) could be suggested -

def for_loop_v2(A):
    n2 = A.shape[0]
    for i in range(n2):
        # Sum all rows of B assigned to row i, then subtract once
        A[i] -= np.einsum('ij->j',B[y==i]) # OR (B[y==i]).sum(0)
    return A
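The same kind of equivalence check (my own sketch) for the row-of-A loop, using the .sum(0) variant mentioned in the comment:

```python
import numpy as np

rng = np.random.RandomState(1)
n1, n2, n3 = 200, 5, 7
A = rng.rand(n2, n3)
y = rng.randint(n2, size=n1)
B = rng.rand(n1, n3)

# Reference: one subtraction per row of B
A1 = A.copy()
for i in range(n1):
    A1[y[i]] -= B[i, :]

# Modified loop: one subtraction per row of A, summing the
# matching rows of B first (empty selections sum to zeros)
A2 = A.copy()
for i in range(n2):
    A2[i] -= B[y == i].sum(0)

print(np.allclose(A1, A2))   # True
```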

Runtime tests -

In [206]: n1,n2,n3 = 10000, 10, 500
     ...: A = np.random.rand(n2,n3)
     ...: y = np.random.randint(n2, size=n1)
     ...: B = np.random.rand(n1,n3)
     ...: 
     ...: # Make copies
     ...: Acopy1 = A.copy()
     ...: Acopy2 = A.copy()
     ...: 

In [207]: %timeit for_loop(Acopy1)
10 loops, best of 3: 24.2 ms per loop

In [208]: %timeit for_loop_v2(Acopy2)
10 loops, best of 3: 20.3 ms per loop

3 Comments

Thanks for your answer and the detailed analysis. I am interested in case #2 for which your method is only marginally faster, so will hold off on accepting for now in case somebody has a better method.
@toth Sure, no problem!
@toth See if the approaches listed in this other answer to a very similar question help?
