How to vectorize multiple levels of recursion?

Question

I am a noobie to python and numpy (and programming in general). I am trying to speed up my code as much as possible. The math involves several summations over multiple axes of a few arrays. I've attained one level of vectorization, but I can't seem to get any deeper than that and have to resort to for loops (I believe there's three levels of recursion, M, N, and I, one of which I've eliminated, I). Here's my code for the relevant section (this code works, but I'd like to speed it up):

def B1(n, i):
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)

for n in N:
    B[n, :] = B1(n, I)

for m in M:
    for n in N:
        C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])

    Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])

A = np.linalg.solve(C[1:, 1:], (0.25)*Y[1:])

dmaxi is just a float and m, n and i are integers. The arrays have the following shapes:

>>> qi.shape
(551,)
>>> N.shape
(18,)
>>> M.shape
(18,)
>>> I.shape
(551,)
>>> Iq.shape
(1, 551, 3)
>>> U.shape
(1, 551, 3)

As you can see I've vectorized the calculation of the 2nd axis of B, but I can't seem to do it for the 1st axis, C, and Y, which still require the for loops. It seems that when I try to do the same form of vectorization that I did for the 1st axis of B (define a function, then give the array as the argument), I get a broadcasting error since it appears to be trying to calculate both axes simultaneously, rather than the 1st, then the 2nd, which is why I had to force it into a for loop instead. The same problem occurs for both C and Y which is why they're both in for loops also. In case that's confusing, essentially what I tried was:

>>> B[:, :] = B1(N, I)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sasrec_v6.py", line 155, in B1
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)
ValueError: operands could not be broadcast together with shapes (18) (551)

Vectorizing the 2nd axis of B made a substantial improvement to the speed of my code, so I'm assuming that the same will apply for further vectorization (I hope I'm using that term correctly by the way).

The code you provided does not appear to use recursion at all. — aestrivex
– aestrivex, Commented Nov 26, 2013 at 15:15
I may be using the term incorrectly. I'm simply referring to the use of the nested for loops. — tomerg
– tomerg, Commented Nov 26, 2013 at 16:09
So, lowercase i is not an integer but a vector, the same as uppercase i? — user2379410
– user2379410, Commented Nov 26, 2013 at 16:23
@moarningsun I think i is the parameter name, I will be the argument in its place. — askewchan
– askewchan, Commented Nov 26, 2013 at 16:27
What are the expected shapes of B, C, and Y. I thought I had it figured out but I think I'm making a mistake. — askewchan
– askewchan, Commented Nov 26, 2013 at 16:40

askewchan · Accepted Answer · 2013-11-26 17:32:17Z

2

You can use broadcasting to make 2d arrays from your 1d index vectors. I haven't tested these yet, but they should work:

If you reshape the N to be a column vector, then B1 will return a 2d array:

B[N] = B1(N[:, None], I)

For Y and C, I'd use np.einsum to have better control over which axes are mulitplied (probably this could be done with np.dot as well but I'm not sure how.

C[M[:, None], N] = np.einsum('ij,kj->ik',
        B[M]/np.square(qi*Iq[0, :, 2]),
        B[N])

Y[M] = np.einsum('i, ki->k',
        U[0, :, 1]/np.square(qi*Iq[0, :, 2]),
        B[M])

To see what that indexing trick does:

In [1]: a = np.arange(3)

In [2]: a
Out[2]: array([0, 1, 2])

In [3]: a[:, None]
Out[3]: 
array([[0],
       [1],
       [2]])

In [4]: b = np.arange(4,1,-1)

In [5]: b
Out[5]: array([4, 3, 2])

In [6]: a[:, None] * b
Out[6]: 
array([[0, 0, 0],
       [4, 3, 2],
       [8, 6, 4]])

It saves two orders of magnitude in time:

In [92]: %%timeit
   ....: B = np.zeros((18, 551))
   ....: C = np.zeros((18, 18))
   ....: Y = np.zeros((18))
   ....: for n in N:
   ....:     B[n, :] = B1(n, I)
   ....: for m in M:
   ....:     for n in N:
   ....:         C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])
   ....:     Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])
   ....: 
100 loops, best of 3: 15.8 ms per loop

In [93]: %%timeit
   ....: Bv = np.zeros((18, 551))
   ....: Cv = np.zeros((18, 18))
   ....: Yv = np.zeros((18))
   ....: Bv[N] = B1(N[:, None], I)
   ....: Cv[M[:, None], N] = np.einsum('ij,kj->ik', B[M]/np.square(qi*Iq[0, :, 2]), B[N])
   ....: Yv[M] = np.einsum('i, ki->k', U[0, :, 1]/np.square(qi*Iq[0, :, 2]), B[M])
   ....: 
1000 loops, best of 3: 1.34 ms per loop

Here's my test:

import numpy as np

# make fake data:
np.random.seed(5)

qi = np.random.rand(551)
N = np.random.randint(0,18,18)#np.arange(18)
M = np.random.randint(0,18,18)#np.arange(18)
I = np.arange(551)
Iq = np.random.rand(1, 551, 3)
U = np.random.rand(1, 551, 3)

B = np.zeros((18, 551))
C = np.zeros((18, 18))
Y = np.zeros((18))
Bv = np.zeros((18, 551))
Cv = np.zeros((18, 18))
Yv = np.zeros((18))

dmaxi = 1.

def B1(n, i):
    return np.pi * n * dmaxi * (-1)**(n+1) * np.sin(qi[i]*dmaxi) * ((np.pi*n)**2 - (qi[i]*dmaxi)**2)**(-1)

for n in N:
    B[n, :] = B1(n, I)

for m in M:
    for n in N:
        C[m, n] = np.dot((1/np.square(qi*Iq[0, :, 2]))*B[m, :], B[n, :])
    Y[m] = np.dot((1/np.square(qi*Iq[0, :, 2]))*U[0, :, 1], B[m, :])

Bv[N] = B1(N[:, None], I)
print "B correct?", np.allclose(Bv, B)

# np.einsum test case:
n, m = 2, 3
a = np.arange(n*m).reshape(n,m)*8 + 2
b = np.arange(n*m)[::-1].reshape(n,m)
c = np.empty((n,n))
for i in range(n):
    for j in range(n):
        c[i,j] = np.dot(a[i],b[j])
cv = np.einsum('ij,kj->ik', a, b)
print "einsum test successful?", np.allclose(c,cv)

Cv[M[:, None], N] = np.einsum('ij,kj->ik',
        B[M]/np.square(qi*Iq[0, :, 2]),
        B[N])
print "C correct?", np.allclose(Cv, C)

Yv[M] = np.einsum('i, ki->k',
        U[0, :, 1]/np.square(qi*Iq[0, :, 2]),
        B[M])
print "Y correct?", np.allclose(Yv, Y)

output :D

B correct? True
einsum test successful? True
C correct? True
Y correct? True

edited Nov 26, 2013 at 17:32

answered Nov 26, 2013 at 16:12

askewchan

46.7k18 gold badges125 silver badges135 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

tomerg Over a year ago

wow. very impressive speed increase. I didn't even think of reshaping N to be a column vector for that. I tried to use the np.einsum previously but I wasn't able to get it to work so I thought maybe it wasn't applicable, but clearly I was wrong. Thanks so much!

askewchan Over a year ago

Please be sure to run your own code and mine, and make sure they work for your inputs. I made some assumptions and hopefully it works but I don't want you not to double check!

tomerg Over a year ago

Yes I had when I wrote my comment and accepted the answer. I should've mentioned that. It gives all the same answers as before, just faster.

Collectives™ on Stack Overflow

How to vectorize multiple levels of recursion?

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related