4

V is (n,p) numpy array typically dimensions are n~10, p~20000

The code I have now looks like

A = np.zeros(p)
for i in xrange(n):
    for j in xrange(i+1):
        A += F[i,j] * V[i,:] * V[j,:]

How would I go about rewriting this to avoid the double python for loop?

3
  • 1
    What is the shape of F? Is it (n, n) or (n, n, p)? Commented Nov 20, 2013 at 23:06
  • did you consider writing those for-loops in C (since they are simple), and using scipy.weave.blitz ? Commented Nov 21, 2013 at 7:53
  • I would prefer to avoid cython or weave.blitz for the moment if some python code will be "good enough" Commented Nov 21, 2013 at 17:54

3 Answers 3

10

While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M which is n times the size of your original V array. Python for loops are not cheap, but memory access ain't free either:

n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)

def op_code(V, F):
    n, p = V.shape
    A = np.zeros(p)
    for i in xrange(n):
        for j in xrange(i+1):
            A += F[i,j] * V[i,:] * V[j,:]
    return A

def isaac_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
    return M.sum((0, 1))

If you now take both for a test ride:

In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True

In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop

In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop

So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum:

def einsum_code(V, F):
    n, p = V.shape
    F = F.copy()
    F[np.triu_indices(n, 1)] = 0
    return np.einsum('ij,ik,jk->k', F, V, V)

And now:

In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True

In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop

So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...

Sign up to request clarification or add additional context in comments.

3 Comments

Did not know about einsum; very neat!
Yep. One common heuristic for vectorizing code when it's a little troublesome to do the whole thing is to vectorize over the long axes and loop over the short ones, which gets you almost all the benefit for a fraction of the headache. But here, that's what the OP had already done!
Thanks, I ended up going with the einsum method after testing each with a range of inputs (n=5-30 & p=100-20000), I was not aware it was there.
7

The difficult part about this is that you only want to take the sum of the elements with j <= i. If not for that then you could do the following:

M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
A = M.sum(0).sum(0)

If F is symmetric (if F[i,j] == F[j,i]) then you can exploit the symmetry of M above as follows:

D = M[range(n), range(n)].sum(0)
A = (M.sum(0).sum(0) - D) / 2.0 + D

That said, this is really not a great candidate for vectorization, as you have n << p and so your for-loops are not going to have much effect on the speed of this computation.

Edit: As Bill said below, you can just make sure that the elements of F that you don't want to use are set to zero first, and then the M.sum(0).sum(0) result will be what you want.

3 Comments

What about F[np.triu_indices(n, 1)] = 0? That way the upper half of F (offset by 1) is zero and will not contribute to the sum.
Also, a.sum(0).sum(0) can be written as np.sum(a, (0,1)) or a.sum((0,1)) which isn't any faster but I find it more readable.
As I said, this code is really not a good candidate for vectorization.
1

The expression can be written as

formula

and thus you can sum it like this using the np.newaxis-construct:

na = np.newaxis
X = (np.tri(n)*F)[:,:,na]*V[:,na,:]*V[na,:,:]
X.sum(axis=1).sum(axis=0)

Here a 3D-array X[i,j,p] is constructed, and then the 2 first axes are summed, which results in a 1D array A[p]. Additionaly F was multiplied with a triangular matrix to restrict the summation according to the problem.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.