1

I have a matrix A of dimension 1024 * 307200 and another matrix B of dimension 1024 * 50. I am computing the L2 norm of the column differences of these two matrices in a nested for loop to get my final matrix C of dimension 307200 * 50.
You can find the code below:

    import numpy

    C = numpy.empty((307200, 50))
    for i in range(307200):
        for l in range(50):
            C[i, l] = numpy.linalg.norm(A[:, i] - B[:, l])

As you can see, the dimensions of my variables are huge, which leads to very high latency. I want to avoid this nested loop, since for every pair of i and l I am applying the same function.

Is there any way to optimize the above loop?

2
  • "The same function" is not a determining factor. Suppose you want to set all the elements to 1 instead: then you still have to loop over all elements. But perhaps you can make it a generator. Commented Mar 4, 2018 at 10:03
  • You could make i and l into numpy arrays containing integers within your given ranges, initialize an empty array of all results, and use numpy vectorization to fill in the values of the empty array via i and l. Commented Mar 4, 2018 at 10:08

2 Answers

3

Maybe you could replace the inner loop and your function with these matrix operations?

import numpy as np

for i in range(307200):
    temp = A[:, i, np.newaxis] - B
    C[i, :] = np.linalg.norm(temp, axis=0)

For smaller arrays, I got about 20 times lower running times; perhaps you gain even more. In any case, please make sure that you are getting correct results (on smaller test sets).
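To follow the answer's advice about checking on smaller sets, here is a minimal sketch (shapes chosen arbitrarily) that compares the vectorized inner loop against the original nested loop:

```python
import numpy as np

# Tiny stand-ins for the question's 1024 x 307200 and 1024 x 50 matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 30))
B = rng.standard_normal((16, 5))

# Reference: the original nested loop
C_loop = np.empty((30, 5))
for i in range(30):
    for l in range(5):
        C_loop[i, l] = np.linalg.norm(A[:, i] - B[:, l])

# Vectorized inner loop from this answer
C_vec = np.empty((30, 5))
for i in range(30):
    C_vec[i, :] = np.linalg.norm(A[:, i, np.newaxis] - B, axis=0)

print(np.allclose(C_loop, C_vec))
```

If the two results agree on small inputs, the vectorized version can be trusted on the full-size matrices.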


3 Comments

Thanks for your answer; I really like the idea. I just edited the question, since I used numpy.linalg.norm in place of my previous function. Could you make the corresponding changes in your answer for the modified question? +1 given already :)
It's working fine :) my previous code took 280 secs and now it takes 27 secs. Thanks
@usr2564301 Thanks for letting me know about this rule of thumb on Stack Overflow. Just now accepted the answer :)
2

Update: With OP's update and clarifications things get much simpler:

>>> def f_pp(A, B):
...     AA = np.einsum('ij,ij->j', A, A)
...     BB = np.einsum('il,il->l', B, B)
...     AB = np.einsum('ij,il->jl', A, B)
...     return np.sqrt(np.add.outer(AA, BB) - 2*AB)
... 

end Update

You could use np.einsum and real arithmetic for a massive speedup:

>>> def f_pp(A, B):
...     Ar = A.view(float).reshape(*A.shape, 2)
...     Br = B.view(float).reshape(*B.shape, 2)
...     AA = np.einsum('ijk,ijk->j', Ar, Ar)
...     BB = np.einsum('ilk,ilk->l', Br, Br)
...     AB = np.einsum('ijk,ilk->jl', Ar, Br)
...     return np.sqrt(np.add.outer(AA, BB) - 2*AB)
... 

For shapes (1024, 3072) and (1024, 50) I get a factor of about 40.
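For reference, a quick sanity check of the complex-input variant against the naive loop on tiny, arbitrarily chosen shapes (repeating the answer's function so the snippet is self-contained):

```python
import numpy as np

def f_pp(A, B):
    # View each complex column as pairs of floats, then apply
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b on the real views
    Ar = A.view(float).reshape(*A.shape, 2)
    Br = B.view(float).reshape(*B.shape, 2)
    AA = np.einsum('ijk,ijk->j', Ar, Ar)
    BB = np.einsum('ilk,ilk->l', Br, Br)
    AB = np.einsum('ijk,ilk->jl', Ar, Br)
    return np.sqrt(np.add.outer(AA, BB) - 2*AB)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 12)) + 1j * rng.standard_normal((8, 12))
B = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))

# Reference: column-pair norms via the naive loop
ref = np.array([[np.linalg.norm(A[:, j] - B[:, l]) for l in range(4)]
                for j in range(12)])
print(np.allclose(f_pp(A, B), ref))
```

The real dot product of the float views equals Re(a·conj(b)) summed over components, which is exactly what the squared-distance expansion needs.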

Some explanation:

Real arithmetic: Unless numpy does some unbelievably smart optimization, I would expect a complex product like x*x.conj() to use four real multiplications. Knowing that the result is real, we save two of those.

Writing |A-B|^2 as |A|^2 + |B|^2 - 2 A·B. This saves memory by avoiding the huge intermediate A-B of shape (1024, 3072, 50) ((1024, 307200, 50) in the full example) that a direct broadcasting approach would use.
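A minimal sketch of the second point for real arrays (shapes chosen arbitrarily): the identity-based computation matches direct broadcasting without ever materializing the large three-dimensional intermediate.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((8, 4))

# Direct broadcasting: materializes an (8, 12, 4) intermediate A - B
direct = np.linalg.norm(A[:, :, None] - B[:, None, :], axis=0)

# Identity-based: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, no big intermediate
sq = (np.add.outer(np.einsum('ij,ij->j', A, A),
                   np.einsum('il,il->l', B, B))
      - 2 * np.einsum('ij,il->jl', A, B))
via_identity = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives from rounding

print(np.allclose(direct, via_identity))
```

The `np.maximum(sq, 0)` guard is my addition: floating-point rounding can make a squared distance very slightly negative when two columns nearly coincide.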

3 Comments

Thanks for your answer; I really like the idea. I just edited the question, since I used numpy.linalg.norm in place of my previous function. Could you make the corresponding changes in your answer for the modified question? +1 given already :)
@Sansk do your changes change your result? Also, are your arrays actually of type complex? Your first version suggested it.
Sorry for the confusion; they are actually not complex, and I figured out I don't need the conjugate for the L2 norm, so I just used numpy.linalg.norm to calculate it. However, the result didn't change much. In any case, thanks for your code, I learned something new. If you want to modify your answer based on my changes, feel free to do so. :)
