1

I have a matrix A of dimension 1024 * 307200 and another matrix B of dimension 1024 * 50. I am computing the L2 norm of the column differences of these two matrices in a nested for loop to get my final matrix C of dimension 307200 * 50.
You can find the code below:

    import numpy

    C = numpy.empty((307200, 50))
    for i in range(307200):
        for l in range(50):
            C[i, l] = numpy.linalg.norm(A[:, i] - B[:, l])

As you can see, the dimensions of my variables are huge, which leads to very high latency. I want to avoid this nested loop, since for every pair of i and l I am applying the same function.

Is there any way to optimize the above loop?

2
  • "The same function" is not a determining factor. Suppose you want to set all the elements to 1 instead: then you still have to loop over all elements. But perhaps you can make it a generator. Commented Mar 4, 2018 at 10:03
  • You could make i and l into numpy arrays containing integers within your given ranges, initialize an empty array of all results, and use numpy vectorization to fill in the values of the empty array via i and l. Commented Mar 4, 2018 at 10:08

2 Answers

3

Maybe you could replace the inner loop and your function with these matrix operations?

import numpy as np

for i in range(307200):
    temp = A[:, i, np.newaxis] - B
    C[i, :] = np.linalg.norm(temp, axis=0)

For smaller arrays, I got about 20 times lower running times; perhaps you gain even more. In any case, please make sure that you are getting correct results (on smaller test sets).
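To follow the answer's advice about checking on smaller sets, here is a minimal sketch (shapes chosen arbitrarily) that compares the vectorized inner loop against the original nested loop:

```python
import numpy as np

# Tiny stand-ins for the question's 1024 x 307200 and 1024 x 50 matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 30))
B = rng.standard_normal((16, 5))

# Reference: the original nested loop
C_loop = np.empty((30, 5))
for i in range(30):
    for l in range(5):
        C_loop[i, l] = np.linalg.norm(A[:, i] - B[:, l])

# Vectorized inner loop from this answer
C_vec = np.empty((30, 5))
for i in range(30):
    C_vec[i, :] = np.linalg.norm(A[:, i, np.newaxis] - B, axis=0)

print(np.allclose(C_loop, C_vec))
```

If the two results agree on small inputs, the vectorized version can be trusted on the full-size matrices.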


3 Comments

Thanks for your answer; I really like the idea. I just edited the question, since I used numpy.linalg.norm in place of my previous function. Could you make the corresponding changes in your answer for the modified question? +1 given already :)
It's working fine :) my previous code took 280 secs and now it takes 27 secs. Thanks
@usr2564301 Thanks for letting me know about this rule of thumb on Stack Overflow. Just now accepted the answer :)
2

Update: With OP's update and clarifications things get much simpler:

>>> def f_pp(A, B):
...     AA = np.einsum('ij,ij->j', A, A)
...     BB = np.einsum('il,il->l', B, B)
...     AB = np.einsum('ij,il->jl', A, B)
...     return np.sqrt(np.add.outer(AA, BB) - 2*AB)
... 

end Update

You could use np.einsum and real arithmetic for a massive speedup:

>>> def f_pp(A, B):
...     Ar = A.view(float).reshape(*A.shape, 2)
...     Br = B.view(float).reshape(*B.shape, 2)
...     AA = np.einsum('ijk,ijk->j', Ar, Ar)
...     BB = np.einsum('ilk,ilk->l', Br, Br)
...     AB = np.einsum('ijk,ilk->jl', Ar, Br)
...     return np.sqrt(np.add.outer(AA, BB) - 2*AB)
... 

For shapes (1024, 3072) and (1024, 50) I get a factor of about 40.
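For reference, a quick sanity check of the complex-input variant against the naive loop on tiny, arbitrarily chosen shapes (repeating the answer's function so the snippet is self-contained):

```python
import numpy as np

def f_pp(A, B):
    # View each complex column as pairs of floats, then apply
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b on the real views
    Ar = A.view(float).reshape(*A.shape, 2)
    Br = B.view(float).reshape(*B.shape, 2)
    AA = np.einsum('ijk,ijk->j', Ar, Ar)
    BB = np.einsum('ilk,ilk->l', Br, Br)
    AB = np.einsum('ijk,ilk->jl', Ar, Br)
    return np.sqrt(np.add.outer(AA, BB) - 2*AB)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 12)) + 1j * rng.standard_normal((8, 12))
B = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))

# Reference: column-pair norms via the naive loop
ref = np.array([[np.linalg.norm(A[:, j] - B[:, l]) for l in range(4)]
                for j in range(12)])
print(np.allclose(f_pp(A, B), ref))
```

The real dot product of the float views equals Re(a·conj(b)) summed over components, which is exactly what the squared-distance expansion needs.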

Some explanation:

Real arithmetic: Unless numpy does some unbelievably smart optimization, I would expect a complex product like x*x.conj() to use four real multiplications. Knowing that the result is real, we save two of those.

Writing |A-B|^2 as |A|^2 + |B|^2 - 2 A·B. This saves memory by avoiding the huge intermediate A-B of shape (1024, 3072, 50) ((1024, 307200, 50) in the full example) that a direct broadcasting approach would use.
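A minimal sketch of the second point for real arrays (shapes chosen arbitrarily): the identity-based computation matches direct broadcasting without ever materializing the large three-dimensional intermediate.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))
B = rng.standard_normal((8, 4))

# Direct broadcasting: materializes an (8, 12, 4) intermediate A - B
direct = np.linalg.norm(A[:, :, None] - B[:, None, :], axis=0)

# Identity-based: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, no big intermediate
sq = (np.add.outer(np.einsum('ij,ij->j', A, A),
                   np.einsum('il,il->l', B, B))
      - 2 * np.einsum('ij,il->jl', A, B))
via_identity = np.sqrt(np.maximum(sq, 0))  # clip tiny negatives from rounding

print(np.allclose(direct, via_identity))
```

The `np.maximum(sq, 0)` guard is my addition: floating-point rounding can make a squared distance very slightly negative when two columns nearly coincide.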

3 Comments

Thanks for your answer; I really like the idea. I just edited the question, since I used numpy.linalg.norm in place of my previous function. Could you make the corresponding changes in your answer for the modified question? +1 given already :)
@Sansk do your changes change your result? Also, are your arrays actually of type complex? Your first version suggested it.
Sorry for the confusion; they are actually not complex, and I figured out I don't need the conjugate for the L2 norm, so I just used numpy.linalg.norm to calculate it. However, the result didn't change much. In any case, thanks for your code, I learned something new. If you want to modify your answer based on my changes, feel free to do so. :)
