4

I have the following code which iterates through 2d numpy array named "m". It works extremely slow. How can I transform this code using numpy functions so that I avoid using the for loops?

pairs = []
for i in range(size):
    for j in range(size):
        if(i >= j):
            continue
        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])
2
  • What the dimensions of this array? Commented Jan 30, 2019 at 0:48
  • about 5000x5000 Commented Jan 30, 2019 at 0:49

2 Answers 2

6

You can use vectorised approach using NumPy. The idea is:

  • First initialize a matrix m and then create m+m.T which is equivalent to m[i][j] + m[j][i] where m.T is the matrix transpose and call it summ
  • np.triu(summ) returns the upper triangular part of the matrix (This is equivalent to ignoring the lower part by using continue in your code). This avoids explicit if(i >= j): in your code. Here you have to use k=1 to exclude the diagonal elements. By default, k=0 which includes the diagonal elements as well.
  • Then you get the indices of points using np.argwhere where the sum m+m.T is greater than equal to 0.75
  • Then you store those indices and the corresponding values in a list for later processing/printing purposes.

Verifiable example (using a small 3x3 random dataset)

import numpy as np

np.random.seed(0)
m = np.random.rand(3,3)
summ = m + m.T

index = np.argwhere(np.triu(summ, k=1)>=0.75)

pairs = [(x,y, summ[x,y]) for x,y in index]
print (pairs)
# # [(0, 1, 1.2600725493693163), (0, 2, 1.0403505873343364), (1, 2, 1.537667113848736)]

Further performance improvement

I just worked out an even faster approach to generate the final pairs list avoiding explicit for loops as

pairs = list(zip(index[:, 0], index[:, 1], summ[index[:,0], index[:,1]]))
Sign up to request clarification or add additional context in comments.

4 Comments

Suggest you add np.random.seed(0) up at the top and rerun to give repeatable results.
Decreased my program execution time from 55 sec to 1.5 sec. Thanks a lot!
@nota: I edited slightly to include k=1 because you don't want the diagonal elements
@nota: To get more speed up check my edit using zip
5

One way to optimize your code is to avoid comparison if (i >= j). To traverse only the lower triangle of the array without that comparison, you have to make the inner loop start with the value of i of the outermost loop. That way, you avoid size x size if comparisons.

import numpy as np
size = 5000
m = np.random.rand(size, size)
pairs = []


for i in range(size):
    for j in range(i , size):

        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])

1 Comment

It's better to define first size and then use it in m = np.random.rand(size, size)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.