Iterate through numpy array testing multiple elements efficiently

Question

I have the following code, which iterates through a 2D NumPy array named "m". It runs extremely slowly. How can I rewrite it with NumPy functions so that I avoid the for loops?

pairs = []
for i in range(size):
    for j in range(size):
        if(i >= j):
            continue
        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])

What are the dimensions of this array?

– Eduardo Soares, Commented Jan 30, 2019 at 0:48
about 5000x5000

– nota, Commented Jan 30, 2019 at 0:49

Sheldore · Accepted Answer · 2019-01-30 02:10:34Z

6

You can use a vectorised approach with NumPy. The idea is:

1. Start from the matrix m and compute m + m.T, where m.T is the transpose of m. Entry [i, j] of this sum equals m[i][j] + m[j][i]. Call the result summ.
2. np.triu(summ, k=1) returns the part of the matrix strictly above the diagonal and sets everything else to zero. This plays the role of the if(i >= j): continue check in your code. You need k=1 to exclude the diagonal elements; the default, k=0, keeps them.
3. Use np.argwhere to get the indices at which that upper-triangular sum is greater than or equal to 0.75.
4. Store those indices and the corresponding values in a list for later processing or printing.

Verifiable example (using a small 3x3 random dataset)

import numpy as np

np.random.seed(0)
m = np.random.rand(3,3)
summ = m + m.T

index = np.argwhere(np.triu(summ, k=1)>=0.75)

pairs = [(x, y, summ[x, y]) for x, y in index]
print(pairs)
# [(0, 1, 1.2600725493693163), (0, 2, 1.0403505873343364), (1, 2, 1.537667113848736)]

Further performance improvement

An even faster way to build the final pairs list, without an explicit for loop, is:

pairs = list(zip(index[:, 0], index[:, 1], summ[index[:,0], index[:,1]]))
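As a sanity check, the vectorised result can be compared against the double loop from the question on a small random matrix. The sketch below also includes an alternative that is not part of the answer above: np.triu_indices yields the (i, j) pairs with i < j directly, so the threshold test never sees the zeros that np.triu writes into the lower triangle (this would only matter for a threshold of 0 or below). The helper names pairs_loop, pairs_triu and pairs_triu_indices, the threshold parameter and the 50x50 test size are made up for illustration.

```python
import numpy as np


def pairs_loop(m, threshold=0.75):
    """Reference: the double loop from the question."""
    size = m.shape[0]
    pairs = []
    for i in range(size):
        for j in range(size):
            if i >= j:
                continue
            if m[i][j] + m[j][i] >= threshold:
                pairs.append((i, j, m[i][j] + m[j][i]))
    return pairs


def pairs_triu(m, threshold=0.75):
    """The np.triu / np.argwhere approach, with the zip-based list build."""
    summ = m + m.T
    index = np.argwhere(np.triu(summ, k=1) >= threshold)
    return list(zip(index[:, 0], index[:, 1], summ[index[:, 0], index[:, 1]]))


def pairs_triu_indices(m, threshold=0.75):
    """Alternative sketch: test only the entries strictly above the diagonal."""
    summ = m + m.T
    rows, cols = np.triu_indices(m.shape[0], k=1)  # all (i, j) with i < j
    keep = summ[rows, cols] >= threshold
    rows, cols = rows[keep], cols[keep]
    return list(zip(rows, cols, summ[rows, cols]))


np.random.seed(0)
m = np.random.rand(50, 50)  # small illustrative size, not the 5000x5000 case

# All three functions should report the same number of pairs.
print(len(pairs_loop(m)), len(pairs_triu(m)), len(pairs_triu_indices(m)))
```

For a large matrix, np.triu_indices allocates two index arrays with size * (size - 1) / 2 entries each, so the np.triu version may be the lighter choice on memory.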

edited Jan 30, 2019 at 2:10

answered Jan 30, 2019 at 1:05


4 Comments

8one6 Over a year ago

Suggest you add np.random.seed(0) up at the top and rerun to give repeatable results.

nota Over a year ago

Decreased my program execution time from 55 sec to 1.5 sec. Thanks a lot!

Sheldore Over a year ago

@nota: I edited slightly to include k=1 because you don't want the diagonal elements

Sheldore Over a year ago

@nota: For a further speed-up, check my edit using zip

Eduardo Soares · Accepted Answer · 2019-01-30 01:16:36Z

5

One way to optimize your code is to avoid the if (i >= j) comparison. To traverse only the upper triangle of the array (the entries with j > i) without that check, start the inner loop at i + 1, where i is the index of the outer loop. That way, you avoid size x size if comparisons.

import numpy as np

size = 5000
m = np.random.rand(size, size)
pairs = []

for i in range(size):
    # Start at i + 1 so that only entries above the diagonal (j > i) are visited
    for j in range(i + 1, size):
        total = m[i, j] + m[j, i]
        if total >= 0.75:
            pairs.append([i, j, total])
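One way to check the loop bounds is to compare the result against the guarded loop from the question on a small matrix. The following is only a sketch, and the function names pairs_guarded and pairs_bounded are hypothetical. Note that the inner range has to start at i + 1: starting at i would also test the diagonal, which the original i >= j guard skips.

```python
import numpy as np


def pairs_guarded(m, threshold=0.75):
    """Loop from the question: visit every (i, j) and skip i >= j."""
    size = m.shape[0]
    pairs = []
    for i in range(size):
        for j in range(size):
            if i >= j:
                continue
            if m[i][j] + m[j][i] >= threshold:
                pairs.append([i, j, m[i][j] + m[j][i]])
    return pairs


def pairs_bounded(m, threshold=0.75):
    """Same result, but the inner loop only visits j > i."""
    size = m.shape[0]
    pairs = []
    for i in range(size):
        for j in range(i + 1, size):
            total = m[i, j] + m[j, i]
            if total >= threshold:
                pairs.append([i, j, total])
    return pairs


np.random.seed(0)
m = np.random.rand(20, 20)  # small illustrative size

# Both loops should produce identical lists, in the same order.
print(pairs_guarded(m) == pairs_bounded(m))
```

This still runs a Python-level loop over roughly half of the entries, so it is expected to remain far slower than a vectorised version for a 5000x5000 array.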

edited Jan 30, 2019 at 1:16

answered Jan 30, 2019 at 1:04


1 Comment

Sheldore Over a year ago

It's better to define size first and then use it in m = np.random.rand(size, size)
