0

I want to do quick pair-wise matchings of all elements in a numpy array, ignoring self-matching.

For example given a numpy array like

a = np.array(range(4))

I want to get a matrix like

array([[0, 1], 
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2]])

In my real case, I have different sizes of arrays varying from 1 to 1000. I need to constantly call a function that does this pair-wise matching. I wrote the code below

import numpy as np
from itertools import permutations

a = np.array(range(1000))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
print(res)

I want to know if there are faster ways (maybe more numpyic ways) to do this?

I would like to see some benchmarkings of runtime vs. array size.

1
  • 1
    Not sure if there is a numpic way but how about - np.array(list(permutations(a,2))) Commented Dec 17, 2020 at 3:53

2 Answers 2

2

Making a meshgrid and then filtering out the "matching" rows (e.g. 0,0 and 12,12) would likely be quite a bit faster at this scale.

import numpy as np
from itertools import permutations

%%time
n = 1000
a = np.array(np.arange(n))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
res
CPU times: user 1.08 s, sys: 176 ms, total: 1.26 s
Wall time: 1.28 s
%%time
n = 1000
a = np.arange(n)
arr = np.array(np.meshgrid(a, a)).T.reshape(-1,2)
arr = arr[arr[:, 0] != arr[:, 1]]
CPU times: user 49.8 ms, sys: 33.2 ms, total: 83 ms
Wall time: 89.2 ms
%%time
n = 2000
a = np.array(np.arange(n))
xs, ys = zip(*[(x, y) for x, y in permutations(a, 2)])
res = np.vstack([np.array(xs), np.array(ys)]).T
res
CPU times: user 3.94 s, sys: 924 ms, total: 4.86 s
Wall time: 5.03 s
%%time
n = 2000
a = np.arange(n)
arr = np.array(np.meshgrid(a, a)).T.reshape(-1,2)
arr = arr[arr[:, 0] != arr[:, 1]]
CPU times: user 218 ms, sys: 105 ms, total: 323 ms
Wall time: 328 ms
np.testing.assert_array_equal(res, arr) # passes
Sign up to request clarification or add additional context in comments.

Comments

0

If you only want pairs like [0,1] and [1,0] :


a = np.array(range(4))  # [0, 1, 2, 3]
n = len(a)

i,j = np.indices((n,n))
mask = i != j
res1 = np.column_stack((i[mask],j[mask]))
print(res1)
'''
[[0 1]
 [0 2]
 [0 3]
 [1 0]
 [1 2]
 [1 3]
 [2 0]
 [2 1]
 [2 3]
 [3 0]
 [3 1]
 [3 2]]
'''

But If you want to avoid duplicate pairs. For example : if you only want unordered pairs

like [0,1] but not [1,0]) : Use np.triu


a,b = np.triu_indices(n, k =1) 
res2 = np.column_stack((a,b))
print(res2)
'''
[[0 1]
 [0 2]
 [0 3]
 [1 2]
 [1 3]
 [2 3]]
'''

 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.