Compare number in numpy array

Question

Suppose I have an array that stores a total of 800,000 integers of type 'int8'. I want to compare every pair of numbers in the array, and the result should be 1 if the numbers in those two positions are equal, and 0 if the numbers in those two positions are not equal. The results should be stored in an array of size (800000,800000) of type 'int8'. What are some efficient ways to compute this while optimizing speed and minimizing memory usage?

Ex.

array = [52, 41, 62 , 52]
result
[1,0,0,1
 0,1,0,0
 0,0,1,0
 1,0,0,1]

classical numpy broadcasting array = np.array([52, 41, 62 , 52]) ; (array[:, None] == array).astype(int) — mozway
– mozway, Commented Jun 29, 2023 at 18:20
The results should be stored in an array of size (800000,800000) of type 'int8' This will take 640 GB, assuming one byte per array element. — Nick ODell
– Nick ODell, Commented Jun 29, 2023 at 18:22
As noted by other commenters, this is a) rather straightforward, and b) rather impractical. If you can share bigger context for this comparison, there might be ways to do it more efficiently — Marat
– Marat, Commented Jun 29, 2023 at 18:27
In graph theory, there are two ways to represent the edges of a graph: an adjacency matrix or adjacency lists. What you are proposing is an adjacency matrix. That's never going to work because (as Nick already pointed out), there's no way that's going to fit into memory. So you need to use adjacency lists, which will only need 10s of megabytes, rather than 100s of gigabytes. — user3386109
– user3386109, Commented Jun 29, 2023 at 20:06

Dave · Accepted Answer · 2023-06-29 19:58:56Z

Built a map: val -> indices, where each value points to an array of the indices that hold that value. Parse your array and fill your map in linear time.

Now, you have everything you need to easily print your array, but you're storage is only linear (through printing or storing as an n^2 matrix is O(n^2), which can't be made more efficient).

E.g.

array = [52, 41, 62 , 52]
map = {52 => [0, 3],
       41 => [1],
       62 => [2]}

To print, go through the arrays of indices of your map. A given index array will generate as many rows as there are values in the array, with a 1 for each value in the array and a 0 for each other value. The output goes in the rows corresponding to the indices in the array.

E.g.

52 => [0, 3] gives us the row [1, 0, 0, 1], which will be the 0th and 3rd row.
41 => [1] gives us the row [0, 1, 0, 0], which will be the 1st row
62 => [2] gives us the row [0, 0, 1, 0], which will be the 2nd row.

astroChance · Accepted Answer · 2023-06-29 19:37:41Z

0

Agreed with the comments that depending on what you are trying to accomplish you may want to rethink the problem, but here is a solution to the specific question you asked, and borrowing from this answer:

import numpy as np

tst_arr = np.array([1,2,3,1]).astype('int8')

## create array of all permutations
perm_arr = np.empty((tst_arr.size, tst_arr.size, 2), dtype=tst_arr.dtype)
perm_arr[..., 0] = tst_arr[:, None]
perm_arr[..., 1] = tst_arr

## create binary array that checks if values are equal
check_arr = np.zeros((tst_arr.size, tst_arr.size)).astype('int8')
check_arr = np.where(perm_arr[:,:,0]==perm_arr[:,:,1], 1, 0)

print(check_arr)
[[1 0 0 1]
 [0 1 0 0]
 [0 0 1 0]
 [1 0 0 1]]

edited Jun 29, 2023 at 19:37

answered Jun 29, 2023 at 18:34

astroChance

3933 silver badges12 bronze badges

Collectives™ on Stack Overflow

Compare number in numpy array

2 Answers 2

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related