0

Suppose I have an array that stores a total of 800,000 integers of type 'int8'. I want to compare every pair of numbers in the array, and the result should be 1 if the numbers in those two positions are equal, and 0 if the numbers in those two positions are not equal. The results should be stored in an array of size (800000,800000) of type 'int8'. What are some efficient ways to compute this while optimizing speed and minimizing memory usage?

Ex.

array = [52, 41, 62 , 52]
result
[1,0,0,1
 0,1,0,0
 0,0,1,0
 1,0,0,1]
4
  • 3
    classical numpy broadcasting array = np.array([52, 41, 62 , 52]) ; (array[:, None] == array).astype(int) Commented Jun 29, 2023 at 18:20
  • 3
    The results should be stored in an array of size (800000,800000) of type 'int8' This will take 640 GB, assuming one byte per array element. Commented Jun 29, 2023 at 18:22
  • 2
    As noted by other commenters, this is a) rather straightforward, and b) rather impractical. If you can share bigger context for this comparison, there might be ways to do it more efficiently Commented Jun 29, 2023 at 18:27
  • 1
    In graph theory, there are two ways to represent the edges of a graph: an adjacency matrix or adjacency lists. What you are proposing is an adjacency matrix. That's never going to work because (as Nick already pointed out), there's no way that's going to fit into memory. So you need to use adjacency lists, which will only need 10s of megabytes, rather than 100s of gigabytes. Commented Jun 29, 2023 at 20:06

2 Answers 2

1

Built a map: val -> indices, where each value points to an array of the indices that hold that value. Parse your array and fill your map in linear time.

Now, you have everything you need to easily print your array, but you're storage is only linear (through printing or storing as an n^2 matrix is O(n^2), which can't be made more efficient).

E.g.

array = [52, 41, 62 , 52]
map = {52 => [0, 3],
       41 => [1],
       62 => [2]}

To print, go through the arrays of indices of your map. A given index array will generate as many rows as there are values in the array, with a 1 for each value in the array and a 0 for each other value. The output goes in the rows corresponding to the indices in the array.

E.g.

52 => [0, 3] gives us the row [1, 0, 0, 1], which will be the 0th and 3rd row.
41 => [1] gives us the row [0, 1, 0, 0], which will be the 1st row
62 => [2] gives us the row [0, 0, 1, 0], which will be the 2nd row.
Sign up to request clarification or add additional context in comments.

Comments

0

Agreed with the comments that depending on what you are trying to accomplish you may want to rethink the problem, but here is a solution to the specific question you asked, and borrowing from this answer:

import numpy as np

tst_arr = np.array([1,2,3,1]).astype('int8')

## create array of all permutations
perm_arr = np.empty((tst_arr.size, tst_arr.size, 2), dtype=tst_arr.dtype)
perm_arr[..., 0] = tst_arr[:, None]
perm_arr[..., 1] = tst_arr

## create binary array that checks if values are equal
check_arr = np.zeros((tst_arr.size, tst_arr.size)).astype('int8')
check_arr = np.where(perm_arr[:,:,0]==perm_arr[:,:,1], 1, 0)

print(check_arr)
[[1 0 0 1]
 [0 1 0 0]
 [0 0 1 0]
 [1 0 0 1]]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.