2

I have a 2D aray of size(640X480) like this :

[[1.2 , 9.5 , 4.8 , 1.7],
 [5.5 , 8.1 , 7.6 , 7.1],
 [1.4 , 6.9 , 7.8 , 2.2]]     (this is a sample of a 4X3 array)

I have to find the top 100(or N) highest values in the array, in the FASTEST way possible; so I need the most optimised code which takes least processing time.

Since it is a giant array, it is fine if i only checked every 2nd element or every 3rd or 4th element.

The output of the algorithm should be a list of tuple, each tuple being the 2D index of the high-value element.

For example the index for 9.5 would be (0,1)

I have found a solution but it is too slow:

indexes=[]
for i in range(100):
    highest=-1
    highindex=0.1
    for indi,i in enumerate(array):
        for indj,j in enumerate(i):
            if j>highest and not((indi,indj) in indexes):
                highest= j
                highindex=(indi,indj)
    indexes.append(highindex)
4
  • I have found a way to do this but it takes too much time, what is the optimum way to do it? Commented Jul 1, 2018 at 9:30
  • Please share the too much time taking code. Commented Jul 1, 2018 at 10:06
  • Okay I have shared the time taking algorithm. Commented Jul 1, 2018 at 10:47
  • Possible duplicate of Get the indices of N highest values in an ndarray Commented Jul 1, 2018 at 10:56

2 Answers 2

6

With

numpy.argpartition, numpy.unravel_index and numpy.column_stack routines:

Test ndarray arr is a shuffled array with values 0 to 99 of shape (11, 9).
Let's say we want to find the list of 2d indices of top 7 largest values:

In [1018]: arr
Out[1018]: 
array([[36, 37, 38, 39, 40, 41, 42, 43, 44],
       [27, 28, 29, 30, 31, 32, 33, 34, 35],
       [72, 73, 74, 75, 76, 77, 78, 79, 80],
       [ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [45, 46, 47, 48, 49, 50, 51, 52, 53],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [90, 91, 92, 93, 94, 95, 96, 97, 98],
       [54, 55, 56, 57, 58, 59, 60, 61, 62],
       [63, 64, 65, 66, 67, 68, 69, 70, 71],
       [81, 82, 83, 84, 85, 86, 87, 88, 89]])

In [1019]: top_N = 7

In [1020]: idx = np.argpartition(arr, arr.size - top_N, axis=None)[-top_N:]

In [1021]: result = np.column_stack(np.unravel_index(idx, arr.shape))

In [1022]: result
Out[1022]: 
array([[7, 2],
       [7, 3],
       [7, 4],
       [7, 5],
       [7, 7],
       [7, 8],
       [7, 6]])
Sign up to request clarification or add additional context in comments.

10 Comments

I get an error: ValueError: kth(=-98) out of bounds (1)
What do you get for arr.size - top_N?
@VikhyatAgarwal, what's your current code? Note, that when doing such partitioning top_N should not be larger than the number of columns of the initial array
what's your current arr.shape and arr.ndim?
@VikhyatAgarwal, that is the answer: you are testing an uninitialized or an empty array. Check your array initialization/filling code
|
1

This is the solution I thought of, hopefully it's fast enough for your needs.

num_list = [
    [1.2, 9.5, 4.8, 1.7],
    [5.5, 8.1, 7.6, 7.1],
    [5.5, 9.6, 7.6, 7.1],
    [5.5, 8.1, 4.5, 7.1],
    [1.4, 6.9, 7.8, 12.2]
]

needed_highest = 5 # This is where your 100 would go
highest = [-1] * needed_highest
result = [-1] * needed_highest

for y in range(0, len(num_list)):
    for x in range(0, len(num_list[y])):
        num = num_list[y][x]
        min_index = highest.index(min(highest))
        min_value = highest[min_index]
        if min_value < num:
            highest[min_index] = num
            result[min_index] = (x, y)
print(result)

The result isn't sorted in any manner, but it shouldn't be to hard to implement that if it's needed.

2 Comments

Thanks for the answer.
No worries, hope it helped

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.