
There are plenty of questions on here about finding the nth smallest element in a numpy array. However, what if you have a 2D array (an array of arrays), like this one:

>>> print matrix
[[ 1.          0.28958002  0.09972488 ...,  0.46999924  0.64723113
   0.60217694]
 [ 0.28958002  1.          0.58005657 ...,  0.37668355  0.48852272
   0.3860152 ]
 [ 0.09972488  0.58005657  1.         ...,  0.13151364  0.29539992
   0.03686381]
 ..., 
 [ 0.46999924  0.37668355  0.13151364 ...,  1.          0.50250212
   0.73128971]
 [ 0.64723113  0.48852272  0.29539992 ...,  0.50250212  1.          0.71249226]
 [ 0.60217694  0.3860152   0.03686381 ...,  0.73128971  0.71249226  1.        ]]

How can I get the n smallest items out of this array of arrays?

>>> print type(matrix)
<type 'numpy.ndarray'>

This is how I have been doing it to find the coordinates of the smallest item:

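import numpy

min_value = numpy.amin(matrix)  # compute the global minimum once
min_coordinates = []
for row in matrix:
    cols = numpy.where(row == min_value)[0]  # column indices equal to the minimum
    if cols.size > 0:  # numpy.any(cols) would be False for a match in column 0
        min_coordinates.append(int(cols[0]) + 1)  # 1-based column index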

Now I would like to find, for example, the 10 smallest items.

3 Answers


Flatten the matrix, sort and then select the first 10.

print(numpy.sort(matrix.flatten())[:10])

Comment:

Instead of calling matrix.flatten(), you could use numpy.sort(matrix, axis=None)[:10], since axis=None sorts the flattened array directly.
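A minimal self-contained check of the sorting approach, assuming a small random matrix as a stand-in for the question's data. It also confirms that the axis=None variant from the comment gives the same result as flattening first.

```python
import numpy as np

# Hypothetical stand-in for the question's matrix
rng = np.random.default_rng(0)
matrix = rng.random((5, 5))
k = 10

# Flatten to 1D, sort ascending, keep the first k values
via_flatten = np.sort(matrix.flatten())[:k]

# axis=None sorts the flattened array without an explicit flatten() call
via_axis_none = np.sort(matrix, axis=None)[:k]

print(np.array_equal(via_flatten, via_axis_none))  # True
```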

If your array is not large, the accepted answer is fine. For large arrays, np.partition does this much more efficiently, because it only partially orders the data (linear time on average) instead of fully sorting it (O(n log n)). Here's an example where the array has 10,000 elements and you want the 10 smallest values:

In [56]: np.random.seed(123)

In [57]: a = 10*np.random.rand(100, 100)

Use np.partition to get the 10 smallest values:

In [58]: np.partition(a, 10, axis=None)[:10]
Out[58]: 
array([ 0.00067838,  0.00081888,  0.00124711,  0.00120101,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Note that the values are not in increasing order. np.partition does not guarantee that the first 10 values will be sorted. If you need them in increasing order, you can sort the selected values afterwards. This will still be faster than sorting the entire array.
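A sketch of the partition-then-sort idea on the same seeded array. It passes k - 1 as the partition index rather than 10: both work here, but k - 1 is the smallest index that guarantees the first k slots hold the k smallest values, and it also works when k equals the array size.

```python
import numpy as np

np.random.seed(123)
a = 10 * np.random.rand(100, 100)
k = 10

# Partition the flattened array so the element at index k-1 is in its sorted
# position and everything before it is no larger; [:k] is then the k smallest
# values, in no particular order.
smallest = np.partition(a, k - 1, axis=None)[:k]

# Sorting only these k values is cheap compared with sorting all 10,000.
smallest_sorted = np.sort(smallest)

print(np.array_equal(smallest_sorted, np.sort(a, axis=None)[:k]))  # True
```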

Here's the result using np.sort:

In [59]: np.sort(a, axis=None)[:10]
Out[59]: 
array([ 0.00067838,  0.00081888,  0.00120101,  0.00124711,  0.00135942,
        0.00271129,  0.00297489,  0.00489126,  0.00556923,  0.00594738])

Now compare the timing:

In [60]: %timeit np.partition(a, 10, axis=None)[:10]
10000 loops, best of 3: 75.1 µs per loop

In [61]: %timeit np.sort(a, axis=None)[:10]
1000 loops, best of 3: 465 µs per loop

In this case, using np.partition is more than six times faster.
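The question's own loop was after coordinates rather than values. One way to get the (row, column) positions of the n smallest items, sketched here with np.argpartition and np.unravel_index on the same seeded array, is to partition indices instead of values and then map the flat indices back to 2D. The resulting indices are 0-based, unlike the +1 in the question's loop.

```python
import numpy as np

np.random.seed(123)
a = 10 * np.random.rand(100, 100)
k = 10

# Flat indices of the k smallest values (unordered), found via argpartition
flat_idx = np.argpartition(a, k - 1, axis=None)[:k]

# Order those k indices by their values so the output is ascending
flat_idx = flat_idx[np.argsort(a.ravel()[flat_idx])]

# Convert flat indices back to (row, column) pairs for the 2D array
rows, cols = np.unravel_index(flat_idx, a.shape)

for r, c in zip(rows, cols):
    print(r, c, a[r, c])
```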



You can use the heapq.nsmallest function to return the list of the 10 smallest elements.

In [84]: import heapq

In [85]: heapq.nsmallest(10, matrix.flatten())
Out[85]: 
[-1.7009047695355393,
 -1.4737632239971061,
 -1.1246243781838825,
 -0.7862983016935523,
 -0.5080863016259798,
 -0.43802651199959347,
 -0.22125698200832566,
 0.034938408281615596,
 0.13610084041121048,
 0.15876389111565958]
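Because heapq.nsmallest accepts any iterable and a key function, it can also report where each value sits. A sketch, assuming a random stand-in matrix: pairing it with np.ndenumerate, which yields ((row, col), value) tuples, returns positions together with values. It iterates element by element in Python, so for large arrays it is likely slower than np.partition.

```python
import heapq
import numpy as np

# Hypothetical stand-in for the question's matrix
rng = np.random.default_rng(0)
matrix = rng.standard_normal((6, 6))
k = 10

# Values only: nsmallest over the flattened array, returned in ascending order
values = heapq.nsmallest(k, matrix.flatten())

# np.ndenumerate yields ((row, col), value) pairs; the key compares by value
pairs = heapq.nsmallest(k, np.ndenumerate(matrix), key=lambda item: item[1])

for (row, col), value in pairs:
    print(row, col, value)
```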
