
I need to find just the nth smallest element in a 1D numpy.array.

For example:

a = np.array([90,10,30,40,80,70,20,50,60,0])

I want to get the 5th smallest element, so my desired output is 40.

My current solution is this:

result = np.max(np.partition(a, 5)[:5])

However, finding the 5 smallest elements and then taking the largest of them seems a little clumsy to me. Is there a better way to do it? Am I missing a single function that would achieve my goal?

There are questions with similar titles to this one, but I did not see anything that answered my question.

Edit:

I should've mentioned it originally, but performance is very important to me; therefore, the heapq solution, though nice, would not work for me.

import numpy as np
import heapq
from timeit import timeit

def find_nth_smallest_old_way(a, n):
    return np.max(np.partition(a, n)[:n])

# Solution suggested by Jaime and HYRY    
def find_nth_smallest_proper_way(a, n):
    return np.partition(a, n-1)[n-1]

def find_nth_smallest_heapq(a, n):
    return heapq.nsmallest(n, a)[-1]
#    
n_iterations = 10000

a = np.arange(1000)
np.random.shuffle(a)

t1 = timeit('find_nth_smallest_old_way(a, 100)', 'from __main__ import find_nth_smallest_old_way, a', number = n_iterations)
print 'time taken using partition old_way: {}'.format(t1)    
t2 = timeit('find_nth_smallest_proper_way(a, 100)', 'from __main__ import find_nth_smallest_proper_way, a', number = n_iterations)
print 'time taken using partition proper way: {}'.format(t2) 
t3 = timeit('find_nth_smallest_heapq(a, 100)', 'from __main__ import find_nth_smallest_heapq, a', number = n_iterations)  
print 'time taken using heapq : {}'.format(t3)

Result:

time taken using partition old_way: 0.255564928055
time taken using partition proper way: 0.129678010941
time taken using heapq : 7.81094002724
  • Also, might be beneficial to check out docs.python.org/2/library/heapq.html Commented Mar 20, 2014 at 22:12
  • @C.B. the above question is significantly different from mine; it asks for both min and max, and it is for a 2D matrix Commented Mar 20, 2014 at 22:16
  • How is this a duplicate? The title sounds similar, but the question itself is very different. Sometimes different questions lead to the same answers, but here the answers are also very different. And there is no way an answer in that question is an answer to my question. Commented Mar 21, 2014 at 3:37

3 Answers


Unless I am missing something, what you want to do is:

>>> a = np.array([90,10,30,40,80,70,20,50,60,0])
>>> np.partition(a, 4)[4]
40

np.partition(a, k) places the (k+1)-th smallest element of a at a[k], with smaller values in a[:k] and larger values in a[k+1:]. The only thing to be aware of is that, because of the 0-based indexing, the fifth element is at index 4.
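
As a quick sanity check (my addition, not part of the original answer), the partition invariant described above can be verified directly on the example array:

import numpy as np

a = np.array([90, 10, 30, 40, 80, 70, 20, 50, 60, 0])
part = np.partition(a, 4)

# Everything before index 4 is <= part[4] and everything after is >= part[4],
# so part[4] is the 5th smallest element.
assert part[:4].max() <= part[4] <= part[5:].min()
print(part[4])  # 40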


5 Comments

Yeah, that's it. I was thinking about it wrong. I knew there was a better solution!
It should be np.partition(a, 4)[3]
ok, the 5th element.
Found out that k must be greater than or equal to the index in the brackets []. Otherwise a wrong answer will pop out (which I expected to be an error). I leave this comment to prevent someone from misusing it and getting a wrong answer.
"np.partition(a, 4)[3]" is never what you want -- element [4] is guaranteed to be the 5th smallest. The element [3] is just going to be one of the four smallest, but you don't know which one.

You can use heapq.nsmallest:

>>> import numpy as np
>>> import heapq
>>> 
>>> a = np.array([90,10,30,40,80,70,20,50,60,0])
>>> heapq.nsmallest(5, a)[-1]
40

7 Comments

Check your performance, though. I recently had a situation in which heapq.nsmallest looked perfect, but slicing sorted turned out to be about 25% faster. I believe the heap approach is faster for some data, but not for all. I don't know if there's anything special about numpy arrays that would affect that one way or the other.
@PeterDeGlopper Well the sorting approach might be faster for smaller data sets, but for larger ones the heap method should be faster. How large was the data you're referring to?
Not large - lists of about 100 3-tuples of integers. So probably well below the level at which the heap method wins.
The solution that I have in my original post is O(n), since both np.partition and np.max are O(n).
I've seen some instances where actual heapify plus n heappop operations is way faster than using nsmallest or slicing sorted. Just to throw that out there.
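
For reference (my addition, not from the comment), a minimal sketch of the heapify-plus-heappop approach mentioned above could look like this:

import heapq

def nth_smallest_heap(values, n):
    # Build a heap once (O(len(values))), then pop n times;
    # each pop is O(log len(heap)).
    heap = list(values)
    heapq.heapify(heap)
    for _ in range(n - 1):
        heapq.heappop(heap)
    return heapq.heappop(heap)

# nth_smallest_heap([90, 10, 30, 40, 80, 70, 20, 50, 60, 0], 5) -> 40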

You don't need to call numpy.max():

def nsmall(a, n):
    return np.partition(a, n)[n]
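
As a usage illustration (my addition; it assumes n is meant as a 0-based index, which the answer does not state explicitly):

a = np.array([90, 10, 30, 40, 80, 70, 20, 50, 60, 0])
nsmall(a, 4)  # -> 40, the 5th smallest element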

2 Comments

It should be np.partition(a, n)[n-1]
@heroxbd, it should be np.partition(a, n)[n], not np.partition(a, n)[n-1]. The documentation for numpy 1.24 says: "The k-th value of the element will be in its final sorted position and all smaller elements will be moved before it and all equal or greater elements behind it." The argument kth=0 is a valid value, and it causes the function to place the smallest element at the beginning, so np.partition(a, kth=0)[0] works just as it does for any other n > 0.
