Python find numbers between range in list or array

Question

I have a list with millions of numbers which are always increasing to the end, I need to find and return numbers within a specified range e.g. numbers greater than X but less than Y, the numbers in the list can change and the values I'm searching for change as well

I have been using this method, please note this is a basic example the numbers are not uniform or the same as shown below in my program

l = [i for i in range(2000000)]
nums = []
for element in l:
    if element > 950004:
        break
    if element > 950000:
        nums.append(element)
#[950001, 950002, 950003, 950004]

Although fast, I kind of need it to be a bit faster for what my program is doing, the numbers change a lot so I'm wondering if there's a better way to do this with a pandas series or a numpy array? but so far all I've done is make an example in numpy:

a = numpy.array(l,dtype=numpy.int64)

Would a pandas series be more functional? Making use of query()? what would be the best way to approach this with an array as opposed to a python list of python objects

Are you trying to filter your list to numbers between a set of values? — James
– James, Commented Oct 27, 2017 at 16:56
it seems that you're just trying to extract a slice from 950000 to 950004. What is the point? — RomanPerekhrest
– RomanPerekhrest, Commented Oct 27, 2017 at 17:01
Maybe, searching for the indices of the range-bounds with binary search would make it faster. — Calculator
– Calculator, Commented Oct 27, 2017 at 17:03
Could you update your example to be closer to your actual problem? — James
– James, Commented Oct 27, 2017 at 17:04

Calculator · Accepted Answer · 2017-10-27 17:35:11Z

12

Here is a solution using binary search. You are speaking of millions of numbers. Technically binary search will make the algorithm faster by decreasing the runtime complexity to O(log n) neglecting the final slicing step.

import bisect

l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004

lower_bound_i = bisect.bisect_left(l, lower_bound)
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
nums = l[lower_bound_i:upper_bound_i]

edited Oct 27, 2017 at 17:35

answered Oct 27, 2017 at 17:23

Calculator

2,7991 gold badge15 silver badges18 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

ragardner Over a year ago

somehow it's slightly quicker than the numpy boolean slice O_o;

ragardner Over a year ago

thank you, I would probably not have found this solution browsing the docs

ekhumoro Over a year ago

This can be made even more efficient by using a lo=lower_bound_i argument in bisect.bisect_right.

juanpa.arrivillaga Over a year ago

@new_to_coding this list-slice, you mean? Yeah, a boolean slice creates a whole boolean array the size of your current array. A Python list-slice is extremely fast and efficient, implemented in C.

James · Accepted Answer · 2017-10-27 17:01:57Z

3

You could use numpy to get a subset of your list using a boolean slice.

import numpy as np
a = np.arange(2000000)
nums = a[(950000<a) & (a<=950004)]
nums
# returns
array([950001, 950002, 950003, 950004])

edited Oct 27, 2017 at 17:01

answered Oct 27, 2017 at 16:54

James

37k4 gold badges54 silver badges79 bronze badges

2 Comments

ragardner Over a year ago

the numbers are not the same in my program, this is a basic example, sorry

Moses Koledoye Over a year ago

This takes all the numbers in the entire array, and does not exit after hitting the stop condition.

Maor Veitsman · Accepted Answer · 2017-10-27 17:31:27Z

3

The following are two implementations for binary search (based on code from here) - one which searches for an upper limit and one which searches for a lower limit. Does this work better for you?

def binary_search_upper(seq, limit):
    min = 0
    max = len(seq) - 1
    while True:
        if max < min:
            return -1
        m = (min + max) / 2
        if m == (len(seq) -1) or (seq[m] <= limit and seq[m+1] > limit):
            return m
        elif seq[m] < limit:
            min = m+1
        else:
            max = m - 1

def binary_search_lower(seq, limit):
    min = 0
    max = len(seq) - 1
    while True:
        if max < min:
            return -1
        m = (min + max) / 2
        if m == 0 or (seq[m] >= limit and seq[m-1] < limit):
            return m
        elif seq[m] < limit:
            min = m+1
        else:
            max = m - 1


l = [i for i in range(2000000)]
print binary_search_upper(l, 950004)
print binary_search_lower(l, 950000)

answered Oct 27, 2017 at 17:31

Maor Veitsman

1,56410 silver badges22 bronze badges

5 Comments

juanpa.arrivillaga Over a year ago

Why would you use this recipe, which is over 14 years old, and not the standard-library implementation (which will have much, much better constant factors)?

ekhumoro Over a year ago

JFTR, this is about ten times slower than bisect.

Maor Veitsman Over a year ago

@ekhumoro interesting, where can I learn more about how bisect is so fast?

ekhumoro Over a year ago

@MaorVeitsman. The source code for the module shows a pure python implementation, but the real implementation is written in C.

ekhumoro Over a year ago

@MaorVeitsman. PS: the pure python bisect implementation is about twice as fast as yours :P

Collectives™ on Stack Overflow

Python find numbers between range in list or array

3 Answers 3

4 Comments

2 Comments

5 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

4 Comments

2 Comments

5 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related