You could consider using a multiprocessing `Pool`. You can then use `map` to send chunks of your iterable to the workers of the `Pool`, which process them according to a given function. So, let's say your query is wrapped in a function, `query(data)`:
    def query(data):
        rslt = ...  # your actual lookup, e.g. query the database where data == 10
        if rslt.property == 'some condition here':
            return rslt
        # otherwise falls through and returns None
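As a minimal sketch of what that could look like, assuming a hypothetical SQLite database file mydb.sqlite with an items table and a value column (adjust the names and the driver to your actual setup):

    import sqlite3

    def query(data):
        # hypothetical sketch: open a connection per call (simple, not the fastest)
        with sqlite3.connect('mydb.sqlite') as conn:
            row = conn.execute(
                'SELECT * FROM items WHERE value = ?', (data,)
            ).fetchone()
        # stand-in for 'some condition here'
        if row is not None and row[-1] == 'wanted':
            return row
        return None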
We will use the pool like so:
    from multiprocessing import Pool

    with Pool() as pool:
        results = pool.map(query, data_list)
Now, to your requirement, we will find the first one:
    print(next(filter(None, results)))
Note that with `query` written this way, `results` will be a list of `rslt` objects and `None`s, and we are looking for the first non-`None` result.
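If it is possible that nothing matches, you can pass a default to `next()` so it returns that instead of raising `StopIteration`:

    # returns None instead of raising StopIteration when there is no match
    first_match = next(filter(None, results), None)
    print(first_match)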
A few notes:

- The `Pool` constructor's first argument is `processes`, which lets you choose how many worker processes the pool will hold (see the sketch after these notes). From the docs:

  > If processes is None then the number returned by os.cpu_count() is used.

- `map` also has a `chunksize` argument, which defaults to 1 and lets you choose the size of the chunks passed to the workers. From the docs:

  > This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
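A minimal sketch combining both arguments (the specific numbers are just placeholders to tune):

    import os
    from multiprocessing import Pool

    # the default is os.cpu_count() workers; here both arguments are passed explicitly
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(query, data_list, chunksize=50)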
Continuing with `map`, the docs recommend using `imap` with an explicit chunksize for very long iterables, for better efficiency:

> Note that it may cause high memory usage for very long iterables. Consider using imap() or imap_unordered() with explicit chunksize option for better efficiency.
And from the `imap` docs:

> The chunksize argument is the same as the one used by the map() method. For very long iterables using a large value for chunksize can make the job complete much faster than using the default value of 1.
So we could actually be more efficient and do:
    chunksize = 100
    processes = 10

    with Pool(processes=processes) as pool:
        print(next(filter(None, pool.imap(query, data_list, chunksize=chunksize))))
And here you could play with `chunksize` and even `processes` (from the `Pool` constructor above) and see which combination produces the best results.
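As a rough, hypothetical sketch of how you could compare combinations (the grids of values are just examples):

    import time
    from multiprocessing import Pool

    # time a few processes/chunksize combinations against the same data_list
    for processes in (2, 4, 8):
        for chunksize in (1, 10, 100):
            start = time.perf_counter()
            with Pool(processes=processes) as pool:
                first = next(filter(None, pool.imap(query, data_list, chunksize=chunksize)), None)
            elapsed = time.perf_counter() - start
            print(f"processes={processes}, chunksize={chunksize}: {elapsed:.2f}s")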
If you are interested, you could easily switch to threads instead of processes by simply changing your import statement to:
    from multiprocessing.dummy import Pool
As the docs say:
> multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
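So the rest of the code stays the same; for example, a thread-based version of the snippet above (same hypothetical names as before):

    from multiprocessing.dummy import Pool  # thread-based Pool with the same API

    with Pool(processes=10) as pool:
        print(next(filter(None, pool.imap(query, data_list, chunksize=100)), None))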
Hope this helps.