
I'm trying to apply a certain function to pairs of adjacent elements in a given data set. Please refer to the following example.

# I'll just make a simple function here.
# In my real case, I send a request to the database
# to get the result for the two arguments.

def get_data_from_db_with(arg1, arg2):
    # run a query with arg1 and arg2 and store it as 'query_result'
    return query_result

data = [arg1, arg2, arg3, arg4]
result = []
for a, b in zip(data, data[1:]):
    result.append(get_data_from_db_with(a, b))

For example, if the length of data is 4, as in the case above, I send 3 requests to the database. Each request takes about 0.3 seconds to retrieve the data, so 0.9 seconds (0.3 sec * 3 requests) in total. The problem is that as the number of requests increases, so does the overall time.

With the code above,

1) get_data_from_db_with(arg1, arg2)
2) get_data_from_db_with(arg2, arg3)
3) get_data_from_db_with(arg3, arg4)

will be processed consecutively.


What I want to do, if possible, is send all the requests at once rather than consecutively. Of course, the number of requests stays the same, but I assume the overall time will decrease.

Right now I'm looking into async, multiprocessing, and so on. Any comment or feedback would be immensely helpful.

Thanks in advance.

2 Answers


Threads are probably what you are looking for, assuming that most of what get_data_from_db_with does is waiting for I/O, like calling the database.

import threading

def get_data_from_db_with(arg1, arg2):
    # run a query with arg1 and arg2 and store it as 'query_result',
    # then attach it to the thread object so the caller can read it after join()
    current_thread = threading.current_thread()
    current_thread.result = query_result

data = [arg1, arg2, arg3, arg4]
threads = []
for a, b in zip(data, data[1:]):
    t = threading.Thread(target=get_data_from_db_with, args=(a, b))
    t.start()
    threads.append(t)

results = []
for t in threads:
    t.join()  # wait for the thread to finish before reading its result
    results.append(t.result)

Note that this solution even preserves the order of the results list.
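As an aside, here is a minimal sketch of an equivalent approach using concurrent.futures.ThreadPoolExecutor from the standard library, whose map also returns results in input order; get_data_from_db_with is still the placeholder from the question:

from concurrent.futures import ThreadPoolExecutor

def get_data_from_db_with(arg1, arg2):
    # placeholder: run the real query with arg1 and arg2 here
    return query_result

data = [arg1, arg2, arg3, arg4]
with ThreadPoolExecutor(max_workers=len(data) - 1) as executor:
    # map(func, data, data[1:]) submits one task per adjacent pair
    # and yields the results in submission order
    results = list(executor.map(get_data_from_db_with, data, data[1:]))

This saves the bookkeeping of starting, collecting, and joining the threads by hand.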


3 Comments

Thanks for your advice! I do have a question about using threading, though. As far as I know, Python prefers multiprocessing over multithreading because of the GIL (global interpreter lock). I might be wrong, but I was just curious.
@GeeYeolNahm That totally depends on what you are trying to do. The GIL is released on every I/O call, so as long as you spend the majority of the time doing I/O (instead of CPU-intensive tasks), threads are preferred over processes.
I tried testing multi-threading and it worked! It became, on average, 2~3 times faster. Yeah, you were right, multi-threading worked in this case in my working environment. Thanks again, freakish!

An alternative to multiprocessing is to work on the query construction itself. Look for ways to combine the queries, something like (arg1 AND arg2) OR (arg2 AND arg3) ...; essentially, try to fetch all the required data in a single call.
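A minimal sketch of that idea, assuming a DB-API connection with %s-style parameters and a hypothetical table pairs_table with columns col1, col2, and payload (the real query depends on your schema):

pairs = list(zip(data, data[1:]))
# one OR-ed condition per adjacent pair, with bound parameters
conditions = " OR ".join("(col1 = %s AND col2 = %s)" for _ in pairs)
query = "SELECT col1, col2, payload FROM pairs_table WHERE " + conditions
params = [value for pair in pairs for value in pair]
cursor.execute(query, params)  # cursor from an existing DB-API connection
rows = cursor.fetchall()       # all results in a single round trip

You then regroup rows by pair on the client side, but the database is hit only once.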

1 Comment

Thanks for sharing your thoughts. Yes, I did look into sending one request as you've mentioned. I'm still working on writing a single query and parsing the result; I've been using the Elasticsearch multisearch API. Above all, I do think sending one request outperforms sending multiple requests simultaneously!
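For reference, a minimal sketch of the multisearch idea mentioned in this comment, using the elasticsearch-py client; the index and field names are hypothetical, and the exact call signature varies between client versions:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connection details omitted

# msearch takes alternating header/body entries: one pair per sub-request
body = []
for a, b in zip(data, data[1:]):
    body.append({"index": "my_index"})  # hypothetical index name
    body.append({"query": {"bool": {"must": [
        {"term": {"field1": a}},        # hypothetical field names
        {"term": {"field2": b}},
    ]}}})

# one response per adjacent pair, returned in request order
responses = es.msearch(body=body)["responses"]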
