
I'm trying to apply a certain function to pairs of adjacent elements in a given data set. Please refer to the following example.

# I'll just make a simple function here.
# In my real case, I send a request to the database
# to get the result for the two arguments.

def get_data_from_db_with(arg1, arg2):
    # run a query with arg1 and arg2 and store it as 'query_result'
    return query_result

data = [arg1, arg2, arg3, arg4]
result = []
for a, b in zip(data, data[1:]):
    result.append(get_data_from_db_with(a, b))

For example, if the length of data is 4, as in the case above, I send 3 requests to the database. Each request takes about 0.3 seconds to retrieve the data, so 0.9 seconds (0.3 sec * 3 requests) in total. The problem is that as the number of requests increases, so does the overall time.

With the code above,

1) get_data_from_db_with(arg1, arg2)
2) get_data_from_db_with(arg2, arg3)
3) get_data_from_db_with(arg3, arg4)

will be processed consecutively.


What I want to do, if possible, is send all the requests at once rather than consecutively. Of course, the number of requests stays the same, but I assume the overall time will decrease.

Right now I'm looking into async, multiprocessing, and so on. Any comment or feedback would be immensely helpful.

Thanks in advance.

2 Answers


Threads are probably what you are looking for, assuming that most of what get_data_from_db_with does is waiting for I/O, like calling the database.

import threading

def get_data_from_db_with(arg1, arg2):
    # run a query with arg1 and arg2 and store it as 'query_result',
    # then attach it to the thread object so the caller can read it after join()
    current_thread = threading.current_thread()
    current_thread.result = query_result

data = [arg1, arg2, arg3, arg4]
threads = []
for a, b in zip(data, data[1:]):
    t = threading.Thread(target=get_data_from_db_with, args=(a, b))
    t.start()
    threads.append(t)

results = []
for t in threads:
    t.join()  # wait for the thread to finish before reading its result
    results.append(t.result)

Note that this solution even preserves the order of the results list.
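As an aside, here is a minimal sketch of an equivalent approach using concurrent.futures.ThreadPoolExecutor from the standard library, whose map also returns results in input order; get_data_from_db_with is still the placeholder from the question:

from concurrent.futures import ThreadPoolExecutor

def get_data_from_db_with(arg1, arg2):
    # placeholder: run the real query with arg1 and arg2 here
    return query_result

data = [arg1, arg2, arg3, arg4]
with ThreadPoolExecutor(max_workers=len(data) - 1) as executor:
    # map(func, data, data[1:]) submits one task per adjacent pair
    # and yields the results in submission order
    results = list(executor.map(get_data_from_db_with, data, data[1:]))

This saves the bookkeeping of starting, collecting, and joining the threads by hand.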


3 Comments

Thanks for your advice! I do have a question about using threading, though. As far as I know, Python prefers multiprocessing over multithreading because of the GIL (global interpreter lock). I might be wrong, but I was just curious.
@GeeYeolNahm That totally depends on what you are trying to do. The GIL is released on every I/O call, so as long as you spend the majority of the time doing I/O (instead of CPU-intensive tasks), threads are preferred over processes.
I tried testing multi-threading and it worked! It became, on average, 2~3 times faster. Yeah, you were right, multi-threading worked in this case in my working environment. Thanks again, freakish!

An alternative to multiprocessing is to work on the query construction itself. Look for ways to combine the queries, something like (arg1 AND arg2) OR (arg2 AND arg3) ...; essentially, try to fetch all the required data in a single call.
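A minimal sketch of that idea, assuming a DB-API connection with %s-style parameters and a hypothetical table pairs_table with columns col1, col2, and payload (the real query depends on your schema):

pairs = list(zip(data, data[1:]))
# one OR-ed condition per adjacent pair, with bound parameters
conditions = " OR ".join("(col1 = %s AND col2 = %s)" for _ in pairs)
query = "SELECT col1, col2, payload FROM pairs_table WHERE " + conditions
params = [value for pair in pairs for value in pair]
cursor.execute(query, params)  # cursor from an existing DB-API connection
rows = cursor.fetchall()       # all results in a single round trip

You then regroup rows by pair on the client side, but the database is hit only once.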

1 Comment

Thanks for sharing your thoughts. Yes, I did look into sending one request as you've mentioned. I'm still working on writing a single query and parsing the result; I've been using the Elasticsearch multisearch API. Above all, I do think sending one request outperforms sending multiple requests simultaneously!
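For reference, a minimal sketch of the multisearch idea mentioned in this comment, using the elasticsearch-py client; the index and field names are hypothetical, and the exact call signature varies between client versions:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connection details omitted

# msearch takes alternating header/body entries: one pair per sub-request
body = []
for a, b in zip(data, data[1:]):
    body.append({"index": "my_index"})  # hypothetical index name
    body.append({"query": {"bool": {"must": [
        {"term": {"field1": a}},        # hypothetical field names
        {"term": {"field2": b}},
    ]}}})

# one response per adjacent pair, returned in request order
responses = es.msearch(body=body)["responses"]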
