
I was migrating a production system to async when I realized the synchronous version is 20x faster than the async version. I was able to create a very simple example that demonstrates this in a repeatable way:

Asynchronous Version

import asyncio, time

data = {}

async def process_usage(key):
    data[key] = key

async def main():
    await asyncio.gather(*(process_usage(key) for key in range(0,1000000)))

s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")

This takes 19 seconds. The code loops through 1M keys and builds a dictionary, data, where each key maps to itself.

$ python3.7 async_test.py
Took 19.08 seconds.

Synchronous Version

import time

data = {}

def process_usage(key):
    data[key] = key

def main():
    for key in range(0,1000000):
        process_usage(key)

s = time.perf_counter()
results = main()
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")

This takes 0.17 seconds! And it does exactly the same thing as the version above.

$ python3.7 test.py
Took 0.17 seconds.

Asynchronous Version with create_task

import asyncio, time

data = {}

async def process_usage(key):
    data[key] = key

async def main():
    for key in range(0, 1000000):
        # note: these tasks are never awaited; asyncio.run() cancels
        # any still pending once main() returns
        asyncio.create_task(process_usage(key))

s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")

This version brings it down to 11 seconds.

$ python3.7 async_test2.py
Took 11.91 seconds.

Why does this happen?

In my production code I will have a blocking call in process_usage where I save the value of key to a redis database.
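If I simulate that blocking call with asyncio.sleep (just a stand-in for the real redis write, over a much smaller batch of keys), the picture changes and the async version pulls ahead:

```python
import asyncio, time

async def process_usage(key):
    # asyncio.sleep stands in for the real redis write (an assumption);
    # each "write" costs 10 ms of I/O latency
    await asyncio.sleep(0.01)

async def main():
    # 100 overlapped 10 ms waits
    await asyncio.gather(*(process_usage(key) for key in range(100)))

s = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - s
# finishes in far less than the ~1 s a sequential version would need
print(f"Took {elapsed:0.2f} seconds.")
```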

  • Well for one, your asynchronous code has to generate a function call with 1 million arguments, which will require loading that into memory. Whereas your synchronous code just uses the efficient range() iterator. Commented May 7, 2019 at 16:01
  • @KyleWillmon I'm new to async; is there a better way to do this? In production I also have to loop through 1M keys, but from a database, not the range function. Commented May 7, 2019 at 16:04
  • As far as I know, you're always going to need quite a bit of overhead to keep track of 1 million coroutines. However, 19 seconds does seem excessive for this trivial example. Perhaps someone else can explain more about that. Commented May 7, 2019 at 16:14
  • Why would you expect asyncio to be faster here? You're doing completely CPU-bound work. Commented May 7, 2019 at 17:23
  • But then your benchmark has no bearing on what you care about. Commented May 7, 2019 at 17:28

2 Answers


When comparing these benchmarks, note that the asynchronous version is, well, asynchronous: asyncio spends considerable effort ensuring that the coroutines you submit can run concurrently. In your particular case they don't actually run concurrently because process_usage doesn't await anything, but asyncio doesn't know that in advance. The synchronous version, on the other hand, makes no such provisions: it just runs everything sequentially, hitting the interpreter's happy path.

A more reasonable comparison would be for the synchronous version to try to parallelize things in the way idiomatic for synchronous code: by using threads. Of course, you won't be able to create a separate thread for each process_usage because, unlike asyncio with its tasks, the OS won't allow you to create a million threads. But you can create a thread pool and feed it tasks:

import concurrent.futures

def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for key in range(0, 1000000):
            executor.submit(process_usage, key)
        # exiting the "with" block automatically waits
        # for all submitted futures to finish

On my system this takes ~17s, whereas the asyncio version takes ~18s. (The faster asyncio version takes ~13s.)

If the speed gain of asyncio is so small, one could ask why bother with asyncio? The difference is that with asyncio, assuming idiomatic code and IO-bound coroutines, you have at your disposal a virtually unlimited number of tasks that in a very real sense execute concurrently. You can create tens of thousands of asynchronous connections at the same time, and asyncio will happily juggle them all at once, using a high-quality poller and a scalable coroutine scheduler. With a thread pool the number of tasks executed in parallel is always limited by the number of threads in the pool, typically in the hundreds at most.
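Unlimited concurrency is not always desirable, though: if the backend can be overwhelmed, asyncio.Semaphore is the idiomatic way to cap the number of in-flight tasks. A minimal sketch, where the limit of 10 and the asyncio.sleep standing in for real I/O are assumptions:

```python
import asyncio

LIMIT = 10  # arbitrary cap on in-flight tasks (an assumption)
peak = 0    # highest observed number of concurrently running tasks
active = 0

async def process_usage(key, sem):
    global active, peak
    async with sem:            # at most LIMIT coroutines get past this point at once
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0)  # stand-in for the real I/O call
        active -= 1

async def main():
    sem = asyncio.Semaphore(LIMIT)
    await asyncio.gather(*(process_usage(k, sem) for k in range(1000)))

asyncio.run(main())
print(f"peak concurrency: {peak}")
```

The 1000 tasks all exist at once, but the semaphore ensures no more than 10 are past the `async with` at any moment.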

Even toy examples have value, for learning if nothing else. If you are using such microbenchmarks to make decisions, I suggest investing some more effort to give the examples more realism. The coroutine in the asyncio example should contain at least one await, and the sync example should use threads to emulate the same amount of parallelism you obtain with async. If you adjust both to match your actual use case, then the benchmark actually puts you in a position to make a (more) informed decision.


3 Comments

Thanks, this helped me better understand why this happens. My actual production function does an async write to redis with aioredis but I now understand the source of the overhead.
@Jonathan It would be interesting to examine your original problem in more detail. It's far from clear why parallel asyncio connections to redis would fare slower than the same number of sequential connections, except redis itself getting overwhelmed and underperforming. Perhaps the performance of your code would be best improved through judicious use of semaphores or a queue feeding a fixed number of workers. Creating a huge number of concurrent tasks is possible in asyncio, but it doesn't mean that it's the optimal approach for every problem.
It basically reads usage data for each key from a dict and, if usage was above e.g. 1000, writes the key to a rate_limit redis db. It's really that simple. The synchronous version takes 1s; I'm trying out a mix of synchronous for that and async for the rest of the script (to do a number of batch writes to dynamodb). Hope that helps.

Why does this happen?

TL;DR

Because using asyncio doesn't by itself speed up code. You need multiple network I/O-bound operations running concurrently to see a difference from the synchronous version.

Detailed

asyncio is not magic that speeds up arbitrary code. With or without asyncio, your code is still run by a CPU with limited performance.

asyncio is a way to manage multiple execution flows (coroutines) in a nice, clear way. Multiple execution flows allow you to start the next I/O-related operation (such as a request to a database) before waiting for the previous one to complete. Please read this answer for a more detailed explanation.
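That overlap is easy to see with two emulated I/O waits (asyncio.sleep standing in for network calls): run concurrently, they take roughly as long as the slowest one, not the sum:

```python
import asyncio, time

async def fake_io(delay):
    # stand-in for a network request taking `delay` seconds
    await asyncio.sleep(delay)

async def sequential():
    # second wait starts only after the first finishes: ~0.2 s total
    await fake_io(0.1)
    await fake_io(0.1)

async def concurrent():
    # second wait starts before the first finishes: ~0.1 s total
    await asyncio.gather(fake_io(0.1), fake_io(0.1))

s = time.perf_counter()
asyncio.run(sequential())
seq = time.perf_counter() - s

s = time.perf_counter()
asyncio.run(concurrent())
conc = time.perf_counter() - s

print(f"sequential: {seq:0.2f}s, concurrent: {conc:0.2f}s")
```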

Please also read this answer for explanation when it makes sense to use asyncio.

Once you start using asyncio the right way, the overhead of using it should be much lower than the benefits you get from parallelizing I/O operations.

