I'm trying to read URLs from a program running in a subprocess and then schedule asynchronous HTTP requests, but it looks like the requests are running synchronously. Is that because the subprocess and the requests are both running in the same coroutine function?

test.py

import random
import time

URLS = ['http://example.com', 'http://example.com/sleep5s']

def main():
    for url in random.choices(URLS, weights=(1, 1), k=5):
        print(url)
        time.sleep(random.uniform(0.5, 1))


if __name__ == '__main__':
    main()

main.py

import asyncio
import sys

import httpx

from httpx import TimeoutException


async def req(url):
    async with httpx.AsyncClient() as client:
        try:
            r = await client.get(url, timeout=2)
            print(f'Response {url}: {r.status_code}')
        except TimeoutException:
            print(f'TIMEOUT - {url}')
        except Exception as exc:
            print(f'ERROR - {url}: {exc}')


async def run():
    proc = await asyncio.create_subprocess_exec(
        sys.executable,
        '-u',
        'test.py',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    while True:
        line = await proc.stdout.readline()
        if not line:
            break

        url = line.decode().rstrip()
        print(f'Found URL: {url}')

        resp = await req(url)

    await proc.wait()


async def main():
    await run()


if __name__ == '__main__':
    asyncio.run(main())

Test

$ python main.py
Found URL: http://example.com
Response http://example.com: 200
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s
Found URL: http://example.com
Response http://example.com: 200
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s

1 Answer

it looks like the requests are running synchronously. Is that because subprocess and requests are both running in the same coroutine function?

Your diagnosis is correct. await means what it says on the tin: the coroutine won't proceed until it has a result to give you. Fortunately, asyncio makes it easy to run a coroutine in the background:

    tasks = []
    while True:
        line = await proc.stdout.readline()
        if not line:
            break

        url = line.decode().rstrip()
        print(f'Found URL: {url}')

        tasks.append(asyncio.create_task(req(url)))

    resps = await asyncio.gather(*tasks)
    await proc.wait()

Note that:

  • asyncio.create_task() schedules the coroutine on the event loop right away, so requests start being processed even while we are still reading lines.
  • await asyncio.gather() ensures that all the tasks are in fact waited for before the coroutine completes. It also collects the responses in order and propagates exceptions, if any.
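The effect is easy to demonstrate without subprocesses or HTTP at all. A minimal sketch, using asyncio.sleep as a stand-in for a slow request (the URLs and delay here are made up for illustration):

```python
import asyncio
import time

async def fake_req(url, delay):
    # Stand-in for an HTTP request: just sleep, then return the URL.
    await asyncio.sleep(delay)
    return url

async def main():
    urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
    start = time.monotonic()
    # Schedule all "requests" up front; none of them blocks the loop.
    tasks = [asyncio.create_task(fake_req(u, 0.2)) for u in urls]
    # gather() waits for all of them and preserves input order.
    resps = await asyncio.gather(*tasks)
    elapsed = time.monotonic() - start
    print(resps)
    print(f'elapsed: {elapsed:.2f}s')  # roughly 0.2s, not 0.6s

asyncio.run(main())
```

Had each fake_req been awaited inside the loop instead, the total would be the sum of the delays, which is exactly the synchronous behaviour observed in the question.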