I'm trying to read URLs from a program running in a subprocess and then schedule asynchronous HTTP requests, but it looks like the requests are running synchronously. Is that because the subprocess and the requests are both running in the same coroutine function?

test.py

import random
import time

URLS = ['http://example.com', 'http://example.com/sleep5s']

def main():
    for url in random.choices(URLS, weights=(1, 1), k=5):
        print(url)
        time.sleep(random.uniform(0.5, 1))


if __name__ == '__main__':
    main()

main.py

import asyncio
import sys

import httpx

from httpx import TimeoutException


async def req(url):
    async with httpx.AsyncClient() as client:
        try:
            r = await client.get(url, timeout=2)
            print(f'Response {url}: {r.status_code}')
        except TimeoutException:
            print(f'TIMEOUT - {url}')
        except Exception as exc:
            print(f'ERROR - {url}: {exc}')


async def run():
    proc = await asyncio.create_subprocess_exec(
        sys.executable,
        '-u',
        'test.py',
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )

    while True:
        line = await proc.stdout.readline()
        if not line:
            break

        url = line.decode().rstrip()
        print(f'Found URL: {url}')

        resp = await req(url)

    await proc.wait()


async def main():
    await run()


if __name__ == '__main__':
    asyncio.run(main())

Test

$ python main.py
Found URL: http://example.com
Response http://example.com: 200
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s
Found URL: http://example.com
Response http://example.com: 200
Found URL: http://example.com/sleep5s
TIMEOUT - http://example.com/sleep5s

1 Answer

it looks like the requests are running synchronously. Is that because subprocess and requests are both running in the same coroutine function?

Your diagnosis is correct. await means what it says on the tin: the coroutine won't proceed until it has a result to give you. Fortunately, asyncio makes it easy to run a coroutine in the background:

    tasks = []
    while True:
        line = await proc.stdout.readline()
        if not line:
            break

        url = line.decode().rstrip()
        print(f'Found URL: {url}')

        tasks.append(asyncio.create_task(req(url)))

    resps = await asyncio.gather(*tasks)
    await proc.wait()

Note that:

  • asyncio.create_task() schedules the coroutine on the event loop right away, so requests start being processed even while we are still reading lines.
  • await asyncio.gather() ensures that all the tasks are in fact waited for before the coroutine completes. It also collects the responses in order and propagates exceptions, if any.
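The effect is easy to demonstrate without subprocesses or HTTP at all. A minimal sketch, using asyncio.sleep as a stand-in for a slow request (the URLs and delay here are made up for illustration):

```python
import asyncio
import time

async def fake_req(url, delay):
    # Stand-in for an HTTP request: just sleep, then return the URL.
    await asyncio.sleep(delay)
    return url

async def main():
    urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
    start = time.monotonic()
    # Schedule all "requests" up front; none of them blocks the loop.
    tasks = [asyncio.create_task(fake_req(u, 0.2)) for u in urls]
    # gather() waits for all of them and preserves input order.
    resps = await asyncio.gather(*tasks)
    elapsed = time.monotonic() - start
    print(resps)
    print(f'elapsed: {elapsed:.2f}s')  # roughly 0.2s, not 0.6s

asyncio.run(main())
```

Had each fake_req been awaited inside the loop instead, the total would be the sum of the delays, which is exactly the synchronous behaviour observed in the question.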