
When writing asynchronous crawlers with asyncio and aiohttp in Python, I have always had a question: why must you use async with, and why is it so easy to get errors if you don't?

Although aiohttp also has a request method that seems to support a simpler API, I want to know what the difference is. I still like the requests module very much, and I wonder whether aiohttp can be used as simply as requests.
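
For example, it seems it can be called like this (the URL here is just a placeholder I made up):

import asyncio
import aiohttp

async def main():
    # aiohttp.request creates a throwaway session for a single request,
    # so the call reads almost like requests.get(...).
    async with aiohttp.request('GET', 'http://httpbin.org/get') as resp:
        print(await resp.text())

asyncio.run(main())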

1 Answer


why you must use async with

It's not that you must use async with; it's a fail-safe device for ensuring that the resources get cleaned up. Taking a classic example from the documentation:

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

You can re-write it as:

async def fetch(session, url):
    response = await session.get(url)
    return await response.text()

This version appears to work the same, but it doesn't close the response object, so some OS resources (e.g. the underlying connection) may continue to be held indefinitely. A more correct version would look like this:

async def fetch(session, url):
    response = await session.get(url)
    content = await response.text()
    response.close()
    return content

This version would still fail to close the response if an exception gets raised while reading the text. It could be fixed by using finally - which is exactly what with and async with do under the hood. With an async with block the code is more robust because the language makes sure that the cleanup code is invoked whenever execution leaves the block.
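
For illustration, the finally-based fix would look like this; it is essentially a manual spelling of what the async with version above does:

async def fetch(session, url):
    response = await session.get(url)
    try:
        # Reading the body can raise, e.g. on a dropped connection.
        return await response.text()
    finally:
        # Runs whether or not an exception was raised, so the response
        # and its underlying connection are always cleaned up.
        response.close()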


Comments

Thank you for your answer, but I still have a question: I want to know why async with is important.
@hfldqwe The answer addresses that in the last paragraph. If some part is unclear, please be specific so I can improve it.
Sorry for the late reply. There is one thing I still can't understand: the documentation (and many other explanations) emphasizes using async with, yet in some places I can't use it. And if async with is skipped (that is, the session is not closed), I don't know whether Python's garbage collection can handle that. In my project I use aiohttp as a crawler and Tornado to write the backend interface, and it seems very difficult to write async with inside Tornado's RequestHandler.
Usually I do this:
```
async def login(self, client, username, password):
    ...
    async with client.post(url.login_url(), headers=url.headers, data=data) as resp:
        result = await resp.json()
        # determine whether the login succeeded
        if result['success']:
            return resp.cookies
```
```
async with aiohttp.ClientSession(cookie_jar=jar) as client:
    cookies = await spider.login(client, username=username, password=password)
```
But in Tornado I can't pass the client as a parameter to my crawler, e.g.:
```
class InfoHandler(BaseHandler):
    '''Return personal information'''
    async def get(self):
        async with aiohttp.ClientSession() as client:
            result = await spider.login(client, username='xxx', password='xxx')
```
This creates a session for each request, which wastes resources and is not recommended with aiohttp. How can I do this?
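@hfldqwe One common pattern, sketched here only as a suggestion (the shared_client attribute and the lazy-creation property are made-up names, not an aiohttp or Tornado API), is to create a single ClientSession the first time a handler needs one and reuse it for every request:
```
import aiohttp
import tornado.web

import spider  # the crawler module from the comments above (assumed)

class BaseHandler(tornado.web.RequestHandler):
    @property
    def client(self):
        # Create the shared session lazily, from inside a running handler,
        # so a live event loop exists; every later request then reuses the
        # same ClientSession and its connection pool.
        app = self.application
        if getattr(app, 'shared_client', None) is None:
            app.shared_client = aiohttp.ClientSession()
        return app.shared_client

class InfoHandler(BaseHandler):
    async def get(self):
        # spider.login is the coroutine from the earlier comment.
        result = await spider.login(self.client, username='xxx', password='xxx')
        self.write(result)
```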