
I am trying to return a list of XHR URLs from async Python code (asyncio with pyppeteer). Below is my code.

import asyncio
from pyppeteer import launch

async def intercept_response(res):
    resourceType = res.request.resourceType
    xhr_list = []
    if resourceType in ['xhr']:
        print(res.request.url)
        xhr_list.append(res.request.url)
    return xhr_list

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    page.on('response', intercept_response)
    await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1')
    await page.goto('https://www.iesdouyin.com/share/user/70015326114', waitUntil = 'networkidle2')
    await browser.close()

if __name__ == '__main__':
    url = asyncio.run(main())
    print(url)

However, when I run the code, res.request.url gets printed, but xhr_list is never returned, so url ends up as None. Is there something wrong with my code?

  • url will be assigned whatever value you return from main. Since you don't return anything from it, url is set to None. Commented May 17, 2020 at 12:24
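To see the comment's point without launching a browser, here is a minimal sketch in plain asyncio. The two coroutine names and the example URL are hypothetical stand-ins for main: a coroutine that falls off the end gives None to asyncio.run, while one with an explicit return hands its value back.

```python
import asyncio


async def main_without_return():
    urls = ["https://example.com/api"]  # built, but never returned
    # falling off the end means the coroutine's result is None


async def main_with_return():
    urls = ["https://example.com/api"]
    return urls  # this value becomes the result of asyncio.run()


if __name__ == "__main__":
    print(asyncio.run(main_without_return()))
    print(asyncio.run(main_with_return()))
```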

1 Answer


There are two problems with your code. First, intercept_response tries to build a list, but the list is created afresh on every call, so it never holds more than one element. Since intercept_response is called once per response, every call should append to the same list.
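The first problem can be reproduced with ordinary functions. In this sketch, collect_fresh and collect_shared are hypothetical stand-ins for the handler and the URLs are placeholders: the fresh-list version returns a separate one-element list per call, while the shared-list version accumulates every URL.

```python
def collect_fresh(url):
    xhr_list = []          # a brand-new list on every call
    xhr_list.append(url)
    return xhr_list        # so it never holds more than one URL


shared = []


def collect_shared(url):
    shared.append(url)     # every call appends to the same list


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]

fresh_results = [collect_fresh(u) for u in urls]
for u in urls:
    collect_shared(u)

print(fresh_results)  # three separate one-element lists
print(shared)         # all three URLs in one list
```

A module-level list is used here only to keep the demonstration short; inside main, a closure over a local list avoids the global.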

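Second, the value returned by intercept_response never reaches main: the page calls the handler for you and discards its return value, and main itself returns nothing. You need to get the collected URLs into main and actually return them from there. For example, you can use a closure (an inner def) that appends to a list defined in the enclosing scope: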

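import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    url = []  # one list shared by every call of the handler below

    async def intercept_response(res):
        # Closure: mutates the url list from the enclosing main() scope
        if res.request.resourceType == 'xhr':
            print(res.request.url)
            url.append(res.request.url)

    page.on('response', intercept_response)
    await page.setUserAgent('Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1')
    await page.goto('https://www.iesdouyin.com/share/user/70015326114', waitUntil='networkidle2')
    await browser.close()
    return url  # the collected XHR URLs become the result of asyncio.run(main())

if __name__ == '__main__':
    url = asyncio.run(main())
    print(url)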
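The closure pattern can be tried offline with a stand-in for page.on. ToyEmitter below is hypothetical and only mimics, under my assumption, how pyppeteer's event emitter treats coroutine handlers: it schedules them as tasks and never reads what they return. The response dicts are fake placeholders for pyppeteer Response objects.

```python
import asyncio


class ToyEmitter:
    """Hypothetical stand-in for page.on/emit: runs handlers, ignores their results."""

    def __init__(self):
        self._handlers = {}
        self._tasks = []

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        for handler in self._handlers.get(event, []):
            result = handler(*args)
            if asyncio.iscoroutine(result):
                # Schedule the coroutine; whatever it returns is never read.
                self._tasks.append(asyncio.create_task(result))

    async def drain(self):
        # Wait for the scheduled handlers so their side effects are complete.
        await asyncio.gather(*self._tasks)


async def main():
    emitter = ToyEmitter()
    url = []  # shared by every handler invocation

    async def intercept_response(res):
        if res["resourceType"] == "xhr":
            url.append(res["url"])
        return "ignored"  # the emitter drops this value

    emitter.on("response", intercept_response)
    for res in [
        {"resourceType": "xhr", "url": "https://example.com/api/1"},
        {"resourceType": "document", "url": "https://example.com/"},
        {"resourceType": "xhr", "url": "https://example.com/api/2"},
    ]:
        emitter.emit("response", res)

    await emitter.drain()
    return url


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Only the side effect on the shared list survives; the handler's own return value is lost, which is why returning xhr_list from the handler could never reach main.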

4 Comments

This code partially solved my problem, but one issue remains: three XHR URLs got printed, but only one was appended to xhr_list, so the returned url has only one element. I suspect this has something to do with asyncio's await?
Is this because I am supposed to use asyncio.gather to get the return value? @user4815162342
@jackliu Your problem is unrelated to asyncio, your intercept_response always creates a fresh list. When called multiple times, it returns three different lists, and the last one gets used. I've now edited the code so that different invocations of intercept_response append to the same list.
@jackliu No problem. If the question is now resolved, please remember to accept the answer.
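On the asyncio.gather question: gather collects the return values of awaitables that you pass to it yourself, in argument order. That does not help with a handler the page invokes on its own. A small sketch of what gather is for, with a hypothetical classify coroutine that pretends URLs containing /api/ are XHR requests:

```python
import asyncio


async def classify(url):
    # Hypothetical check: treat URLs under /api/ as XHR requests.
    await asyncio.sleep(0)
    return url if "/api/" in url else None


async def main():
    urls = ["https://example.com/api/1", "https://example.com/", "https://example.com/api/2"]
    # gather returns results in the same order as its arguments
    results = await asyncio.gather(*(classify(u) for u in urls))
    return [r for r in results if r is not None]


if __name__ == "__main__":
    print(asyncio.run(main()))
```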
