
I have two scripts, scraper.py and db_control.py. In scraper.py I have something like this:

...
def scrape(category, field, pages, search, use_proxy, proxy_file):
    ...
    loop = asyncio.get_event_loop()

    to_do = [ get_pages(url, params, conngen) for url in urls ]
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)
    ...
    loop.close()
    
    return [ x.result() for x in res ]

...

And in db_control.py:

from scraper import scrape
...
while new < 15:
    data = scrape(category, field, pages, search, use_proxy, proxy_file)
    ...
...

Theoretically, scrape() should be run an unknown number of times until enough data has been obtained. But when new is not immediately > 15, this error occurs:

  File "/usr/lib/python3.4/asyncio/base_events.py", line 293, in run_until_complete
self._check_closed()
  File "/usr/lib/python3.4/asyncio/base_events.py", line 265, in _check_closed
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

But the scripts work just fine if I run scrape() only once. So I guess there is some problem with recreating loop = asyncio.get_event_loop(); I have tried this but nothing changed. How can I fix this? Of course, these are just snippets of my code; if you think the problem may be elsewhere, the full code is available here.


1 Answer


The methods run_until_complete, run_forever, run_in_executor, create_task, and call_at explicitly check the loop and raise an exception if it is closed.

Quote from the docs, BaseEventLoop.close():

This is idempotent and irreversible
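
Here is a minimal reproduction of what happens (hypothetical coroutine noop; async def is Python 3.5+ syntax, but the mechanics are the same on the 3.4 in your traceback). get_event_loop() keeps handing back the same default loop even after it has been closed, so the second run_until_complete() hits the closed-loop check:

import asyncio

async def noop():
    return 42

loop = asyncio.get_event_loop()
print(loop.run_until_complete(noop()))   # 42 -- first run works
loop.close()

# get_event_loop() returns the same, now-closed loop...
loop = asyncio.get_event_loop()
# ...and the closed-loop check fires:
# RuntimeError: Event loop is closed
loop.run_until_complete(noop())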


Unless you have some (good) reason, you can simply omit the close line:

def scrape(category, field, pages, search, use_proxy, proxy_file):
    #...
    loop = asyncio.get_event_loop()

    to_do = [ get_pages(url, params, conngen) for url in urls ]
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)
    #...
    # loop.close()
    return [ x.result() for x in res ]
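
With the close() call removed, every subsequent call to scrape() reuses the same default loop, so the while new < 15 loop in db_control.py can call it repeatedly without changes.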

If you want a brand-new loop each time, you have to create it manually and set it as the default:

def scrape(category, field, pages, search, use_proxy, proxy_file):
    #...
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)    
    to_do = [ get_pages(url, params, conngen) for url in urls ]
    wait_coro = asyncio.wait(to_do)
    res, _ = loop.run_until_complete(wait_coro)
    #...
    return [ x.result() for x in res ]
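
And if you do want each call to clean up after itself, one option (a sketch building on the snippet above, not tested against your full code) is to close the per-call loop in a finally block; that is safe here because the next call installs a brand-new loop before running anything:

def scrape(category, field, pages, search, use_proxy, proxy_file):
    #...
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        to_do = [ get_pages(url, params, conngen) for url in urls ]
        res, _ = loop.run_until_complete(asyncio.wait(to_do))
        return [ x.result() for x in res ]
    finally:
        # Closing is fine now: the next scrape() call creates
        # and sets a fresh loop before using it.
        loop.close()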

2 Comments

Thanks! Works like a charm now :)
Should the loop in the first code snippet be closed? If so, where?
