Why doesn't requests-html's render() scrape dynamic content?

Question

Long story short: I switched from Selenium to Requests-HTML.

It works OK, but not in every case.

Page: https://www.winamax.fr/paris-sportifs/sports/1/1/1

On load, the page pulls in dynamic content listing English games (for example, Sheffield United - West Ham).

But when I try this:

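from requests_html import HTMLSession
session = HTMLSession()
# Note: this path lacks the 'sports/' segment present in the page URL above
r = session.get('https://www.winamax.fr/paris-sportifs/1/1/1')
r.html.render()  # reloads the response in Chromium with JavaScript executed
print(r.html.text) # I also tried print(r.html.html)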

the games don't show in the output.

Why? Thanks!

Does this answer your question? How to retrieve the values of dynamic html content using Python – Ramon Medeiros, Jan 9, 2020 at 14:12
Simply because the "output" is not included in the page. In the browser, it is added to the page's DOM by JavaScript. Requests does not run JavaScript; Selenium drives a real browser. – Klaus D., Jan 9, 2020 at 14:13
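Klaus D.'s point can be shown without a browser. Below is a toy, self-contained illustration (the markup is made up, loosely modeled on the question): the games list is filled in by a script, so a plain HTML parser, which is roughly what you have without a JavaScript engine, finds no game text at all.

```python
from html.parser import HTMLParser

# Toy page (hypothetical markup): the games container starts empty and is
# only filled in when a browser runs the <script>.
RAW_HTML = """
<html><body>
  <div id="games"></div>
  <script>
    document.getElementById("games").textContent = "Sheffield United - West Ham";
  </script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collect non-blank text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The game name exists only inside the script source, never as page text.
print(repr(visible_text(RAW_HTML)))  # prints ''
```

requests-html's render() is meant to close exactly this gap by executing the page in Chromium, which is what the following comments question.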
But isn't requests-html supposed to support JavaScript? requests-html.kennethreitz.org/#javascript-support – jeremoquai, Jan 9, 2020 at 14:16
I tried to find the AJAX request that returns the data as JSON, with no luck yet (I'm not sure how to do this). – jeremoquai, Jan 9, 2020 at 19:03
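One common way to do what jeremoquai describes: open the browser's DevTools, watch the Network tab (filtered to Fetch/XHR) while the page loads, and if the games arrive as JSON, request that URL directly with no rendering at all. A minimal standard-library sketch; ENDPOINT is a hypothetical placeholder, not a known Winamax URL, and the headers are an assumption (some sites also need cookies or other headers, which this sketch does not handle).

```python
import json
import urllib.request

# Hypothetical placeholder: paste the request URL copied from the DevTools
# Network tab here. This is NOT a known Winamax endpoint.
ENDPOINT = "https://example.com/path/to/data.json"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its body as JSON; no JavaScript is executed."""
    request = urllib.request.Request(
        url,
        # Assumption: a browser-like User-Agent, since some servers reject
        # urllib's default one.
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

Usage: data = fetch_json(ENDPOINT), then inspect data to find where the games are stored. If the data turns out not to come from a plain HTTP request, this approach does not apply.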
As the OP says, it is the Requests-HTML library he is using, which, unlike Requests, should be able to render JS and then inspect dynamically generated elements in the DOM. Like the OP, though, I've found this to be quite sketchy: on some sites render() works and on others it doesn't. It would be great to know why. – amy, Apr 2, 2022 at 11:03

fardV · Accepted Answer · 2020-08-07 17:20:26Z

Score: 4

Add a timeout and it should work. (Sorry, this should be a comment, but I can't comment.)

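from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
# Give Chromium up to 20 s to load the page (the default timeout is 8 s)
r.html.render(timeout=20)
print(r.html.html)
session.close()  # also closes the Chromium instance started by render()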


2 Comments

alexzander Over a year ago

When I run it, I get this RuntimeWarning:

C:\Applications__\python372\lib\site-packages\pyee\_base.py:81: RuntimeWarning: coroutine 'Browser._targetCreated' was never awaited
  f(*args, **kwargs)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback.

After that, nothing happens: the script never finishes, and I can't stop the process even with Ctrl+C.

André Clérigo Over a year ago

How can I use this with BeautifulSoup?
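One way to answer this: r.html.html is a plain string of the rendered markup, so it can be handed to BeautifulSoup like any other HTML. A minimal sketch, assuming beautifulsoup4 is installed; the "div.match" selector and the sample markup are hypothetical stand-ins for whatever the real page uses. (requests-html also offers r.html.find() with CSS selectors, so BeautifulSoup is optional.)

```python
from bs4 import BeautifulSoup


def extract_texts(rendered_html: str, selector: str) -> list[str]:
    """Parse rendered HTML with BeautifulSoup and return each match's text."""
    soup = BeautifulSoup(rendered_html, "html.parser")
    return [el.get_text(" ", strip=True) for el in soup.select(selector)]


# With requests-html (needs network access and Chromium):
#   r.html.render(timeout=20)
#   games = extract_texts(r.html.html, "div.match")  # hypothetical selector

# Offline check with hypothetical markup standing in for r.html.html:
sample = '<div class="match">Sheffield United - West Ham</div>'
print(extract_texts(sample, "div.match"))  # ['Sheffield United - West Ham']
```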

MD Mushfirat Mohaimin · Accepted Answer · 2023-11-05 10:52:38Z

I found that using the sleep parameter of render() to wait a few seconds after the initial render was the only thing that worked for me:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
# Sleep 10 s after the initial render before the HTML is captured
r.html.render(sleep=10)
print(r.html.html)
session.close()

From the requests-html documentation:

render(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False, cookies: list = [{}], send_cookies_session: bool = False)

Reloads the response in Chromium, and replaces HTML content with an updated version, with JavaScript executed.

Parameters:

retries – The number of times to retry loading the page in Chromium.
script – JavaScript to execute upon page load (optional).
wait – The number of seconds to wait before loading the page, preventing timeouts (optional).
scrolldown – Integer, if provided, of how many times to page down.
sleep – Integer, if provided, of how many seconds to sleep after initial render.
reload – If False, content will not be loaded from the browser, but will be provided from memory.
keep_page – If True will allow you to interact with the browser page through r.html.page.
send_cookies_session – If True send HTMLSession.cookies convert.
cookies – If not empty send cookies.
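Because sleep is a fixed delay, a value that works on one page may be too short, or needlessly long, on another. A minimal sketch of one workaround: retry with increasing sleep values until a check on the HTML succeeds. render_until and requests_html_renderer are hypothetical helpers, not part of requests-html; with the default reload=True, each render() call reloads the page in Chromium, so later attempts cost a full page load each.

```python
from typing import Callable, Iterable, Optional


def render_until(
    render: Callable[[int], str],
    looks_loaded: Callable[[str], bool],
    sleeps: Iterable[int] = (0, 2, 5, 10),
) -> Optional[str]:
    """Try render(sleep) with growing sleep values; return the first HTML that passes."""
    for sleep in sleeps:
        html = render(sleep)
        if looks_loaded(html):
            return html
    return None  # the check never passed for any sleep value tried


def requests_html_renderer(response, timeout: float = 20) -> Callable[[int], str]:
    """Adapt a requests-html response into a render(sleep) -> HTML callable."""

    def render(sleep: int) -> str:
        # With the default reload=True, each call reloads the page in Chromium.
        response.html.render(sleep=sleep, timeout=timeout)
        return response.html.html

    return render


# Usage (needs network access and Chromium; the "West Ham" check is only a
# hypothetical stand-in for "the games have loaded"):
#   from requests_html import HTMLSession
#   session = HTMLSession()
#   r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
#   html = render_until(requests_html_renderer(r), lambda h: "West Ham" in h)
#   session.close()
```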
