Long story short: I switched from Selenium to Requests-HTML.

It works fine, but not in every case.

Page: https://www.winamax.fr/paris-sportifs/sports/1/1/1

On load, the page fetches dynamic content listing English games (for example, Sheffield United - West Ham).

But when I try this:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
r.html.render()
print(r.html.text) # I also tried print(r.html.html)

the games don't appear in the output.

Why? Thanks!

  • Does this answer your question? How to retrieve the values of dynamic html content using Python Commented Jan 9, 2020 at 14:12
  • Simply because the games are not in the page source. In the browser, JavaScript adds them to the page's DOM. Requests does not run JavaScript; Selenium drives a real browser. Commented Jan 9, 2020 at 14:13
  • But isn't requests-html supposed to support JavaScript? requests-html.kennethreitz.org/#javascript-support Commented Jan 9, 2020 at 14:16
  • I tried to find the AJAX JSON request that carries the data, with no luck so far (I'm unsure how to do this). Commented Jan 9, 2020 at 19:03
  • As OP says, he is using the Requests-HTML library, which, unlike Requests, should be able to render JavaScript and then inspect dynamically generated elements in the DOM. Like OP, I have found this to be unreliable, though: on some sites render() works and on others it doesn't. It would be great to know why. Commented Apr 2, 2022 at 11:03

2 Answers

Increase the render timeout (the default is 8 seconds) and it should work. Sorry, this should be a comment, but I cannot comment yet.

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
r.html.render(timeout=20)
print(r.html.html)
session.close()
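
If render() raises (for instance when the timeout is still too short), the session.close() line above is never reached and Chromium may be left running. A minimal sketch of one way to guard against that, assuming the same calls as above; the helper name fetch_rendered_html and its session_factory argument are made up for illustration and are not part of requests-html:

```python
def fetch_rendered_html(url, timeout=20, session_factory=None):
    """Return the page's HTML after JavaScript has run, always closing the session.

    `session_factory` is a hypothetical hook: any callable returning an object
    with get() and close(), defaulting to requests_html.HTMLSession.
    """
    if session_factory is None:
        # Imported lazily so the helper can be tried with a stand-in session.
        from requests_html import HTMLSession
        session_factory = HTMLSession

    session = session_factory()
    try:
        r = session.get(url)
        # Raise the render timeout above the 8-second default.
        r.html.render(timeout=timeout)
        return r.html.html
    finally:
        # Runs even if get() or render() raises.
        session.close()
```

It would be called as print(fetch_rendered_html('https://www.winamax.fr/paris-sportifs/sports/1/1/1')). Whether 20 seconds is enough for this page is an assumption and may need adjusting.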

2 Comments

When I run this, I get a RuntimeWarning: C:\Applications__\python372\lib\site-packages\pyee\_base.py:81: RuntimeWarning: coroutine 'Browser._targetCreated' was never awaited f(*args, **kwargs) RuntimeWarning: Enable tracemalloc to get the object allocation traceback. After that nothing happens and the script never finishes. Even Ctrl+C does not stop the process.
How can I use this with BeautifulSoup?

The only thing that worked for me was passing the sleep parameter to render(), which waits a few seconds after the initial render so the page's JavaScript has time to load its content:

from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://www.winamax.fr/paris-sportifs/sports/1/1/1')
r.html.render(sleep=10)
print(r.html.html)
session.close()

From the requests-html documentation:

render(retries: int = 8, script: str = None, wait: float = 0.2, scrolldown=False, sleep: int = 0, reload: bool = True, timeout: Union[float, int] = 8.0, keep_page: bool = False, cookies: list = [{}], send_cookies_session: bool = False)

Reloads the response in Chromium, and replaces HTML content with an updated version, with JavaScript executed.

Parameters:

  • retries – The number of times to retry loading the page in Chromium.
  • script – JavaScript to execute upon page load (optional).
  • wait – The number of seconds to wait before loading the page, preventing timeouts (optional).
  • scrolldown – Integer, if provided, of how many times to page down.
  • sleep – Integer, if provided, of how many seconds to sleep after initial render.
  • reload – If False, content will not be loaded from the browser, but will be provided from memory.
  • keep_page – If True will allow you to interact with the browser page through r.html.page.
  • send_cookies_session – If True send HTMLSession.cookies convert.
  • cookies – If not empty send cookies.
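
Rather than guessing a single sleep value, one option is to try a few values in turn and stop once some expected text shows up in the rendered HTML. The following is only a sketch under that assumption: the helper name render_until, the marker argument and the list of sleep values are hypothetical, and it relies solely on the sleep and timeout parameters documented above.

```python
def render_until(html, marker, sleeps=(1, 5, 10), timeout=20):
    """Render `html` with increasing sleep values until `marker` appears.

    `html` is expected to behave like `r.html` from requests-html.
    Returns the sleep value that worked, or None if none of them did.
    """
    for sleep in sleeps:
        # With the default reload=True, each call loads the page in Chromium again.
        html.render(sleep=sleep, timeout=timeout)
        if marker in html.html:
            return sleep
    return None
```

With the session from the snippet above, it could be called as render_until(r.html, 'Sheffield United'), where the marker is any string known to appear only once the dynamic content has loaded.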
