I need to pull information from some pages that rely on JavaScript code. I know how to do it with Selenium, but it takes a lot of time and I need a more efficient way to do this.

I came across the requests-html library, and it looks like a robust fit for my purposes, but unfortunately I don't seem to be able to run the JavaScript with it.

I read the documentation at https://requests-html.readthedocs.io/en/latest/ and tried the following code:

from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
resp = session.get("https://drive.google.com/file/d/1rZ-DhTFPCen6DvJXlNl3Bxuwj4-ULwoa/view")

# Run the page's JavaScript in headless Chromium
resp.html.render()

# Parse the rendered HTML
soup = BeautifulSoup(resp.html.html, 'lxml')

email = soup.find_all('img', {'class': 'ndfHFb-c4YZDc-MZArnb-BA389-YLEF4c'})
print(email)

I get no results after running this code, even though the class exists when I open the link in my browser.

I've also tried sending headers with my requests, which did not help. I tried the same code (with a different HTML tag, of course) on another link (https://web.archive.org/web/*/stackoverflow.com), but I get HTML that includes a message saying my browser must support JavaScript. My code for this part:

from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
resp = session.get("https://web.archive.org/web/*/stackoverflow.com")

resp.html.render()

soup = BeautifulSoup(resp.html.html, 'lxml')

print(soup)

The response I get:

<div class="no-script-message">
        The Wayback Machine requires your browser to support JavaScript, please email <a href="mailto:[email protected]">[email protected]</a><br/>if you have any questions about this.
      </div>

Any help would be appreciated. Thanks!

2 Answers

1

Pass the sleep parameter to render so the page's JavaScript has time to run before the HTML is captured:

resp.html.render(sleep=2)
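
As a rough sketch of how that could fit into the code from the question: the helper name find_after_render and its session argument are made up for illustration, and the sleep and timeout values are guesses that may need tuning per site. It queries resp.html directly instead of going through BeautifulSoup, which I'm assuming is acceptable here.

```python
def find_after_render(url, selector, sleep=2, timeout=20, session=None):
    """Fetch url, run its JavaScript, and return elements matching a CSS selector."""
    if session is None:
        # Imported lazily so the helper can also be driven by a stand-in session.
        from requests_html import HTMLSession
        session = HTMLSession()

    resp = session.get(url)
    # sleep: seconds to wait after the initial render so scripts can fill the DOM.
    # timeout: seconds before the render attempt is abandoned.
    resp.html.render(sleep=sleep, timeout=timeout)
    # render() updates resp.html in place, so it can be queried directly.
    return resp.html.find(selector)


# Example call (needs requests-html installed and network access):
# images = find_after_render(
#     "https://drive.google.com/file/d/1rZ-DhTFPCen6DvJXlNl3Bxuwj4-ULwoa/view",
#     "img.ndfHFb-c4YZDc-MZArnb-BA389-YLEF4c",
# )
# print(images)
```

If the list still comes back empty, a larger sleep is the first thing I would try. A page that only shows the element to logged-in users would not be helped by waiting longer.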

1 Comment

For me, this was the answer
0

This should work on the site. You mentioned that the code worked for the Stack Overflow URL but not for the other one. That may be because the server did not respond, or because the tag you are looking for was not available at that time. Either way, requests-html should have given you an error.

I was about to check your problem and add it to my blog post, How to use Requests-HTML, but unfortunately the link you provided is not working.
