
I'm attempting to log in to a website, click a button, and then scrape some data. The page must be rendered first because the content is generated by JavaScript (and is therefore missing if you, for example, View Source in a browser).

Everything works until it comes time to send the click.

When I try to send the click with the requests_html package, nothing appears to happen, although no errors are thrown. I understand requests_html leans heavily on pyppeteer, so I've been jumping between the two sets of docs, but the whole async-programming side of things is pretty confusing to me.
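For context, here is my (possibly naive) understanding of the async pattern involved, boiled down to a minimal standalone sketch with nothing requests_html-specific in it: a coroutine is driven to completion on the event loop, and its return value is handed back to synchronous code.

```python
import asyncio

async def do_click():
    # Stand-in for an awaitable browser action such as page.click()
    await asyncio.sleep(0.01)
    return "clicked"

# asyncio.run (the modern equivalent of
# get_event_loop().run_until_complete) blocks until the coroutine
# finishes and returns whatever the coroutine returned
result = asyncio.run(do_click())
print(result)  # → clicked
```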

import asyncio
import requests_html

# Login information
payload = {
    'email': '[email protected]',
    'password': 'Password123'
}

# Start a session
with requests_html.HTMLSession() as s:
    p = s.post('https://www.website.com/login', data=payload)

    # Send the request now that we're logged in
    r = s.get('https://www.website.com/data')

    # Render the JavaScript page so it's accessible
    r.html.render(keep_page=True, scrolldown=5, sleep=5)

    async def click():
        await r.html.page.click(
            selector='button.showAll',
            options={'delay': 3, 'clickCount': 1},
        )

    asyncio.get_event_loop().run_until_complete(click())

    print(r.html.html)

r.html.html contains the HTML rendered from the JavaScript, but not the post-click version. I've confirmed the button is actually being clicked; I suspect the updated page is never 'saved' back, and that r.html.html is still returning the pre-click snapshot.
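(If that diagnosis is right, one workaround I've been considering is to re-read the live DOM with pyppeteer's page.content() after clicking, instead of relying on r.html.html. A rough sketch, with the helper name, the 2-second wait, and the selector all being my own guesses:)

```python
import asyncio

async def click_and_content(page, selector, wait=2.0):
    """Click `selector` on a (pyppeteer-style) page, give the click's
    effects a moment to land, then re-read the live DOM instead of the
    snapshot that render() stored in r.html.html."""
    await page.click(selector)
    await asyncio.sleep(wait)    # crude wait for the DOM to update
    return await page.content()  # current HTML of the live page

# Intended usage with the session above (requires keep_page=True):
# html = asyncio.get_event_loop().run_until_complete(
#     click_and_content(r.html.page, 'button.showAll'))
```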

I would rather not use deprecated PhantomJS/Selenium. Scrapy is really heavy duty, and I'd rather not rely on Scrapy + Splash to get this done - I think I'm so close! MechanicalSoup doesn't work with JavaScript.

  • So I know you mentioned that you don't want to use Selenium, but from my experience it seems like it would be a lot easier. I know PhantomJS is deprecated, but why not use chromedriver? Commented Sep 10, 2018 at 19:25
  • @K-Log In the end, this is what I did. I was able to rewrite the whole thing in Selenium in about 5 minutes, after spending the requisite time setting up Selenium and Chromedriver. It would still be great to do this in requests_html someday! Commented Sep 11, 2018 at 9:29

1 Answer


According to the latest requests_html documentation, you can pass a script parameter to the render method of the html object. This is equivalent to calling the evaluate method of the underlying (pyppeteer) page property; see requests_html.py (line 523). For example (warning: quick-and-dirty code):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("http://xy.com")

script = """
    () => {
       const item = document.getElementById("foo");
       if(item) {
         item.click()
       }
    }
"""

r.html.render(sleep=5, timeout=20, script=script)

Bear in mind to provide a long enough sleep interval to be sure that the rendering has finished. I tested this and the result was correct: when the button was clicked, the page made an extra request to load more information, which I was able to find in the rendered HTML after applying the script.
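If you need this for more than one element, the script string can be built on the Python side. A small hypothetical helper (json.dumps is just a safe way to quote the selector as a JS string literal):

```python
import json

def click_script(selector):
    # Build a JS snippet that clicks the first element matching
    # `selector`; render(script=...) accepts the resulting string.
    return """
        () => {
            const item = document.querySelector(%s);
            if (item) { item.click(); }
        }
    """ % json.dumps(selector)

# e.g. r.html.render(sleep=5, script=click_script('button.showAll'))
```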
