
I'm attempting to log in to a website, click a button, and then scrape some data. The page must be rendered first because the content is generated by JavaScript (and is therefore missing if you, for example, View Source in a browser).

Everything works until it comes time to send the click.

When I try to send the click with the requests_html package, nothing appears to happen, although no errors are thrown. I understand requests_html leans heavily on pyppeteer, so I've been jumping between the two sets of docs, but the whole async-programming side of things is pretty confusing to me.
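For context, here is my (possibly naive) understanding of the async pattern involved, boiled down to a minimal standalone sketch with nothing requests_html-specific in it: a coroutine is driven to completion on the event loop, and its return value is handed back to synchronous code.

```python
import asyncio

async def do_click():
    # Stand-in for an awaitable browser action such as page.click()
    await asyncio.sleep(0.01)
    return "clicked"

# asyncio.run (the modern equivalent of
# get_event_loop().run_until_complete) blocks until the coroutine
# finishes and returns whatever the coroutine returned
result = asyncio.run(do_click())
print(result)  # → clicked
```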

import asyncio
import requests_html

# Login information
payload = {
    'email': '[email protected]',
    'password': 'Password123'
}

# Start a session
with requests_html.HTMLSession() as s:
    p = s.post('https://www.website.com/login', data=payload)

    # Send the request now that we're logged in
    r = s.get('https://www.website.com/data')

    # Render the JavaScript page so it's accessible
    r.html.render(keep_page=True, scrolldown=5, sleep=5)

    async def click():
        await r.html.page.click(
            selector='button.showAll',
            options={'delay': 3, 'clickCount': 1},
        )

    asyncio.get_event_loop().run_until_complete(click())

    print(r.html.html)

r.html.html contains the HTML rendered from the JavaScript, but not the post-click version. I've confirmed the button is actually being clicked; I suspect the updated page is never 'saved' back, and that r.html.html is still returning the pre-click snapshot.
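(If that diagnosis is right, one workaround I've been considering is to re-read the live DOM with pyppeteer's page.content() after clicking, instead of relying on r.html.html. A rough sketch, with the helper name, the 2-second wait, and the selector all being my own guesses:)

```python
import asyncio

async def click_and_content(page, selector, wait=2.0):
    """Click `selector` on a (pyppeteer-style) page, give the click's
    effects a moment to land, then re-read the live DOM instead of the
    snapshot that render() stored in r.html.html."""
    await page.click(selector)
    await asyncio.sleep(wait)    # crude wait for the DOM to update
    return await page.content()  # current HTML of the live page

# Intended usage with the session above (requires keep_page=True):
# html = asyncio.get_event_loop().run_until_complete(
#     click_and_content(r.html.page, 'button.showAll'))
```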

I would rather not use deprecated PhantomJS/Selenium. Scrapy is really heavy duty, and I'd rather not rely on Scrapy + Splash to get this done - I think I'm so close! MechanicalSoup doesn't work with JavaScript.

  • So I know you mentioned that you don't want to use Selenium, but from my experience it seems like it would be a lot easier. I know PhantomJS is deprecated, but why not use chromedriver? Commented Sep 10, 2018 at 19:25
  • @K-Log In the end, this is what I did. I was able to rewrite the whole thing in Selenium in about 5 minutes, after spending the requisite time setting up Selenium and Chromedriver. It would still be great to do this in requests_html someday! Commented Sep 11, 2018 at 9:29

1 Answer


According to the latest requests_html documentation, you can pass a script parameter to the render method of the html object. This is equivalent to calling the evaluate method of the underlying (pyppeteer) page property; see requests_html.py (line 523). For example (warning: quick-and-dirty code):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("http://xy.com")

script = """
    () => {
       const item = document.getElementById("foo");
       if(item) {
         item.click()
       }
    }
"""

r.html.render(sleep=5, timeout=20, script=script)

Bear in mind to provide a long enough sleep interval to be sure that the rendering has finished. I tested this and the result was correct: when the button was clicked, the page made an extra request to load more information, which I was able to find in the rendered HTML after applying the script.
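If you need this for more than one element, the script string can be built on the Python side. A small hypothetical helper (json.dumps is just a safe way to quote the selector as a JS string literal):

```python
import json

def click_script(selector):
    # Build a JS snippet that clicks the first element matching
    # `selector`; render(script=...) accepts the resulting string.
    return """
        () => {
            const item = document.querySelector(%s);
            if (item) { item.click(); }
        }
    """ % json.dumps(selector)

# e.g. r.html.render(sleep=5, script=click_script('button.showAll'))
```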
