I'm attempting to log in to a website, click a button, and then scrape some data. The page has to be rendered because the content is generated with JavaScript (and is therefore not there if you, for example, View Source in a web browser).
Everything works until it comes time to send the click.
When I try to send the click with the requests_html package, it doesn't appear to do anything, although no errors are thrown. I understand requests_html leans heavily on pyppeteer, so I've been jumping between the two sets of docs, but the whole async programming side of things is pretty confusing to me.
import asyncio

import requests_html

# Login information
payload = {
    'email': '[email protected]',
    'password': 'Password123'
}

# Start a session
with requests_html.HTMLSession() as s:
    p = s.post('https://www.website.com/login', data=payload)

    # Send the request now that we're logged in
    r = s.get('https://www.website.com/data')

    # Render the JavaScript page so it's accessible (keep_page=True keeps the
    # underlying pyppeteer page open so it can be interacted with afterwards)
    r.html.render(keep_page=True, scrolldown=5, sleep=5)

    # Click the button via the pyppeteer page
    async def click():
        await r.html.page.click(
            selector='button.showAll',
            options={'delay': 3, 'clickCount': 1},
        )

    asyncio.get_event_loop().run_until_complete(click())

    print(r.html.html)
r.html.html contains the rendered HTML from the JavaScript, but not the post-click content. I've confirmed the button is actually being clicked, but I suspect the updated page is never 'saved' back anywhere, and that r.html.html is still returning the pre-click page.
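Is pulling the DOM straight back out of the live pyppeteer page the right direction? The sketch below is roughly what I have in mind, replacing the click() / print(...) part inside the with block above. page.content() and page.waitForSelector() are taken from the pyppeteer docs, and 'div.loadedData' is just a placeholder for whatever element the click actually adds:

async def click_and_get_html():
    page = r.html.page  # the pyppeteer page kept alive by keep_page=True

    # Perform the click as before
    await page.click('button.showAll', options={'delay': 3, 'clickCount': 1})

    # Placeholder selector: wait for whatever element the click actually loads
    await page.waitForSelector('div.loadedData')

    # Grab the current DOM from the browser instead of the stale
    # r.html.html snapshot taken at render() time
    return await page.content()

html_after_click = asyncio.get_event_loop().run_until_complete(click_and_get_html())
print(html_after_click)

If that is the intended way, I assume I could feed html_after_click into requests_html.HTML(html=html_after_click) to get the usual parsing helpers back, but I'm not sure.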
I would rather not use the deprecated PhantomJS or fall back to Selenium. Scrapy is really heavy duty, and I'd rather not rely on Scrapy + Splash to get this done; I think I'm so close! MechanicalSoup doesn't work with JavaScript at all.
I'd like to stick with requests_html if at all possible!