I am scraping data from a site with a paginated table (500 results max, 25 results per page). When I use Chrome's "view source" I can see all 500 results; however, once the JS renders in Selenium, only 25 results show in driver.page_source.

I have tried passing the cookies and headers off to requests, but that's not reliable and I need to stick with Selenium. I have also put together a janky solution of clicking through the paginator's next button, but there must be a better way!
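For reference, a sketch of that click-through fallback — the selectors (`table tbody tr`, `a.next`) are placeholders for whatever the real page uses:

```python
def collect_rows(driver, row_selector="table tbody tr", next_selector="a.next"):
    """Accumulate row texts page by page until the next button is gone or disabled.

    Assumes a Selenium 4 driver; the CSS selectors are placeholders.
    """
    rows = []
    while True:
        # Grab every row currently rendered on this page.
        rows += [el.text for el in driver.find_elements("css selector", row_selector)]
        nxt = driver.find_elements("css selector", next_selector)
        if not nxt or not nxt[0].is_enabled():
            break  # no more pages
        nxt[0].click()
    return rows
```

This avoids hard-coded sleeps but still pays one round trip per page, which is exactly the slowness the question is trying to escape.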

So how does one capture the full page source prior to JS rendering, using Selenium with the Python bindings?

  • Update the question with the relevant HTML and your code trials Commented Nov 25, 2018 at 18:16
  • The page source is irrelevant. This question applies to any scenario in which JS modifies the DOM during rendering. In my current scenario, the JS hides parts of the page source in JS variables after rendering. I need to capture the page source after it loads from the server and prior to any JS rendering. The only thing I have been able to find is driver.page_source, which obviously returns the source post-rendering. Commented Nov 25, 2018 at 18:20

1 Answer


There might be a simpler way, but it turns out you can do all kinds of asynchronous things from the browser, including fetch:

def fetch(url):
  # Pass the URL as a script argument rather than concatenating it into the
  # script (avoids breakage on quotes in the URL), and use the callback that
  # execute_async_script appends as the last argument.
  return driver.execute_async_script("""
    const done = arguments[arguments.length - 1];
    fetch(arguments[0])
      .then(r => r.text())
      .then(done);
  """, url)

html = fetch('https://stackoverflow.com/')

Same-origin policy will apply.
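Once you have the raw HTML back, you can confirm all 500 rows are present by parsing it, e.g. with the standard library — counting `<tr>` tags here is an assumption about the table markup:

```python
from html.parser import HTMLParser

class RowCounter(HTMLParser):
    """Counts <tr> elements in raw HTML, e.g. the string returned by fetch()."""
    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1

def count_rows(html):
    parser = RowCounter()
    parser.feed(html)
    return parser.rows

# raw = fetch(driver.current_url)  # re-fetch the pre-render HTML
# count_rows(raw)                  # should reflect the full, unpaginated table
```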
