I want to scrape a website that uses JavaScript/AJAX to load additional results as you scroll down the page. I am using Python 3.7 with Selenium driving headless Chrome. However, as scraping progresses, the page source keeps growing, which slows my machine down until it is at a standstill. Even simple operations like
code = driver.page_source
grow to take several seconds. I ran a test to see how much the page source had grown: after a few hundred results it had expanded from an initial length of about half a million characters to 25 million characters, a 50-fold increase. My questions are:
1) Is there some way to have Selenium delete the HTML of results I have already scraped (similar to the way you can delete elements in Chrome's "inspect element" mode) to keep the size manageable?
2) Or is there some other simple solution that I'm overlooking?
F12 does it in most web browsers), go to the Network tab, and look at the requests that are being sent while you use the website.
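If the Network tab shows that the page loads each new batch of results from a JSON endpoint, you may be able to call that endpoint directly and skip the browser entirely. Here is a rough sketch of the idea. The URL, the `offset`/`limit` parameter names, and the assumption that the response is a JSON list are all made up for illustration; substitute whatever the real requests look like.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: replace with the request URL seen in the Network tab.
API_URL = "https://example.com/api/results"


def fetch_page(offset, limit):
    """Request one batch of results (assumes the endpoint returns a JSON list)."""
    # "offset" and "limit" are assumed parameter names, not taken from any real site.
    query = urlencode({"offset": offset, "limit": limit})
    with urlopen(f"{API_URL}?{query}", timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_results(fetch=fetch_page, limit=50, max_pages=1000):
    """Yield results one at a time, stopping at the first empty batch."""
    for page in range(max_pages):
        batch = fetch(page * limit, limit)
        if not batch:
            break
        yield from batch
```

You would then loop with `for item in iter_results(): ...`. Because each batch is processed and discarded, memory use stays flat instead of growing with the page. The `fetch` argument is only there so the paging logic can be tried out with a stand-in function before pointing it at the real site.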