I need to crawl a website, but its content is dynamic. Are there any Python packages that can call JavaScript functions? For example, suppose I have a link and three JS functions (1, 2, and 3) that I should call on that page, and I need the final page after all of those JS calls have run.
Executing client-side JavaScript can get very complicated, so the most reliable way to run all of a page's JavaScript just as a user's browser would is to drive a real browser in headless mode. For Python specifically, you can use Selenium to control headless Chrome. If you are willing to trade Python for Node.js, Puppeteer with headless Chrome is a more powerful toolset (it lets you do a lot more than Selenium). There is also an early, unofficial Python port of Puppeteer, https://pypi.org/project/pyppeteer/, but I haven't tried it and can't comment on how stable it is.
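Here is a minimal sketch of the Selenium route. It assumes Selenium 4 (recent versions can locate a matching chromedriver on their own) and a local Chrome installation. The names `func1`, `func2`, `func3` are hypothetical stand-ins for the question's "JS functions 1, 2, and 3", and `build_call_script` and `render_after_js` are illustrative helpers, not part of Selenium. The sketch loads the page, calls each function in order with `execute_script`, and then returns `page_source`, which serializes the DOM as those calls left it.

```python
from collections.abc import Iterable


def build_call_script(function_name: str) -> str:
    """Return a JS snippet that calls a global, zero-argument function.

    Python's str.isidentifier() is only a rough sanity check for JS names
    (for example, it rejects names containing "$").
    """
    if not function_name.isidentifier():
        raise ValueError(f"not a plain function name: {function_name!r}")
    # "return" hands the function's result back to execute_script.
    return f"return {function_name}();"


def render_after_js(url: str, function_names: Iterable[str]) -> str:
    """Load url in headless Chrome, call each JS function in order,
    and return the resulting DOM as HTML."""
    # Imported here so this module can be imported where Selenium is not
    # installed; assumes Selenium 4 and a local Chrome installation.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        # With the default page-load strategy, get() waits for the page to load.
        driver.get(url)
        for name in function_names:
            driver.execute_script(build_call_script(name))
        # page_source serializes the current DOM, including changes made by JS.
        return driver.page_source
    finally:
        driver.quit()
```

A call would look like `render_after_js("https://example.com/page", ["func1", "func2", "func3"])`, with a placeholder URL. If the functions kick off asynchronous work (for example, AJAX requests), `page_source` may be read before that work finishes. In that case, add an explicit `WebDriverWait` for an element the functions are expected to produce before reading the page.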