I have tried to use HtmlUnit to implement a crawler that can capture the results of Ajax requests and JavaScript execution. However, HtmlUnit is not powerful enough for my needs, because it cannot obtain all of the DOM elements rendered by JavaScript or Ajax. I then tried pywebkitgtk and PyQt's WebKit binding; they did generate some dynamic DOM elements, but they do not work stably and I have no idea how to fix that. It seems that some people also mention using Selenium. Can anybody give me some suggestions for implementing an Ajax crawler? Many thanks!
- Generally, my understanding is that you need a JavaScript runtime to do what a real browser does, such as Ajax requests and async handlers. I vote for the Selenium approach because it lets you drive a real browser from a script, which covers the web-crawler scenario perfectly, plus extra features like screenshots. – shawnzhu, Aug 21, 2013 at 1:43
- Thanks for your reply. Okay, I will try Selenium. Hope it will work! :D – Joey, Aug 21, 2013 at 1:53
1 Answer
PhantomJS might be a good solution to your problem, since it is a full WebKit engine that executes JavaScript before you scrape the DOM. You can also make use of a crawler API, e.g. Unicrawler, to simplify this. Hope it works.
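For reference, here is a minimal sketch of the Selenium route suggested in the comments, using the Python bindings (assumes `pip install selenium` and a ChromeDriver on your PATH; the URL, function name, and timeout are mine, not from the question). PhantomJS could also be driven through Selenium, but its development has since been suspended, so headless Chrome is shown instead:

```python
def fetch_rendered_html(url, wait_seconds=10):
    """Load a page in a real headless browser and return the DOM
    after JavaScript and Ajax have run, not the raw HTTP response."""
    # Imported inside the function so the module loads even where
    # Selenium is not installed; requires: pip install selenium
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.set_page_load_timeout(wait_seconds)
        driver.get(url)
        # page_source reflects the browser's current rendered DOM
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


if __name__ == "__main__":
    html = fetch_rendered_html("https://example.com")
    print(len(html))
```

For pages that keep loading data after the initial render, Selenium's `WebDriverWait` with an expected condition (e.g. presence of a specific element) is the usual way to wait for a particular Ajax result before reading `page_source`.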