Maybe this is going to sound naive, but is there anything even remotely close to a PHP crawler for AJAX-based websites?
It's not PHP, so I'm not offering it as an answer, but HtmlUnit in Java is a fully scriptable headless browser component, complete with JS support. It could be used as a crawler, too. – Piskvor left the building, May 20, 2011 at 11:02
3 Answers
The problem is that vanilla PHP cannot parse JavaScript, build a JavaScript environment, or interact with a page the way a browser does. To do it even in theory, you would have to extend PHP via its C API and interface it with a JavaScript engine. That is a large undertaking, and whether it is feasible depends on how many resources you have.
Comments
Not automatic crawlers, because they would need to understand the JavaScript code and know what's going on.
What a crawler could do is make the same calls that the AJAX-enabled script makes, so you can get at the raw data.
But this means you need a very good understanding of the web page and which URLs it calls, and it is quite labour-intensive.
So the answer is: No, as far as I know, they don't exist.
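To illustrate the "make the same calls as the script" idea, here is a minimal sketch. It is in Python rather than PHP, purely for illustration; the same approach should work with PHP's cURL functions. The function name `fetch_ajax_json` and its `opener` parameter are made up for this example, and it assumes the endpoint returns JSON and that you have already found its URL (for instance in the browser's network inspector).

```python
import json
import urllib.request


def fetch_ajax_json(url, opener=urllib.request.urlopen):
    """Call an AJAX endpoint directly and return the decoded JSON payload.

    `opener` is a hypothetical hook so the HTTP call can be swapped out
    (e.g. in tests); by default it is urllib's urlopen.
    """
    request = urllib.request.Request(
        url,
        headers={
            # Assumption: some servers check this header to recognise
            # XMLHttpRequest calls; others ignore it.
            "X-Requested-With": "XMLHttpRequest",
            "Accept": "application/json",
        },
    )
    # The response object works as a context manager and is closed on exit.
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

You would call it as `fetch_ajax_json("https://example.com/api/items?page=1")`, where the URL is a placeholder for whatever endpoint the page's own script requests. This skips JavaScript execution entirely, which is why it only works when you know the endpoints in advance.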
Comments
You can use PhantomJS to execute the JS.
https://github.com/ariya/phantomjs/blob/master/examples/waitfor.js