3

This may sound naive, but is there anything even remotely close to a PHP crawler for AJAX-based websites?

1
  • It's not PHP, so I'm not offering it as an answer, but HtmlUnit for Java is a fully scriptable headless browser component with JavaScript support; it could be used as a crawler, too. Commented May 20, 2011 at 11:02

3 Answers

2

The problem is that vanilla PHP has no way to parse JavaScript, set up a JavaScript runtime environment, and interact with the page. To do it even in theory, you would have to extend PHP via its C API and interface it with a JavaScript engine. That is a large undertaking, and whether it is feasible depends on the resources you have.


2

Not automatic crawlers, because they would need to understand the JavaScript code and know what it is doing.

What a crawler could do is make the same calls the AJAX-enabled script makes, so you can get at the raw data.

But this means you need a very good understanding of the web page and which URLs it calls, and it is quite labour-intensive.
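
To make the "same calls" idea concrete, here is a minimal sketch in Python (the same approach carries over to PHP with cURL). Everything specific in it is an assumption for illustration: the endpoint URL, the `page` parameter, and the `items`/`title` JSON shape are hypothetical and would have to be read off the real site's network traffic first.

```python
import json
import urllib.parse
import urllib.request


def build_ajax_request(base_url, params):
    """Build a GET request that looks like the page's own XHR call."""
    query = urllib.parse.urlencode(params)
    req = urllib.request.Request(f"{base_url}?{query}")
    # Many AJAX backends check these headers; whether a given site
    # needs them is an assumption you have to verify per site.
    req.add_header("X-Requested-With", "XMLHttpRequest")
    req.add_header("Accept", "application/json")
    return req


def parse_items(payload):
    """Extract titles from a JSON payload (hypothetical response shape)."""
    data = json.loads(payload)
    return [item["title"] for item in data.get("items", [])]


def fetch_items(base_url, params, timeout=10):
    """Call the endpoint directly and return the parsed titles."""
    req = build_ajax_request(base_url, params)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_items(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Offline demonstration with a canned response; no network needed.
    sample = '{"items": [{"title": "first"}, {"title": "second"}]}'
    print(parse_items(sample))
```

The labour-intensive part is not this code but finding the right URL and parameters for each site, and updating them whenever the site changes.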

So the answer is: no, as far as I know, they don't exist.


0

You can use PhantomJS, a scriptable headless WebKit browser, to execute the JavaScript.

https://github.com/ariya/phantomjs/blob/master/examples/waitfor.js

