
The idea is very simple:

Imagine a simple white page with a form containing a single input field (like the Google homepage). When I paste the link of a blog post into this form, a JavaScript crawler should find the first image on the blog post's page (via Ajax), display it on the white page, and save it to my server.

This crawler would work like Digg or the Facebook wall.

What functions do I have to use for this crawler?

2 Answers


Due to cross-domain restrictions, pure JavaScript crawlers are neither common nor practically feasible. You would need to set up a server-side script that receives the address entered in the form, fetches the contents of the remote resource, and parses the HTML to obtain the images.
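A minimal sketch of that server-side approach, assuming Node.js 18+ (for the global `fetch`); the function names are illustrative, and a real crawler should use a proper HTML parser rather than a regex:

```javascript
// Extract the src of the first <img> tag in an HTML string.
// A regex is enough for a sketch; it will miss edge cases a real parser handles.
function extractFirstImage(html) {
  const match = html.match(/<img[^>]*\ssrc=["']([^"']+)["']/i);
  return match ? match[1] : null;
}

// Fetch a remote page server-side (no browser cross-domain limits apply here)
// and return the URL of the first image it contains, or null if none is found.
async function firstImageOf(url) {
  const response = await fetch(url);          // Node 18+ global fetch
  const html = await response.text();
  return extractFirstImage(html);
}
```

Your white page would then call this endpoint via Ajax, and the server could additionally download and store the image before responding.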



Darin is right: JavaScript cannot request content from another domain. But it can dynamically add script tags to the document and include scripts from other domains (see JSONP for details).

I suggest you use YQL. You can crawl any page you want with Yahoo's YQL library using only JavaScript. Yahoo's servers fetch the URLs you request, parse the HTML, and send you the requested parts of the documents.
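A browser-side sketch of the JSONP pattern this answer relies on; the endpoint and parameter names below are purely illustrative, not a real service's API:

```javascript
// Build a JSONP request URL: the cross-domain service is expected to wrap
// its JSON response in a call to the named callback function.
function buildJsonpUrl(endpoint, params, callbackName) {
  const query = new URLSearchParams({ ...params, callback: callbackName });
  return `${endpoint}?${query}`;
}

// Inject a <script> tag so the browser fetches the cross-domain response.
// The injected script invokes the global callback with the parsed data.
function jsonp(endpoint, params, onData) {
  const cbName = 'jsonp_cb_' + Date.now();
  window[cbName] = (data) => {
    onData(data);
    delete window[cbName];
  };
  const script = document.createElement('script');
  script.src = buildJsonpUrl(endpoint, params, cbName);
  document.head.appendChild(script);
}
```

Because the payload arrives as executable script rather than an XHR response, this works across domains, but only with services that explicitly support a JSONP callback parameter.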
