
The idea is very simple:

Imagine a simple white page with a form containing a single input field (like the Google homepage). When I paste the link of a blog post into this form, a JavaScript crawler should find the first image on the blog post's page (via Ajax), display it on the white page, and save it to my server.

This crawler would work like Digg or the Facebook wall.

What functions do I have to use for this crawler?

2 Answers


Due to cross-domain restrictions, pure JavaScript crawlers are neither common nor practically feasible. You would need to set up a server-side script that receives the address entered in the form, fetches the contents of the remote resource, and parses the HTML to obtain the images.
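A minimal sketch of that server-side approach, assuming Node.js 18+ (for the global `fetch`); the function names are illustrative, and a real crawler should use a proper HTML parser rather than a regex:

```javascript
// Extract the src of the first <img> tag in an HTML string.
// A regex is enough for a sketch; it will miss edge cases a real parser handles.
function extractFirstImage(html) {
  const match = html.match(/<img[^>]*\ssrc=["']([^"']+)["']/i);
  return match ? match[1] : null;
}

// Fetch a remote page server-side (no browser cross-domain limits apply here)
// and return the URL of the first image it contains, or null if none is found.
async function firstImageOf(url) {
  const response = await fetch(url);          // Node 18+ global fetch
  const html = await response.text();
  return extractFirstImage(html);
}
```

Your white page would then call this endpoint via Ajax, and the server could additionally download and store the image before responding.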



Darin is right: JavaScript cannot request content from another domain. But it can dynamically add script tags to the document and include scripts from other domains (see JSONP for details).

I suggest you use YQL. You can crawl any page you want with Yahoo's YQL library using only JavaScript. Yahoo's servers fetch the URLs you request, parse the HTML, and send you the requested parts of the documents.
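A browser-side sketch of the JSONP pattern this answer relies on; the endpoint and parameter names below are purely illustrative, not a real service's API:

```javascript
// Build a JSONP request URL: the cross-domain service is expected to wrap
// its JSON response in a call to the named callback function.
function buildJsonpUrl(endpoint, params, callbackName) {
  const query = new URLSearchParams({ ...params, callback: callbackName });
  return `${endpoint}?${query}`;
}

// Inject a <script> tag so the browser fetches the cross-domain response.
// The injected script invokes the global callback with the parsed data.
function jsonp(endpoint, params, onData) {
  const cbName = 'jsonp_cb_' + Date.now();
  window[cbName] = (data) => {
    onData(data);
    delete window[cbName];
  };
  const script = document.createElement('script');
  script.src = buildJsonpUrl(endpoint, params, cbName);
  document.head.appendChild(script);
}
```

Because the payload arrives as executable script rather than an XHR response, this works across domains, but only with services that explicitly support a JSONP callback parameter.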
