
I want to retrieve all the links on a web page, but the page uses JavaScript, and each page contains a number of links.

How can I go to the next page and read its content in a Java program?

2 Comments
  • Do you need to crawl the web (thousands/millions of sites) or just crawl a handful of sites? Commented Dec 14, 2010 at 9:23
  • Thanks everyone. I have read the links of the first page, but I need an idea for getting the links of the next page, because it has the same URL; the only difference is the content. Commented Dec 14, 2010 at 9:54

2 Answers


Getting this info from a JavaScript'ed page can be a hard job. Your program must interpret the whole page and understand what the JS is doing. Not all web spiders do this.

Most modern JS libraries (jQuery, etc.) mostly manipulate CSS and attributes of HTML elements. So first you have to generate the "flat" HTML from the HTML source plus the JS, and then run a classical web spider over that flat HTML code.

(For example, the Firefox Web Developer plugin lets you see both the original source code of a page and the generated source after all the JS has run.)
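As an illustration, here is a minimal sketch of that approach using HtmlUnit, a Java headless browser that executes a page's JavaScript before you inspect the DOM. The URL and the "Next" link text are placeholders, and the exact WebClient API differs slightly between HtmlUnit versions:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class JsLinkExtractor {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Load the page; hypothetical URL.
            HtmlPage page = webClient.getPage("http://example.com/listing");

            // Give any onload/AJAX scripts up to 5 seconds to finish
            // before reading the DOM.
            webClient.waitForBackgroundJavaScript(5000);

            // getAnchors() returns the anchors of the *rendered* DOM,
            // including links the page's JavaScript inserted.
            for (HtmlAnchor anchor : page.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }

            // If the "next page" control is a JS link that keeps the same
            // URL, clicking it through HtmlUnit runs the attached handler
            // and returns the new DOM state.
            HtmlAnchor next = page.getAnchorByText("Next"); // assumes the link text is "Next"
            HtmlPage nextPage = next.click();
            for (HtmlAnchor anchor : nextPage.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }
        }
    }
}
```

Because HtmlUnit runs the page's click handlers, this also covers the case from the comments above, where the next page keeps the same URL and only the content changes.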


1 Comment

Agreed. In many instances, the only way to do this properly on JS-heavy sites is to render the page via a "headless browser".

What you are looking for is called a web spider engine. There are plenty of open-source web spider engines available. Check http://j-spider.sourceforge.net/ for example.
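For the classical part of the job (following links in plain, already-rendered HTML), the core loop of a spider engine is small. Below is a minimal sketch using jsoup as the HTML parser; jsoup is my own choice here, not part of J-Spider, and it does not execute JavaScript:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleSpider {
    public static void main(String[] args) throws Exception {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        frontier.add("http://example.com/"); // hypothetical seed URL

        // Breadth-first crawl, capped at 50 pages for the sketch.
        while (!frontier.isEmpty() && visited.size() < 50) {
            String url = frontier.poll();
            if (!visited.add(url)) continue; // skip pages we already fetched

            Document doc = Jsoup.connect(url).get();
            for (Element link : doc.select("a[href]")) {
                String href = link.absUrl("href"); // resolve relative URLs
                System.out.println(href);
                frontier.add(href);
            }
        }
    }
}
```

A real engine layers politeness delays, robots.txt handling, and domain filtering on top of this loop, and for JS-generated links you would swap the fetch step for a headless browser as described in the other answer.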

3 Comments

Does it extract dynamic JS links?
@Joel, I'm not sure what you mean by dynamic JS links; could you explain briefly?
I got the impression the OP wanted to crawl a site with dynamically generated (JS) links...
