
I want to retrieve all the links on a web page, but the page uses JavaScript, and each page contains a number of links.

How can I go to the next page and read its content in a Java program?

2 Comments
  • Do you need to crawl the web (thousands/millions of sites) or just crawl a handful of sites? Commented Dec 14, 2010 at 9:23
  • Thanks everyone. I have read the links of the first page, but I need an idea for getting the links of the next page, because it has the same URL; the only difference is the content. Commented Dec 14, 2010 at 9:54

2 Answers


Getting this info from a JavaScript'ed page can be a hard job. Your program must interpret the whole page and understand what the JS is doing. Not all web spiders do this.

Most modern JS libraries (jQuery, etc.) mostly manipulate CSS and attributes of HTML elements. So first you have to generate the "flat" HTML from the HTML source plus the JS, and then run a classical web spider over that flat HTML code.

(For example, the Firefox Web Developer plugin lets you see both the original source code of a page and the generated source after all the JS has run.)
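As an illustration, here is a minimal sketch of that approach using HtmlUnit, a Java headless browser that executes a page's JavaScript before you inspect the DOM. The URL and the "Next" link text are placeholders, and the exact WebClient API differs slightly between HtmlUnit versions:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class JsLinkExtractor {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Load the page; hypothetical URL.
            HtmlPage page = webClient.getPage("http://example.com/listing");

            // Give any onload/AJAX scripts up to 5 seconds to finish
            // before reading the DOM.
            webClient.waitForBackgroundJavaScript(5000);

            // getAnchors() returns the anchors of the *rendered* DOM,
            // including links the page's JavaScript inserted.
            for (HtmlAnchor anchor : page.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }

            // If the "next page" control is a JS link that keeps the same
            // URL, clicking it through HtmlUnit runs the attached handler
            // and returns the new DOM state.
            HtmlAnchor next = page.getAnchorByText("Next"); // assumes the link text is "Next"
            HtmlPage nextPage = next.click();
            for (HtmlAnchor anchor : nextPage.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }
        }
    }
}
```

Because HtmlUnit runs the page's click handlers, this also covers the case from the comments above, where the next page keeps the same URL and only the content changes.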


1 Comment

Agreed. In many instances, the only way to do this properly on JS-heavy sites is to render the page via a "headless browser".

What you are looking for is called a web spider engine. There are plenty of open-source web spider engines available. Check http://j-spider.sourceforge.net/ for example.
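For the classical part of the job (following links in plain, already-rendered HTML), the core loop of a spider engine is small. Below is a minimal sketch using jsoup as the HTML parser; jsoup is my own choice here, not part of J-Spider, and it does not execute JavaScript:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SimpleSpider {
    public static void main(String[] args) throws Exception {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        frontier.add("http://example.com/"); // hypothetical seed URL

        // Breadth-first crawl, capped at 50 pages for the sketch.
        while (!frontier.isEmpty() && visited.size() < 50) {
            String url = frontier.poll();
            if (!visited.add(url)) continue; // skip pages we already fetched

            Document doc = Jsoup.connect(url).get();
            for (Element link : doc.select("a[href]")) {
                String href = link.absUrl("href"); // resolve relative URLs
                System.out.println(href);
                frontier.add(href);
            }
        }
    }
}
```

A real engine layers politeness delays, robots.txt handling, and domain filtering on top of this loop, and for JS-generated links you would swap the fetch step for a headless browser as described in the other answer.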

3 Comments

Does it extract dynamic JS links?
@Joel, I'm not sure what you mean by dynamic JS links; could you explain briefly?
I got the impression the OP wanted to crawl a site with dynamically generated (JS) links...
