open and save web page selenium java

Question

I need to get the content of some web pages like "http://www.ncbi.nlm.nih.gov/nuccore/NM_007002" for my project. The problem is that I need to open the page in a browser and save it to get the full content (if I use the URL and BufferedReader classes, I get the "frame" of the page but not the text I need). My professor told me to use Selenium to open and download the pages I need, and then read and parse the relevant information.

Unfortunately, I can't find an example of Java code that opens and saves a web page. Can anyone explain to me how to do this?

I want to SAVE the page to my computer, not copy the source and save it to a file. Not all of the information appears in the source! It's hidden.

Save a webpage? With HTML tags or only the text of the web page?
– Ant's, Commented Jan 5, 2015 at 11:35
Possible duplicate of How to save current page source in different name & folder
– Louis, Commented Jan 5, 2015 at 11:44

Sariq Shaikh · Accepted Answer · 2016-08-18 09:08:55Z

3

In Selenium you can do this:

WebDriver driver = new SafariDriver(); // any driver works, e.g. ChromeDriver or FirefoxDriver
driver.get("your link");
String pageSource = driver.getPageSource(); // now you have the page source
// You can save pageSource to a file or do whatever you want with it.
driver.quit(); // close the browser when you are done

See the getPageSource documentation for details.

If you want the data from a specific tag, for example body, you can do this:

String bodyText = driver.findElement(By.tagName("body")).getText(); // visible text of the body, without HTML tags
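
Writing the resulting string to disk can be sketched as follows. This is a minimal sketch using only the standard library; the PageSaver class and its save method are hypothetical names, not part of Selenium, and the content is assumed to be whatever string you got from getPageSource() or getText().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Selenium: writes page content to a file.
final class PageSaver {

    // Writes the content to the target file as UTF-8.
    // The file is created if missing and overwritten if it already exists.
    static void save(String content, Path target) throws IOException {
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }
}
```

With a driver at hand, a call might look like PageSaver.save(driver.getPageSource(), Paths.get("NM_007002.html")), where the file name is only an example.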

edited Aug 18, 2016 at 9:08

Sariq Shaikh


answered Jan 5, 2015 at 11:38

Ant's



5 Comments

Artemis Over a year ago

This is not what I need. I need to save the page to my computer. Only then is the information I need available.

Ant's Over a year ago

@yalush: If you want to save the page to your computer, why can't you do that with File?

Artemis Over a year ago

Because File saves the text of the page and I need the page itself, just like when I use "save as...". I need it because some of the information in the page is hidden, and appears in the source only when I save the page to my computer.

Ant's Over a year ago

Do you want the images etc. that are present in the web page along with it?

Artemis Over a year ago

I only need the text in the middle of the page (the one with the information about the gene and exons)

Master Slave · Accepted Answer · 2015-01-05 11:44:32Z

1

Keep in mind that Selenium is meant for web page automation, that is, for interacting with pages automatically. If the source is really all you need, you can use jsoup, a really solid Java HTML parser. In two lines of code, you should have your source:

    try {
        Document doc = Jsoup.connect("http://www.ncbi.nlm.nih.gov/nuccore/NM_007002")
                .userAgent("Mozilla/5.0")
                .timeout(30000)
                .get();
        System.out.println(doc.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }

answered Jan 5, 2015 at 11:44

Master Slave


1 Comment

Artemis Over a year ago

You can open the page source and see the problem for yourself. You can see that the word "exon" appears many times in the page, but only once in the source. If I try to read the source, I can't get all the information I need.
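
A quick way to check whether a given string actually contains the needed content is to count how often the keyword occurs in it. The sketch below uses only the standard library; KeywordCounter and count are hypothetical names introduced for illustration, and the input is assumed to be any string taken from the page (raw source, getPageSource() output, or body text).

```java
import java.util.Locale;

// Hypothetical helper for comparing strings obtained from a page.
final class KeywordCounter {

    // Counts non-overlapping, case-insensitive occurrences of keyword in text.
    static int count(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0; // an empty keyword would otherwise match at every position
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length(); // continue searching after this match
        }
        return count;
    }
}
```

Comparing the count for "exon" in the raw source with the count in the string obtained from the browser would show which of the two really holds the exon information.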
