35

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?


7 Answers

48

I'm currently using this:

String content = null;
URLConnection connection = null;
try {
    connection = new URL("http://www.google.com").openConnection();
    Scanner scanner = new Scanner(connection.getInputStream());
    scanner.useDelimiter("\\Z");
    content = scanner.next();
    scanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
}
System.out.println(content);

But I'm not sure if there's a better way.


4 Comments

Why "\\Z"? Isn't it an EOF on Windows only? I am just guessing here.
Why do you use "\\Z"? What does it do? I tried without it, it didn't work.
@MaxHusiv I think it's because if you don't specify a delimiter, scanner.next() will just go through the whole HTML character by character, but if you use a delimiter which won't be found in the HTML, scanner.next() returns the whole thing.
What import statements do you need for that to work?
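
For reference, a complete, compilable version of the snippet in the question, with the imports the last comment asks about, might look roughly like this (the URL and delimiter are kept as in the question):

import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

public class FetchHtml {
    public static void main(String[] args) {
        String content = null;
        try {
            URLConnection connection = new URL("http://www.google.com").openConnection();
            Scanner scanner = new Scanner(connection.getInputStream());
            // "\\Z" only matches at the end of the input, so next() returns everything up to EOF
            scanner.useDelimiter("\\Z");
            content = scanner.next();
            scanner.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println(content);
    }
}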
23

This has worked well for me:

URL url = new URL(theURL);
InputStream is = url.openStream();
int ptr = 0;
StringBuffer buffer = new StringBuffer();
while ((ptr = is.read()) != -1) {
    buffer.append((char)ptr);
}

Not sure as to whether the other solution(s) provided are any more efficient or not.

5 Comments

Don't you need to include the following? import java.io.* import java.net.*
Sure, but they're core java so very simple. As for the actual code, the import statements are omitted for clarity.
After the while loop you should display the buffer's content too, or write a method that returns it!
be sure to close the inputstream
why have you named the variable ptr?
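
Taking the comments above into account (close the stream, and actually do something with the buffer), a fuller sketch of the same byte-by-byte approach might look like this; the URL is just a placeholder:

import java.io.InputStream;
import java.net.URL;

public class FetchHtmlStream {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.google.com");
        StringBuilder buffer = new StringBuilder();
        // try-with-resources closes the stream even if read() throws
        try (InputStream is = url.openStream()) {
            int ch;
            while ((ch = is.read()) != -1) {
                // note: appending raw bytes as chars is only safe for ASCII/Latin-1 content
                buffer.append((char) ch);
            }
        }
        // display the buffer's content, as suggested in the comments
        System.out.println(buffer);
    }
}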
2

I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. The Apache classes can be used by adding import org.apache.commons.httpclient.HttpClient at the top of your code.

Edit: Forgot the link ;)

1 Comment

Apparently you also have to install the JAR file :)
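
For completeness, a minimal sketch of that route, assuming Commons HttpClient 3.x is on the classpath (the URL is a placeholder):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class FetchHtmlHttpClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod("http://www.google.com");
        try {
            client.executeMethod(get);                        // send the GET request
            String content = get.getResponseBodyAsString();   // response body as a String
            System.out.println(content);
        } finally {
            get.releaseConnection();                          // always release the connection
        }
    }
}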
2

Whilst not vanilla-Java, I'll offer up a simpler solution. Use Groovy ;-)

String siteContent = new URL("http://www.google.com").text

Comments

1

Well, it depends on what you're expecting to do with the fetched HTML string. If your goal is to do some kind of parsing or data extraction from the HTML content, why deny yourself an external library?

Jsoup does the whole job very well without having to write a single regex yourself.

For example, to get the page title ( <head><title>this one</title>... ) you only need these few lines of code:

 String url = "https://www.example.com";
 Document document = Jsoup.connect(url).get();
 String title = document.title();

To use Jsoup you just have to add the dependency to your pom.xml file (make sure to pick the right version for the JDK you're running):

    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.18.1</version>
    </dependency>

With the Jsoup document created on the second line of the above example, you can access any DOM element with CSS-like selectors. For instance, this will print the URL of every image on the page:

  document.select("img")
    .forEach(element -> System.out.println(element.attr("src")));

You can access the raw HTML string if you really need to: String rawHtml = document.html();

I am also often tempted not to use any external library, but I am very glad I made an exception for this one. Straightforward, simple to use, and very comprehensive.

Comments

0
try {
    // open a connection to the page and stream its bytes to standard output
    URL u = new URL("https://www.Samsung.com/in/");
    URLConnection urlconnect = u.openConnection();
    InputStream stream = urlconnect.getInputStream();
    int i;
    while ((i = stream.read()) != -1) {
        System.out.print((char) i);
    }
}
catch (Exception e) {
    System.out.println(e);
}

Comments

-4

It's not a library but a tool named curl, generally installed on most servers; you can also easily install it on Ubuntu with

sudo apt install curl

Then fetch any HTML page and store it in a local file, for example:

curl https://www.facebook.com/ > fb.html

You will get the home page HTML. You can open it in your browser as well.

1 Comment

Squints eyes to show shock. This is a Java Question.
