I'm building an application that fetches values from a specific website and prints them to the console. The value is in a <span> element, and I'm using Jsoup.

My challenge has to do with this error:

Error fetching URL

Here is my Java code:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TestSl {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data").get();
        Elements spans = doc.select("span[class=hidden-text]");
        for (Element span : spans) {
            System.out.println(span.text());
        }
    }
}

And here is the error on the console:

Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:590)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:540)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:227)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:216)
    at TestSl.main(TestSl.java:19)

What am I doing wrong and how can I resolve it?

  • 403 Forbidden is an HTTP status code meaning the server understood the request but refuses to grant access to the page or resource you were trying to reach. Commented Apr 21, 2016 at 20:43
  • So basically there is no way I could fetch that data? Maybe using some alternatives? Or is it that the server/website does not allow any HTML parsers to fetch the data? Commented Apr 21, 2016 at 20:46
  • Not sure if the website allows you to use HTML parsers. But most likely the HTML parser works over port 443 or 80, so I don't think that would be the case. It might be the way you are implementing the code. Commented Apr 21, 2016 at 20:51
  • Thank you. I have one more issue. I tried with Google (again, a span and its class name). I do not get the error, but there is no output on my console. I have re-read my code many times but could not figure out where I went wrong. Any suggestions for that? Commented Apr 21, 2016 at 20:55

1 Answer

Set the user-agent header:

.userAgent("Mozilla")

Example:

Document document = Jsoup.connect("https://stackoverflow.com/questions/11970938/java-html-parser-to-extract-specific-data").userAgent("Mozilla").get();
Elements elements = document.select("span.hidden-text");
for (Element element : elements) {
  System.out.println(element.text());
}

Stack Exchange

Inbox

Reputation and Badges

source: https://stackoverflow.com/a/7523425/1048340


Perhaps this is related: https://meta.stackexchange.com/questions/277369/a-terms-of-service-update-restricting-companies-that-scrape-your-profile-informa
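
To illustrate why the header can matter, here is a minimal, self-contained sketch that uses only the JDK (no Jsoup). It starts a throwaway local server that answers 403 unless the User-Agent starts with "Mozilla", then requests it with and without that header. The class name UserAgentDemo, the helper names, and the filtering rule are all assumptions made for illustration; how Stack Overflow actually decides which clients to reject is not known here.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

public class UserAgentDemo {

    // Hypothetical server: returns 403 unless the User-Agent starts with "Mozilla".
    static HttpServer startServer() throws IOException {
        // Port 0 lets the OS pick any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean allowed = ua != null && ua.startsWith("Mozilla");
            // A length of -1 means the response has no body.
            exchange.sendResponseHeaders(allowed ? 200 : 403, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Sends a GET request; overrides the User-Agent only when userAgent is non-null.
    static int fetchStatus(String url, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
        try {
            // Without an override, HttpURLConnection sends a "Java/<version>" User-Agent.
            System.out.println("default UA: " + fetchStatus(url, null));
            System.out.println("Mozilla UA: " + fetchStatus(url, "Mozilla"));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this toy server the first request should be rejected with 403 and the second accepted with 200, which mirrors the behaviour seen with and without .userAgent("Mozilla") in the Jsoup call above.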

9 Comments

Thanks, it finally worked. Could you elaborate, please? Where did I go wrong?
Well, I am having one more issue. The Stack Overflow example works great, but I have another website from which I am not getting any results: binary.com/trading?l=EN. I no longer get the error, but no values are printed to the console. On that page there is a span that holds numeric values, right next to the small graph. Its class changes as the value goes up and down, and it has an ID called "spot". I used both the class name and the ID in my code, but I get no results on my console. Could you suggest a reason why?
Perhaps Stack Overflow is sniffing the user-agent. I know they are actively trying to prevent web scraping abuse at the moment. Here is some good advice: learn.scrapehero.com/…
If you have another question/problem I would suggest providing more details and creating a new post. :)
Yeah, I get that. I mean, web scraping could lead to a huge misunderstanding. But could you suggest why I get data from Stack Overflow and not from the Binary website? Is there a legitimate reason for that, or is their server denying access? That is the best explanation I can give: I get the values in the Stack Overflow example, but I do not get any values from the Binary website, whether I use the class name or the ID.