I'm trying to read the source code of a web page. The problem is that when I request, for example, https://www.amazon.de/gp/bestsellers/pet-supplies/#2, I still only receive the code of https://www.amazon.de/gp/bestsellers/pet-supplies, no matter what I try. I want to receive places 21-40, not 1-20. I'm using a URLConnection and a BufferedReader:

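import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public String fetchPage(String urlS) {
    // Accumulate into a StringBuilder; appending to a null String would prefix the result with "null".
    StringBuilder qc = new StringBuilder();

    try {
        URL url = new URL(urlS);
        URLConnection uc = url.openConnection();
        uc.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0");

        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
            String s;
            while ((s = reader.readLine()) != null) {
                qc.append(s);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return "receiving qc failed";
    }
    return qc.toString();
}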

Thank you in advance for your effort :)

1 Answer

The URL you're fetching contains an anchor (the #2 at the end). An anchor is a client-side concept that was originally used to jump to a certain part of the page. Some web apps (mostly single-page apps) use the anchor to keep track of some sort of state (e.g. which page of products you're viewing).

Since the anchor is a client-side concept, it is never sent to the web server: your browser or HTTP client library strips it before making the request, so the server responds as if you had requested https://www.amazon.de/gp/bestsellers/pet-supplies.
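To make the split between the anchor and the rest of the URL concrete, here is a minimal sketch using java.net.URI. The class name FragmentDemo and both helper methods are made up for illustration; they are not part of the question's code, and the second helper only approximates what a client puts on the wire by cutting the string at the first '#'.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class FragmentDemo {

    // Returns the anchor (the part after '#'), or null if the URL has none.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    // Returns everything before the first '#', which is the part of the URL
    // that is relevant to the server.
    static String withoutFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }
}
```

For the URL from the question, fragmentOf returns "2" and withoutFragment returns https://www.amazon.de/gp/bestsellers/pet-supplies/, so whatever the #2 is meant to select has to be handled by script running in the browser.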

The bottom line is that you'll never get the second page by requesting this URL... Good luck scraping Amazon, though ;)
