java.io.IOException: Mark has been invalidated when parsing website with jsoup

Question

When I try to parse an HTML page from a website, it crashes with the error:

java.io.IOException: Mark has been invalidated.

Part of my code:

String xml = xxxxxx;
try {
    Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
            .timeout(0).ignoreContentType(true)
            .parser(Parser.xmlParser()).get();

    Elements elements = document.body().select("td.hotv_text:eq(0)");

    for (Element element : elements) {
        Element element1 = element.select("a[href].hotv_text").first();
        hashMap.put(element.text(), element1.attr("abs:href"));
    }
} catch (HttpStatusException ex) {
    Log.i("GyWueInetSvc", "Exception while JSoup connect:" + xml +" cause:"+ ex.getMessage());
} catch (IOException e) {
    e.printStackTrace();
    throw new RuntimeException("Socket timeout: " + e.getMessage(), e);
}

The website I want to parse is about 2 MB. When I debug the code, I see it reach this method in ConstrainableInputStream.java:

public void reset() throws IOException {
    super.reset();
    remaining = maxSize - markpos;
}

At that point markpos is -1, and the exception is thrown.

How can I solve that problem?

Hi @NeoFar - what have you tried, and what is the exact wording of the IOException? What you posted is the code that throws the exception, not the exception message itself. Thanks.
– Max von Hippel, Commented Dec 9, 2017 at 21:41
Hi @Max von Hippel - I tried to parse XML from a single link. The full error text is too long to fit in a comment, but the main part is:
    at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:231)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587)
    at java.lang.Thread.run(Thread.java:818)
Caused by: java.io.IOException: Mark has been invalidated.
– Farid Abbasov, Commented Dec 9, 2017 at 21:56
The error comes from BufferedInputStream.java, at the line markpos = -1; /* buffer got too big, invalidate mark */. What does that mean?
– Farid Abbasov, Commented Dec 10, 2017 at 6:38
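
On the "buffer got too big, invalidate mark" question in the last comment: the following is a minimal, standard-library-only sketch of how a mark on a java.io.BufferedInputStream can become invalid. It does not use jsoup and is not jsoup's code. The class name MarkResetDemo, the helper resetSucceeds, and the 64-byte payload are all made up for illustration. The idea is that once more bytes are read than both the buffer size and the readlimit passed to mark(), the stream drops the mark, and a later reset() throws an IOException.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetDemo {

    /**
     * Marks the start of a buffered stream, reads bytesToRead bytes one at a time,
     * then calls reset(). Returns true if reset() worked, false if it threw IOException.
     */
    public static boolean resetSucceeds(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64]; // hypothetical payload standing in for an HTTP body
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset(); // throws IOException if the mark was invalidated
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 9 bytes read, past both the 8-byte buffer and the readlimit of 4: mark is dropped.
        System.out.println(resetSucceeds(8, 4, 9));  // false
        // Same reads, but readlimit 64 lets the buffer grow and keep the mark.
        System.out.println(resetSucceeds(8, 64, 9)); // true
    }
}
```

In the question's case the body is about 2 MB, so it is plausible (though this sketch does not prove it) that the stream was read past the marked limit before reset() was called.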

ulong Mask · Accepted Answer · 2019-11-21 10:13:53Z

4

This helped me:

GET: .execute().bufferUp().parse();
POST: .method(Connection.Method.POST).execute().bufferUp().parse();

answered Nov 21, 2019 at 10:13

ulong Mask

Shubham Sejpal · Accepted Answer · 2017-12-11 06:20:42Z

2

I found a solution to the problem. The cause was the buffer growing too large. I solved it by reading the response myself and passing the text to jsoup, using the code below:

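URLConnection connection;
StringBuilder content = new StringBuilder();

try {
    connection = new URL(xml).openConnection();

    // Read the response line by line instead of letting jsoup buffer the stream.
    try (Scanner scanner = new Scanner(connection.getInputStream())) {
        while (scanner.hasNextLine()) {
            // Keep the line break so adjacent lines do not run together.
            content.append(scanner.nextLine()).append('\n');
        }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

// Parse the downloaded text; no mark/reset on the network stream is involved here.
Document document = Jsoup.parse(content.toString());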

edited Dec 11, 2017 at 6:20

Shubham Sejpal

answered Dec 10, 2017 at 17:26

Farid Abbasov

nDijax · Accepted Answer · 2020-02-28 15:00:04Z

1

I got the same exception after upgrading to 1.12.2 from 1.11.3. Try downgrading your dependencies.

answered Feb 28, 2020 at 15:00

nDijax

Angel Koh · Accepted Answer · 2020-06-29 17:16:39Z

To add to @ulong's answer regarding the use of bufferUp():

The documentation inside the jsoup source itself recommends it if you need to parse the document several times. Calling bufferUp() before parse() reads the body into a local buffer, so parse() does not find the InputStream already drained, which is what otherwise leads to the invalid mark error (IOException).

    /**
     * Read and parse the body of the response as a Document. If you intend to parse the same response multiple
     * times, you should {@link #bufferUp()} first.
     * @return a parsed Document
     * @throws IOException on error
     */
    Document parse() throws IOException;

And regarding bufferUp():

    /**
     * Read the body of the response into a local buffer, so that {@link #parse()} may be called repeatedly on the
     * same connection response (otherwise, once the response is read, its InputStream will have been drained and
     * may not be re-read). Calling {@link #body() } or {@link #bodyAsBytes()} has the same effect.
     * @return this response, for chaining
     * @throws UncheckedIOException if an IO exception occurs during buffering.
     */
    Response bufferUp();
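
To illustrate the idea the Javadoc describes, here is a small standard-library-only analogy. It is not jsoup's implementation and does not call jsoup; the class name BufferUpSketch and the helper readFully are hypothetical. It shows that a stream can only be consumed once, while a byte array copied from it can be read as many times as needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferUpSketch {

    /** Drains the stream into a byte array, wrapping any IOException as unchecked. */
    public static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical body standing in for an HTTP response stream.
        InputStream body = new ByteArrayInputStream(
                "<td>hello</td>".getBytes(StandardCharsets.UTF_8));

        byte[] buffered = readFully(body);

        // The original stream is now drained: a further read reports end of stream.
        System.out.println(body.read()); // -1

        // The buffered copy can be decoded repeatedly with the same result.
        String first = new String(buffered, StandardCharsets.UTF_8);
        String second = new String(buffered, StandardCharsets.UTF_8);
        System.out.println(first.equals(second)); // true
    }
}
```

With jsoup itself, the equivalent step is the .execute().bufferUp().parse() chain shown in @ulong's answer.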

Ovokerie Ogbeta · Accepted Answer · 2018-06-24 18:29:41Z

-1

Use ~.execute().parse(); instead of ~.get(); to get the document, and remove the parser, so your code becomes:

Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
            .timeout(0).ignoreContentType(true)
            .execute().parse();

This is a temporary fix while we wait for a new version that fixes the bug.

edited Jun 24, 2018 at 18:29

answered Jun 24, 2018 at 12:40

Ovokerie Ogbeta
