1

When I try to parse the HTML page of a website, my app crashes with this error:

java.io.IOException: Mark has been invalidated.

Part of my code:

String xml = xxxxxx;
try {
    Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
            .timeout(0).ignoreContentType(true)
            .parser(Parser.xmlParser()).get();

    Elements elements = document.body().select("td.hotv_text:eq(0)");

    for (Element element : elements) {
        Element element1 = element.select("a[href].hotv_text").first();
        hashMap.put(element.text(), element1.attr("abs:href"));
    }
} catch (HttpStatusException ex) {
    Log.i("GyWueInetSvc", "Exception while JSoup connect:" + xml +" cause:"+ ex.getMessage());
} catch (IOException e) {
    e.printStackTrace();
    throw new RuntimeException("Socket timeout: " + e.getMessage(), e);
}

The page I want to parse is about 2 MB. While debugging, I can see that execution reaches this method in jsoup's ConstrainableInputStream.java:

public void reset() throws IOException {
    super.reset();
    remaining = maxSize - markpos;
}

At that point markpos is -1, and the exception is thrown.

How can I solve that problem?

3
  • Hi @NeoFar - what have you tried, and what is the exact wording of the IOException? What you posted is the code that throws the exception, not the exception message itself. Thanks. Commented Dec 9, 2017 at 21:41
  • Hi @Max von Hippel - I tried to parse XML from one link. The full error message is too long to fit in a comment, but the main part is: at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:231) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) at java.lang.Thread.run(Thread.java:818) Caused by: java.io.IOException: Mark has been invalidated. Commented Dec 9, 2017 at 21:56
  • The error comes from BufferedInputStream.java, at the line markpos = -1; /* buffer got too big, invalidate mark */ What does that mean? Commented Dec 10, 2017 at 6:38
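
As background to the last comment: the error comes from the mark/reset contract of java.io.BufferedInputStream. A mark is only guaranteed to stay valid while no more than readlimit bytes are read after mark(). Once the buffer has to be refilled beyond that limit, the stream sets markpos to -1 and a later reset() throws an IOException. Below is a minimal sketch that uses only the JDK and is not jsoup's code. The names MarkResetDemo and resetSucceeds and the sizes are made up for illustration, and the exact exception text can vary between runtimes, so it is not checked.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {

    /**
     * Marks the start of a stream, reads bytesToRead bytes one at a time,
     * then calls reset(). Returns true if reset() worked and false if the
     * mark had been invalidated, in which case reset() throws IOException.
     */
    public static boolean resetSucceeds(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64]; // hypothetical payload, contents are irrelevant
        try (BufferedInputStream in =
                 new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset(); // throws if the mark was dropped
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reads fewer bytes than the read limit: the mark is still valid.
        System.out.println(resetSucceeds(8, 16, 4));
        // Reads past both the buffer size and the read limit: the mark is dropped.
        System.out.println(resetSucceeds(8, 4, 12));
    }
}
```

In the second call the internal buffer (8 bytes) is already at least as large as the read limit (4), so when it needs refilling the stream discards the mark instead of growing the buffer.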

5 Answers

4

This helped me:

GET: Document document = Jsoup.connect(xml).execute().bufferUp().parse();
POST: Document document = Jsoup.connect(xml).method(Connection.Method.POST).execute().bufferUp().parse();

2

I found a solution to the problem. The cause was the buffer overflowing. I solved it by downloading the page myself and passing the resulting string to Jsoup:

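// Needs: java.io.IOException, java.net.MalformedURLException, java.net.URL,
// java.net.URLConnection, java.util.Scanner, org.jsoup.Jsoup, org.jsoup.nodes.Document
String content = "";
try {
    // Download the page with plain java.net instead of Jsoup.connect()
    URLConnection connection = new URL(xml).openConnection();
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the scanner and the underlying stream
    try (Scanner scanner = new Scanner(connection.getInputStream())) {
        while (scanner.hasNextLine()) {
            // Keep the line breaks so text on adjacent lines is not glued together
            sb.append(scanner.nextLine()).append('\n');
        }
    }
    content = sb.toString();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
// Parse the downloaded markup from the in-memory string
Document document = Jsoup.parse(content);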


1

I got the same exception when upgrading from 1.11.3 to 1.12.2. Try downgrading your dependencies.


0

To add to @ulong's answer regarding the use of bufferUp():

The documentation in the jsoup source code itself recommends this if you need to parse the document several times. Calling bufferUp() before parse() reads the body into a local buffer, so the InputStream is not drained by parsing; a drained stream is what results in the invalid mark error (IOException).

    /**
     * Read and parse the body of the response as a Document. If you intend to parse the same response multiple
     * times, you should {@link #bufferUp()} first.
     * @return a parsed Document
     * @throws IOException on error
     */
    Document parse() throws IOException;

And regarding bufferUp():

    /**
     * Read the body of the response into a local buffer, so that {@link #parse()} may be called repeatedly on the
     * same connection response (otherwise, once the response is read, its InputStream will have been drained and
     * may not be re-read). Calling {@link #body() } or {@link #bodyAsBytes()} has the same effect.
     * @return this response, for chaining
     * @throws UncheckedIOException if an IO exception occurs during buffering.
     */
    Response bufferUp();
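
The idea behind buffering can be illustrated without jsoup. A plain InputStream can only be consumed once; copying it into a byte array first lets you hand out a fresh stream for every parse. The sketch below is only an analogy for what the Javadoc describes, not jsoup's implementation. BufferUpSketch, bufferUp and readAll are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferUpSketch {

    /** Copies everything that is left in the stream into a byte array. */
    public static byte[] bufferUp(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the rest of the stream as UTF-8 text, draining it. */
    public static String readAll(InputStream in) {
        return new String(bufferUp(in), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] source = "<p>hello</p>".getBytes(StandardCharsets.UTF_8);

        // Unbuffered: the second read sees a stream that is already drained.
        InputStream body = new ByteArrayInputStream(source);
        System.out.println("first:  [" + readAll(body) + "]");
        System.out.println("second: [" + readAll(body) + "]");

        // Buffered: every read gets its own stream over the same local copy.
        byte[] buffered = bufferUp(new ByteArrayInputStream(source));
        System.out.println("first:  [" + readAll(new ByteArrayInputStream(buffered)) + "]");
        System.out.println("second: [" + readAll(new ByteArrayInputStream(buffered)) + "]");
    }
}
```

The trade-off is memory: the whole body is held locally, which is the price of being able to read it more than once.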


-1

Use ~.execute().parse(); instead of ~.get(); to get the document, and remove the parser call, so your code becomes:

Document document = Jsoup.connect(xml).maxBodySize(1024*1024*10)
            .timeout(0).ignoreContentType(true)
            .execute().parse();  

This is a temporary fix while we await the new version, which will fix the bug.
