
I've read some articles on the advantages of using a SAX parser over DOM for parsing XML files in Java. The point that appeals to me the most (as discussed here) is that

SAX is suitable for large XML files, and the SAX parser does not load the XML file as a whole into memory.

But now that I've written a SAX parser to extract the entities from a large XML file of almost 1.4 GB, it throws the following exception.

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.

What is the memory problem here if the file is not loaded into memory as a whole?

How can I resolve this issue?
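For context, a minimal version of the kind of SAX setup I'm using looks roughly like this (the element names here are simplified placeholders, not from my actual file):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    public static void main(String[] args) throws Exception {
        // Small inline document standing in for the real 1.4 GB file.
        String xml = "<root><item>a</item><item>b</item></root>";

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};

        // SAX pushes events (startElement, characters, ...) into the handler;
        // the document is never held in memory as a whole.
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes attrs) {
                if (qName.equals("item")) {
                    count[0]++;
                }
            }
        });

        System.out.println("items: " + count[0]);
    }
}
```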

4 Comments

  • That is not necessarily an actual memory limitation, but a protective measure against DoS attacks like this one. If your input XML legitimately contains that many entities, you can increase that limit in your parser. Look at its documentation. Commented Apr 2, 2015 at 19:25
  • What do you suggest I do about this protective measure? Commented Apr 2, 2015 at 19:31
  • I thought I said that. Commented Apr 2, 2015 at 19:38
  • Should I look at the documentation of the JVM? Commented Apr 2, 2015 at 19:40

2 Answers


Change the entity expansion limit with a JVM parameter:

-DentityExpansionLimit=1000000
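If you'd rather not touch the launch configuration, the same limit can, as far as I know, also be raised programmatically, as long as you set the property before the first parser is created in the JVM (`entityExpansionLimit` is the legacy property name; `jdk.xml.entityExpansionLimit` is the newer spelling introduced with the JAXP processing limits):

```java
public class RaiseLimit {
    public static void main(String[] args) {
        // Must run before any SAX/DOM parser is instantiated in this JVM,
        // because the limits are read when the parser implementation loads.
        System.setProperty("entityExpansionLimit", "1000000");         // older JDKs
        System.setProperty("jdk.xml.entityExpansionLimit", "1000000"); // JDK 7u45+

        System.out.println(System.getProperty("jdk.xml.entityExpansionLimit"));
    }
}
```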

5 Comments

Depends on how you are running your program. It's a command-line parameter.
This post stackoverflow.com/questions/29360901/… contains my code for the parser; I hope you understand how I'm dealing with it.
Yes, but how are you RUNNING it? Are you typing java blah blah at the command prompt? Are you executing it via an IDE?
Under run configurations, on the arguments tab, it's called "VM arguments". That's where you want to add it.
Thank you so much, that really worked. :) I'm really, really grateful to you.

You can also think about using StAX.

SAX is event-driven and serial. It can handle large XML files, but it takes a lot of CPU resources.

DOM loads the complete document into memory.

StAX is a more recent API. It streams over the XML and can be seen as a cursor or iterator over the document. It has the advantage that you can skip elements you don't need (attributes, tags, ...). It takes far fewer CPU resources when used properly.

https://docs.oracle.com/javase/tutorial/jaxp/stax/why.html

With SAX, the XML pushes the events to you.

With StAX, you pull the events from the XML.
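To illustrate the pull model, here is a small self-contained sketch (the document and element names are made up for the example): the cursor advances only when you call next(), and you simply ignore any elements you don't care about.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book id=\"1\"><title>A</title></book>"
                   + "<book id=\"2\"><title>B</title></book></catalog>";

        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        // Pull events from the stream; skip everything except <title> elements.
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals("title")) {
                System.out.println("title: " + r.getElementText());
            }
        }
        r.close();
    }
}
```

The same skipping logic in SAX would require state flags in the handler, because SAX delivers every event whether you want it or not.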

10 Comments

Does this mean all my efforts to create a parser (using SAX) that actually worked well for smaller files are wasted?
No. You can stick to SAX if you have fixed your issue. I just wanted to point out that there is another, more modern way of parsing XML. Another advantage: with SAX you can only parse XML, while with StAX you can also write XML.
And if you have written your SAX implementation with well-chosen methods, maybe you can reuse a lot of code and try the StAX way to measure the difference in performance. You will be surprised, believe me: when used correctly and skipping unnecessary elements, your parse time will decrease drastically!
In the comment to an answer below, I have added a link to my code. You can see that.
It's just a proposal! I can provide you a StAX snippet if you want. It is typically used in a certain pattern. I'll look it up and edit my post with a small example.
