
I've read some articles on the advantages of using a SAX parser over DOM for parsing XML files in Java. The point that appeals to me the most (as discussed here) is that

SAX is suitable for large XML files, and the SAX parser does not load the XML file as a whole into memory.

But now that I've written a SAX parser to extract the entities from a large XML file of almost 1.4 GB, it throws the following exception.

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.

What is the memory problem here if the file is not loaded into memory as a whole?

How can I resolve this issue?
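For context, a minimal version of the kind of SAX setup I'm using looks roughly like this (the element names here are simplified placeholders, not from my actual file):

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    public static void main(String[] args) throws Exception {
        // Small inline document standing in for the real 1.4 GB file.
        String xml = "<root><item>a</item><item>b</item></root>";

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};

        // SAX pushes events (startElement, characters, ...) into the handler;
        // the document is never held in memory as a whole.
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes attrs) {
                if (qName.equals("item")) {
                    count[0]++;
                }
            }
        });

        System.out.println("items: " + count[0]);
    }
}
```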

4 Comments

  • That is not necessarily an actual memory limitation, but a protective measure against DoS attacks like this one. If your input XML legitimately contains that many entities, you can increase that limit in your parser. Look at its documentation. Commented Apr 2, 2015 at 19:25
  • What do you suggest I do about this protective measure? Commented Apr 2, 2015 at 19:31
  • I thought I said that. Commented Apr 2, 2015 at 19:38
  • Should I look at the documentation of the JVM? Commented Apr 2, 2015 at 19:40

2 Answers


Change the entity expansion limit with a JVM parameter:

-DentityExpansionLimit=1000000
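If you'd rather not touch the launch configuration, the same limit can, as far as I know, also be raised programmatically, as long as you set the property before the first parser is created in the JVM (`entityExpansionLimit` is the legacy property name; `jdk.xml.entityExpansionLimit` is the newer spelling introduced with the JAXP processing limits):

```java
public class RaiseLimit {
    public static void main(String[] args) {
        // Must run before any SAX/DOM parser is instantiated in this JVM,
        // because the limits are read when the parser implementation loads.
        System.setProperty("entityExpansionLimit", "1000000");         // older JDKs
        System.setProperty("jdk.xml.entityExpansionLimit", "1000000"); // JDK 7u45+

        System.out.println(System.getProperty("jdk.xml.entityExpansionLimit"));
    }
}
```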

5 Comments

Depends on how you are running your program. It's a command-line parameter.
This post stackoverflow.com/questions/29360901/… contains my code for the parser; I hope you understand how I'm dealing with it.
Yes, but how are you RUNNING it? Are you typing java blah blah at the command prompt? Are you executing it via an IDE?
Under run configurations, on the arguments tab, it's called "VM arguments". That's where you want to add it.
Thank you so much, that really worked. :) I'm really, really grateful to you.

You can also think about using StAX.

SAX is event-driven and serial. It can handle large XML files, but it takes a lot of CPU resources.

DOM loads the complete document into memory.

StAX is a more recent API. It streams over the XML and can be seen as a cursor or iterator over the document. It has the advantage that you can skip elements you don't need (attributes, tags, ...). It takes far fewer CPU resources when used properly.

https://docs.oracle.com/javase/tutorial/jaxp/stax/why.html

With SAX, the XML pushes the events to you.

With StAX, you pull the events from the XML.
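To illustrate the pull model, here is a small self-contained sketch (the document and element names are made up for the example): the cursor advances only when you call next(), and you simply ignore any elements you don't care about.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book id=\"1\"><title>A</title></book>"
                   + "<book id=\"2\"><title>B</title></book></catalog>";

        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));

        // Pull events from the stream; skip everything except <title> elements.
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals("title")) {
                System.out.println("title: " + r.getElementText());
            }
        }
        r.close();
    }
}
```

The same skipping logic in SAX would require state flags in the handler, because SAX delivers every event whether you want it or not.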

10 Comments

Does this mean all my efforts to create a parser (using SAX) that actually worked well for smaller files are wasted?
No. You can stick to SAX if you have fixed your issue. I just wanted to point out that there is another, more modern way of parsing XML. Another advantage: with SAX you can only parse XML, while with StAX you can also write XML.
And if you have written your SAX implementation with well-chosen methods, maybe you can reuse a lot of code and try the StAX way to measure the difference in performance. You will be surprised, believe me: when used correctly and skipping unnecessary elements, your parse time will decrease drastically!
In the comment to an answer below, I have added a link to my code. You can see that.
It's just a proposal! I can provide you a StAX snippet if you want. It is typically used in a certain pattern. I'll look it up and edit my post with a small example.
