
I need SAX parsing because I want to check for maliciously malformed XML. It's the first time I'm using this library.

I created an XML file (18MB) which contains an attribute with a very, very long name.

    <?xml version="1.0"?>
    <company>
        <staff>
            <firstname VERYLONGATTRIBUTENAME...VERYLONGATTRIBUTENAME="some value">yong</firstname>
            <lastname>mook kim</lastname>
            <nickname>mkyong</nickname>
            <salary>100000</salary>
        </staff>
        <staff>
            <firstname>low</firstname>
            <lastname>yin fong</lastname>
            <nickname>fong fong</nickname>
            <salary>200000</salary>
        </staff>
    </company>

I just call the SAXParser like this:

    saxParser.parse("test.xml", handler);
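For reference, a self-contained version of that call could look like the sketch below. The class name EmptyHandlerParse is hypothetical, and a plain org.xml.sax.helpers.DefaultHandler (whose callbacks all do nothing) stands in for the empty handler.

```java
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyHandlerParse {
    // Parses the file with a handler whose callbacks all do nothing,
    // mirroring the setup described in the question.
    public static void parse(File xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // every callback is a no-op
        saxParser.parse(xml, handler);
    }

    public static void main(String[] args) throws Exception {
        parse(new File(args.length > 0 ? args[0] : "test.xml"));
        System.out.println("parsed without error");
    }
}
```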

All of the event handlers are completely empty, but an OutOfMemoryError: Java heap space still occurs. Why does this happen? I chose SAX because it is stream/event-based, so I expected it to handle this kind of input without problems (compared to DOM).

EDIT: I increased the length of the attribute name by doubling it each time. It worked until I reached this 18MB file.
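The generator used for these files isn't shown. The sketch below is one hypothetical way to produce such inputs: the class name LongNameGenerator, the reduced document shape and a name made of repeated A characters are all assumptions. It streams the name to disk, so the generator itself stays small in memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LongNameGenerator {
    // Writes a document whose <firstname> element carries one attribute
    // whose name is nameLength characters long.
    public static void write(Path out, long nameLength) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\"?>\n<company>\n  <staff>\n    <firstname ");
            for (long i = 0; i < nameLength; i++) {
                w.write('A'); // 'A' is a legal XML name character
            }
            w.write("=\"some value\">yong</firstname>\n  </staff>\n</company>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Double the name length each round, from 1 KiB up to 16 MiB of characters.
        for (long len = 1024; len <= 16L * 1024 * 1024; len *= 2) {
            write(Paths.get("test-" + len + ".xml"), len);
        }
    }
}
```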

EDIT 2: Stack trace

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.StringValue.from(StringValue.java:24)
        at java.lang.String.<init>(String.java:178)
        at com.sun.org.apache.xerces.internal.util.SymbolTable$Entry.<init>(SymbolTable.java:338)
        at com.sun.org.apache.xerces.internal.util.SymbolTable.addSymbol(SymbolTable.java:178)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanName(XMLEntityScanner.java:726)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1523)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1320)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2756)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:277)
        at com.thundercloud.httpfilter.XMLParser.test(XMLParser.java:150)
        at com.thundercloud.httpfilter.HTTPInterceptor.main(HTTPInterceptor.java:34)
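The trace shows where the memory goes: XMLEntityScanner.scanName hands the attribute name to SymbolTable.addSymbol, which builds a String for it. SAX streams events, but each individual name still has to be buffered in full before startElement can be reported, so a multi-megabyte name costs at least that much heap plus intermediate copies such as the Arrays.copyOf in the trace. One possible mitigation, sketched here as an assumption and not taken from this thread, is to reject over-long names before the parser sees them by wrapping the input in a FilterReader that counts consecutive name characters inside markup. NameLengthGuard and its limit are hypothetical. The state machine is deliberately simple: it treats everything between < and > outside quotes as markup, so comments, CDATA sections or processing instructions containing very long unbroken words would also be rejected.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

public class NameLengthGuard extends FilterReader {
    private final int maxNameLength;
    private boolean inMarkup = false; // between '<' and '>'
    private char quote = 0;           // active quote char inside markup, or 0
    private int run = 0;              // consecutive non-delimiter chars in markup

    public NameLengthGuard(Reader in, int maxNameLength) {
        super(in);
        this.maxNameLength = maxNameLength;
    }

    // Updates the scanner state for one character; throws once a run of
    // non-delimiter characters inside markup exceeds the limit.
    private void inspect(char c) throws IOException {
        if (!inMarkup) {
            if (c == '<') { inMarkup = true; run = 0; }
            return;
        }
        if (quote != 0) {             // inside an attribute value: not a name
            if (c == quote) quote = 0;
            return;
        }
        if (c == '>') { inMarkup = false; run = 0; }
        else if (c == '"' || c == '\'') { quote = c; run = 0; }
        else if (Character.isWhitespace(c) || c == '=' || c == '/') { run = 0; }
        else if (++run > maxNameLength) {
            throw new IOException("XML name longer than " + maxNameLength + " characters");
        }
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) inspect((char) c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        for (int i = 0; i < n; i++) inspect(buf[off + i]);
        return n;
    }
}
```

It could then be passed to the parser as saxParser.parse(new InputSource(new NameLengthGuard(reader, 1024)), handler), where reader is, for example, an InputStreamReader over test.xml. Wrapping a Reader means the caller picks the character encoding, which bypasses the parser's own detection from the XML declaration; the 1024-character limit is an arbitrary example value.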

Thanks in advance

  • What is the heap size? Is it possible to increase it? Commented Feb 26, 2013 at 6:57
  • I don't know. I'm using Eclipse Juno, so I assume it is the default value? Also, wouldn't this be a bogus solution? I might later parse the same kind of file at 180 or 1800 MB, which could trigger the same error again. Commented Feb 26, 2013 at 7:12
  • @Thomas it depends. It would be hard to parse a 2GiB file in memory with 20MiB of heap space. At the very least, you should check your heap size to know whether the memory usage is abnormal. Commented Feb 26, 2013 at 7:18
  • @defaultlocale I just checked "View Heap Status" in Preferences > General. The status bar on the bottom reads "Heap Size: 432M of 506M" Commented Feb 26, 2013 at 7:25
  • I changed the heap size with the argument "-Xmx1024m", and the memory error no longer occurs. Do you think it would reappear if I keep increasing the attribute length by several MB? By the way, the heap status bar kept saying 506M, so it must have been the wrong indicator. Commented Feb 26, 2013 at 7:38
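Eclipse's "View Heap Status" indicator reports the heap of the Eclipse IDE process itself, not of the program launched from a Run Configuration, which would explain why it stayed at 506M after -Xmx1024m was added. A sketch for checking the limit the launched program actually received (the class name HeapCheck is hypothetical; the reported value can be slightly below the -Xmx setting depending on the collector):

```java
public class HeapCheck {
    // Returns the maximum heap this JVM will attempt to use, in MiB.
    // Runtime.maxMemory() reflects -Xmx, or the JVM's default when -Xmx is absent.
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it from the same Run Configuration as the parser shows whether the VM arguments took effect.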

3 Answers


You can find your memory settings in Eclipse under Run > Run Configurations. Under Java Application, select the class you are trying to run and open the Arguments tab. What is the setting in the VM arguments section? If it is empty, add the following value to it:

    -Xms512M -Xmx1024M

Also, there is a known JDK 6 bug in which the SAX parser throws OutOfMemoryError; it affects JDK 6 versions before update 14. Please check your Java version to make sure it does not apply to you.
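To check whether a runtime falls in that range, the java.version system property can be inspected. The helper below is a hypothetical sketch that assumes the pre-Java 9 version format 1.6.0_NN (for example, 1.6.0_13 means JDK 6 update 13); it only encodes the "JDK 6 before update 14" range stated above and does not look up the bug itself.

```java
public class JdkVersionCheck {
    // True for JDK 6 builds older than update 14, assuming the "1.6.0_NN" format.
    public static boolean isJdk6BeforeUpdate14(String version) {
        if (!version.startsWith("1.6.0")) {
            return false;                       // not JDK 6 at all
        }
        if (version.length() == "1.6.0".length()) {
            return true;                        // GA release, no update number
        }
        if (version.charAt(5) != '_') {
            return false;                       // unexpected format: don't guess
        }
        // Keep only the leading digits of the update part, e.g. "14-ea" -> "14".
        String digits = version.substring(6).replaceAll("[^0-9].*$", "");
        if (digits.isEmpty()) {
            return false;
        }
        return Integer.parseInt(digits) < 14;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + " in affected range: " + isJdk6BeforeUpdate14(v));
    }
}
```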

Edit: based on your comment, I suggest adding the following settings to the VM arguments section instead:

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="c:\temp\oomdump.hprof"

Then you can use a tool such as Eclipse MAT (http://www.eclipse.org/mat/) to analyze the dump file and see what is actually filling the heap.
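To confirm that these flags actually reached the launched JVM (and not only Eclipse's own), the program can query them at runtime. This sketch assumes a HotSpot-based JVM, since com.sun.management.HotSpotDiagnosticMXBean is HotSpot-specific, and Java 7 or later for ManagementFactory.getPlatformMXBean; VmFlagCheck is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmFlagCheck {
    // Returns the current value of a HotSpot VM option, e.g. "true"/"false"
    // for boolean flags or the configured path for HeapDumpPath.
    public static String option(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + option("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + option("HeapDumpPath"));
    }
}
```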


2 Comments

No, that wasn't it. Upgraded to JDK 1.7.0 just to be sure, but no difference. Thanks anyway!
I modified my answer based on your comments to suggest different VM settings and the MAT memory analyzer.

First of all, I don't think any real attribute name will be that long. Try increasing the heap size and check again:

    java -Xms<min_size> -Xmx<max_size> -jar <your_jar>

1 Comment

Well, you detected it, didn't you?

You may want to check out ScaleDOM, which allows you to parse very large XML files: https://github.com/whummer/scaleDOM

ScaleDOM has a small memory footprint due to lazy loading of XML nodes. It only keeps a portion of the XML document in memory and re-loads nodes from the source file when necessary.

