2

I'm trying to parse an XML file of up to 500 MB in Java. I tried to use SAX, but it gives me this error: java.lang.OutOfMemoryError: Java heap space at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(Unknown Source). Can you help me? Thanks a lot. P.S. Smaller XML files work just fine.

3
  • I'm also curious what you are storing in a 500 MB XML file!? Commented Feb 2, 2009 at 20:24
  • Does your XML contain very large (10s of millions of characters, say) runs of text without intervening elements? Commented Feb 2, 2009 at 20:39
  • I don't know why... it is a school project. The file is here: dblp.uni-trier.de/xml. Commented Feb 3, 2009 at 21:04

7 Answers

12

Most likely you're not using SAX correctly, or your application isn't suited for stream processing.

The whole point of SAX is to avoid keeping the entire XML structure in memory, but that's only possible if you can process the XML in small chunks without keeping much context, and if the result of the processing either is much smaller than the processed XML (so that it does not use too much memory either) or can itself be passed on to a recipient or written to disk continuously.

Edit: It's also possible that you simply have a memory leak, i.e. you're holding on to data that you no longer need, which prevents it from being garbage collected. If you use any Lists, Maps or Sets while processing the XML, make sure that anything you add to them while processing one chunk of XML is removed before you start the next chunk.
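One way to follow that advice is to route records through a small buffer that hands them off and empties itself at a fixed size. The sketch below is only an illustration: the `BatchBuffer` class, its `batchSize` parameter and the `sink` callback are hypothetical names, not part of SAX or of the asker's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: holds at most batchSize records at any time.
class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Call this once per parsed record, e.g. from endElement().
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Hands a copy of the buffered records to the sink, then releases them.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    // Number of records currently held in memory.
    int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BatchBuffer<String> records =
                new BatchBuffer<>(2, batch -> System.out.println("flushing " + batch));
        records.add("a");
        records.add("b"); // reaches batchSize, triggers a flush
        records.add("c");
        records.flush();  // flush the remainder at end of document
    }
}
```

In a SAX handler, `add` would be called when a record's end tag is seen and `flush` once more in `endDocument()`, so memory use stays bounded by the batch size rather than by the file size.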


1 Comment

Absolutely correct. I could parse an XML file of more than 2 GB with SAX.
5

Try using the Streaming API for XML (StAX), new in Java 6. It's made for doing this.

http://www.javabeat.net/articles/14-java-60-features-part-2-pluggable-annotation-proce-2.html
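As a rough illustration of the pull style StAX offers, the sketch below counts `<item>` elements one event at a time. The class name `StaxItemCounter` and the element name `item` are assumptions made for the example; the `javax.xml.stream` calls are the standard Java 6+ API.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; "item" is an assumed element name.
class StaxItemCounter {
    // Pulls one event at a time and keeps nothing but a counter.
    static int countItems(Reader source) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(source);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item><name>Alpha</name></item>"
                + "<item><name>Beta</name></item></list>";
        System.out.println(countItems(new StringReader(xml)));
    }
}
```

For a real file you would pass a file-backed `Reader` or `InputStream` instead of the in-memory string used here.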


3

You can try to increase the Java heap size by specifying e.g.

java -Xmx1024M MyClass

on the command line (or whatever value suits your document size).
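To confirm that the setting took effect, you can print the maximum heap the JVM reports. This is a minimal sketch; the class name `HeapCheck` is made up for the example, and the reported value is usually somewhat below the `-Xmx` figure.

```java
// Hypothetical helper class for checking the effective heap limit.
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it with the same flag, e.g. `java -Xmx1024M HeapCheck`, and compare the printed value with what you requested.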


2

StAX for Java versions pre-6: http://stax.codehaus.org/


1

You may want to check out ScaleDOM, which allows parsing very large XML files: https://github.com/whummer/scaleDOM

ScaleDOM has a small memory footprint due to lazy loading of XML nodes. It only keeps a portion of the XML document in memory and re-loads nodes from the source file when necessary.


1

Say you have the following XML structure:

<?xml version="1.0"?>
<list>
    <item>
        <name>Alpha</name>
        <age>10</age>
    </item>
    <item>
        <name>Beta</name>
        <age>20</age>
    </item>
    <!-- many many items -->
</list>

And you want to get all the <item>s

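public class Item
{
    String name;
    String age;

    // Readable output when an item is printed
    @Override
    public String toString() { return name + " (" + age + ")"; }
}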

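Your SAX handler will look like this:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler
{
    // The <item> currently being built; null outside of an <item> element
    Item current = null;
    // Text buffer; non-null only while inside <name> or <age>
    StringBuilder content = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
    {
        if (qName.equals("item"))
        {
            current = new Item();
        }
        else if (qName.equals("name") || qName.equals("age"))
        {
            content = new StringBuilder();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException
    {
        if (qName.equals("item"))
        {
            // DO SOMETHING WITH current, then drop the reference so it can be garbage collected
            System.out.println(current);
            current = null;
        }
        else if (qName.equals("name"))
        {
            current.name = content.toString();
        }
        else if (qName.equals("age"))
        {
            current.age = content.toString();
        }
        // Stop buffering text until the next <name> or <age> starts
        content = null;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException
    {
        // May be called several times for one text node, so append rather than assign
        if (content != null)
        {
            content.append(ch, start, length);
        }
    }
}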

As you can see, text content is only buffered while the parser is inside a <name> or <age> element, and each item is released as soon as its closing tag has been handled.
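To run a handler like the one above, you hand it to a `SAXParser`. The sketch below is self-contained, so it uses a compact stand-in handler that only extracts `<name>` text; the class `SaxDriver`, the nested `NameHandler` and the `sink` callback are names invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical driver class for illustration.
class SaxDriver {
    // Compact stand-in for the handler above: buffers text only inside <name>.
    static class NameHandler extends DefaultHandler {
        private final Consumer<String> sink;
        private StringBuilder content;

        NameHandler(Consumer<String> sink) {
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if (qName.equals("name")) {
                content = new StringBuilder();
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("name")) {
                sink.accept(content.toString()); // hand off immediately, keep nothing
            }
            content = null;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (content != null) {
                content.append(ch, start, length);
            }
        }
    }

    // Streams the document through the handler; each name goes straight to the sink.
    static void parseNames(InputStream in, Consumer<String> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new NameHandler(sink));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item><name>Alpha</name><age>10</age></item>"
                + "<item><name>Beta</name><age>20</age></item></list>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        parseNames(in, System.out::println);
    }
}
```

For a large file you would replace the in-memory stream with a `FileInputStream` (ideally buffered) and keep the sink cheap, for example writing each record to disk or a database rather than collecting it.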


0

Take a look at Apache Digester.

Here is a small tutorial

