
I have a fairly large XML file (~280 MB), and each row in it has many attributes. I want to extract 3 of those attributes and store them somewhere, but I run out of memory when I try. My code looks like this:

File xmlFile = new File(xml);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = null;
try {
    doc = dBuilder.parse(xmlFile);
} catch (SAXException | IOException e) {
    e.printStackTrace();
    return; // doc is still null here; bail out instead of hitting an NPE below
}

NodeList nList = doc.getElementsByTagName("row");
for (int index = 0; index < nList.getLength(); index++) {
    Node nNode = nList.item(index);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        System.out.print("F1 : " + 
            nNode.getAttributes().getNamedItem("F1").getTextContent());
        System.out.print(" F2: " + 
            nNode.getAttributes().getNamedItem("F2").getTextContent());
        System.out.println(" F3: " + 
            nNode.getAttributes().getNamedItem("F3").getTextContent());
    }
}

This is the error I get:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeObject(DeferredDocumentImpl.java:974)
    at com.sun.org.apache.xerces.internal.dom.DeferredElementImpl.synchronizeData(DeferredElementImpl.java:121)
    at com.sun.org.apache.xerces.internal.dom.ElementImpl.getTagName(ElementImpl.java:314)
    at com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl.nextMatchingElementAfter(DeepNodeListImpl.java:199)
    at com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl.item(DeepNodeListImpl.java:146)
    at com.sun.org.apache.xerces.internal.dom.DeepNodeListImpl.getLength(DeepNodeListImpl.java:117)
    at Parser.parsePosts(Parser.java:55)
    at Parser.main(Parser.java:72)

How can I change it to avoid running out of memory?

EDIT: I wrote a new parser using SAX, and it gets the job done. The code is:

try {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();

    DefaultHandler handler = new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            if ("row".equals(qName)) { // only row elements carry F1/F2/F3
                System.out.print(attributes.getValue("F1") + " ");
                System.out.print(attributes.getValue("F2") + " ");
                System.out.println(attributes.getValue("F3"));
            }
        }
    };

    saxParser.parse("file.xml", handler);

} catch (Exception e) {
    e.printStackTrace();
}
  • Do you have to use the DOM API? This will load the entire file into memory at once, there's no way around it. But you could use something like SAX instead, which won't. Commented Mar 16, 2015 at 2:35
  • I don't have to use it. I just Googled around and found DOM. I'm willing to use anything that is the most efficient. Commented Mar 16, 2015 at 2:36
  • Look up SAX, it might fit your needs. Commented Mar 16, 2015 at 2:37

3 Answers


There are two ways to solve your problem: either increase the maximum heap size for your application, or use SAX to parse your XML file.
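A minimal, self-contained sketch of the SAX approach (assuming, as in the question, that the data lives in `<row>` elements with F1/F2/F3 attributes — the inline sample here stands in for the real file). SAX streams the document and never builds a full tree, so memory use stays flat regardless of file size:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RowSaxDemo {
    public static void main(String[] args) throws Exception {
        // Inline sample standing in for the ~280 MB file; for the real file,
        // pass a File or InputStream to parse() instead.
        String xml = "<rows>"
                + "<row F1=\"a\" F2=\"b\" F3=\"c\"/>"
                + "<row F1=\"d\" F2=\"e\" F3=\"f\"/>"
                + "</rows>";

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                            String qName, Attributes attributes) {
                        if ("row".equals(qName)) { // skip the wrapper element
                            System.out.println(attributes.getValue("F1") + " "
                                    + attributes.getValue("F2") + " "
                                    + attributes.getValue("F3"));
                        }
                    }
                });
    }
}
```

Instead of printing, the handler can append each triple to a list or write it to a file as it goes — the point is that only one element is in memory at a time.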


1 Comment

Worked perfectly. I guess if your input file is, e.g., 1.8 GB, then using a DOM parser would require 5 GB+ of memory. Whoa! Worked swell with a SAX one, though.

Pass the -Xmx<size> flag when you run the program to increase the maximum heap size.

E.g., java -Xmx500m <classname>



You will have to increase the memory limit of your Java VM: set -Xmx2048m or some other large enough value (note there is no equals sign, and the size needs a unit suffix such as m or g).

