
I have an XML file and I want to manipulate its tags using the Java DOM, but the file is 25 GB, so it fails with this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

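    import java.io.File;
    import java.util.Set;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class Frwiki {
        // Element names to keep; every other element is removed together with its subtree
        private static final Set<String> KEEP = Set.of("id", "title", "text", "mediawiki", "revision", "page");
        private final String filePath;

        public Frwiki() {
            filePath = "D:\\compressed\\frwiki-latest-pages-articles.xml";
        }

        public void deletingTag() throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // parse() builds the whole document tree in memory, which is what exhausts the heap on a 25 GB file
            Document doc = factory.newDocumentBuilder().parse(new File(filePath));
            NodeList nodes = doc.getElementsByTagName("*"); // live list: it updates as nodes are removed

            // Walk backwards so that removing a node does not shift the indices still to be visited
            for (int i = nodes.getLength() - 1; i >= 0; i--) {
                Node node = nodes.item(i);
                if (!KEEP.contains(node.getNodeName()))
                    node.getParentNode().removeChild(node);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(new File(filePath)));
        }
    }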
  • When are you getting the java.lang.OutOfMemoryError: when reading the file or in the for loop? Which line throws that error? Commented Jan 19, 2023 at 14:14
  • Unless you have a huge machine, you won't be able to create a DOM tree of a 25 GB XML file. My best guess is that it would require something close to 250 GB of RAM. See if you can use one of the streaming XML APIs instead, such as SAX or StAX. Commented Jan 19, 2023 at 14:31
  • I had no error number when the exception occurred. I cannot read a 25 GB file, so I am looking for a way to read it line by line. Commented Jan 22, 2023 at 8:19
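Following the StAX suggestion in the comments, here is a minimal sketch (not the asker's code and not an official recipe) that streams the file with javax.xml.stream instead of building a DOM tree. It keeps the same element names as the question's code, and events are handled one at a time, so memory use does not grow with the size of the file. The class name StaxTagFilter and the method filter are hypothetical. The sketch assumes UTF-8 input and, like the DOM version, drops the whole subtree of any element that is not kept.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Set;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxTagFilter {
    // Same element names the question's DOM code keeps (hypothetical helper class)
    static final Set<String> KEEP = Set.of("mediawiki", "page", "revision", "id", "title", "text");

    /** Copies the document from in to out, dropping every element not in KEEP along with its subtree. */
    public static void filter(InputStream in, OutputStream out) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in, "UTF-8");
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        int skipDepth = 0; // > 0 while we are inside an element being dropped
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                String name = event.asStartElement().getName().getLocalPart();
                if (skipDepth > 0 || !KEEP.contains(name)) {
                    skipDepth++; // start (or go deeper into) a dropped subtree
                    continue;
                }
            } else if (event.isEndElement()) {
                if (skipDepth > 0) {
                    skipDepth--; // leaving an element of a dropped subtree
                    continue;
                }
            } else if (skipDepth > 0) {
                continue; // text, comments, etc. inside a dropped subtree
            }
            writer.add(event); // everything else is copied through unchanged
        }
        writer.flush();
        writer.close();
        reader.close();
    }
}
```

Unlike the DOM version, which overwrote filePath, this has to write to a different file than the one it is reading, for example an InputStream opened on the dump and an OutputStream on a new, hypothetically named, filtered file. For a dump this size, the JDK's built-in parser may also stop at one of its JAXP processing limits. If that happens, the jdk.xml.totalEntitySizeLimit system property is the setting to look at.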

2 Answers


You can split a large file into smaller files using XSLT 3.0 streaming, like this:

<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
    <xsl:template name="xsl:initial-template">
      <xsl:source-document streamable="yes" href="frwiki-latest-pages-articles.xml">
        <xsl:for-each-group ....>
           <xsl:result-document href="......">
              <part><xsl:copy-of select="current-group()"/></part>
           </xsl:result-document>
        </xsl:for-each-group>
      </xsl:source-document>
    </xsl:template>
    
</xsl:transform>

The "..." parts depend on how you want to split the document and name the result files.

Although XSLT 3.0 streaming is a W3C specification, the only implementation available at the moment is my company's Saxon-EE processor.



Split the large XML file into smaller chunks and process them separately.
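Splitting can also be done in plain Java with the same streaming API, as an alternative to the XSLT 3.0 approach. Below is a minimal sketch that assumes the MediaWiki layout: one root element containing many page elements. It copies at most pagesPerChunk pages into each output file. Each file is well-formed and reuses the original root element's name and namespace declarations. The names PageSplitter, split, and the chunk-NNNNN.xml file naming are hypothetical choices. The siteinfo element and the root's attributes are not copied.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class PageSplitter {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    /** Streams the input and writes at most pagesPerChunk <page> elements per file; returns the number of files. */
    public static int split(InputStream in, Path outDir, int pagesPerChunk) throws IOException, XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in, "UTF-8");
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        QName rootName = null;                      // name of the original root element (e.g. mediawiki)
        List<Namespace> rootNs = new ArrayList<>(); // its namespace declarations, repeated in every chunk
        XMLEventWriter writer = null;
        OutputStream chunkOut = null;
        int chunks = 0, pagesInChunk = 0, depth = 0; // depth > 0 while inside a <page>
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (rootName == null && event.isStartElement()) {
                StartElement root = event.asStartElement();
                rootName = root.getName();
                root.getNamespaces().forEachRemaining(ns -> rootNs.add((Namespace) ns));
            } else if (depth == 0 && event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("page")) {
                if (writer == null) { // first page of a new chunk: open the file and write the root
                    chunkOut = Files.newOutputStream(outDir.resolve(String.format("chunk-%05d.xml", chunks++)));
                    writer = outFactory.createXMLEventWriter(chunkOut, "UTF-8");
                    writer.add(EVENTS.createStartDocument("UTF-8", "1.0"));
                    writer.add(EVENTS.createStartElement(rootName, Collections.emptyIterator(), rootNs.iterator()));
                }
                writer.add(event);
                depth = 1;
            } else if (depth > 0) { // copy everything inside the current <page>
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0 && ++pagesInChunk == pagesPerChunk) {
                    closeChunk(writer, chunkOut, rootName); // chunk is full
                    writer = null;
                    pagesInChunk = 0;
                }
            } // anything else outside <page> (e.g. <siteinfo>) is skipped
        }
        if (writer != null) closeChunk(writer, chunkOut, rootName); // last, partially filled chunk
        reader.close();
        return chunks;
    }

    private static void closeChunk(XMLEventWriter writer, OutputStream out, QName rootName)
            throws IOException, XMLStreamException {
        writer.add(EVENTS.createEndElement(rootName, Collections.emptyIterator()));
        writer.add(EVENTS.createEndDocument());
        writer.close(); // closes the XML writer but not the underlying stream
        out.close();
    }
}
```

The output directory must already exist. With a few thousand pages per chunk, each file should be small enough to load with the DOM code from the question, one file at a time.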
