I want to transform an XML file using XSLT. This is my code:

TransformerFactory factory = TransformerFactory.newInstance();
InputStream is = this.getClass().getResourceAsStream(getPathToXSLTFile());
Source xslt = new StreamSource(is);
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File(getInputFileName()));
transformer.transform(text, new StreamResult(new File(getOutputFileName())));

When the input file has about 10,000,000 lines, I get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xml.internal.utils.FastStringBuffer.append(FastStringBuffer.java:682)
at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM.characters(SAX2DTM.java:2111)
at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.characters(SAXImpl.java:863)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(AbstractSAXParser.java:546)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:455)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:421)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:556)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:739)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:351)
at ru.magnit.task.utils.AbstractXmlUtil.transformXML(AbstractXmlUtil.java:66)
at ru.magnit.task.EntryPoint.main(EntryPoint.java:72)

It is thrown on this line:

 transformer.transform(text, new StreamResult(new File(getOutputFileName())));

What is the reason for this, and can it be optimized somehow without increasing the heap size?

UPDATE: My XSLT file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="entries">
    <entries>
        <xsl:apply-templates/>
    </entries>
</xsl:template>

<xsl:template match="entry">
    <entry>
        <xsl:attribute name="field">
            <xsl:apply-templates select="*"/>
        </xsl:attribute>
    </entry>
</xsl:template>

</xsl:stylesheet>

1 Answer

In general, XSLT 1.0 and 2.0 processors work with a data model that loads the complete XML input into a tree to allow full XPath navigation, so memory usage grows with the size of the input document.

So unless you increase the heap space, there is not much you can do in general once the document size exhausts the available memory. There may be processor-specific or stylesheet-specific optimizations, depending on your concrete XSLT code, but you cannot avoid the processor first pulling in the complete document. We would need to see your XSLT to tell whether it can be optimized. Profiling a stylesheet can help identify areas to optimize, though I am not sure whether Xalan supports that. I also suspect the stack trace simply means that Xalan already runs out of memory while building the DTM (its tree model) for your large input. In that case optimizing the XSLT code obviously does not help, because it is not even executed.
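Before changing anything, it can help to check the two things this paragraph turns on: how much heap the JVM is allowed to use, and which XSLT processor `TransformerFactory` actually picks up. The following is only a minimal diagnostic sketch; the class name `HeapAndProcessorInfo` and its helper methods are made up for illustration.

```java
import javax.xml.transform.TransformerFactory;

class HeapAndProcessorInfo {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Class name of the TransformerFactory implementation in use,
    // e.g. the JDK's built-in XSLTC or a processor found on the classpath.
    static String processorName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("XSLT processor: " + processorName());
    }
}
```

If the reported heap is small compared to the input file, raising it with the standard JVM option `-Xmx` (for example `java -Xmx4g ...`) is the direct fix; how much is needed depends on the processor's tree model and cannot be derived from the file size alone.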

A Java-specific approach you could try is to use SAXTransformerFactory (https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/sax/SAXTransformerFactory.html) to create a SAX filter from your stylesheet and chain it with a default Transformer that serializes the filter's result. I think I once tried that and found it can consume less memory than the traditional approach with a Transformer.
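The filter chain described above could look roughly like the sketch below. It uses only standard JAXP and SAX calls, but the class name `SaxFilterSketch`, the inlined copy of the stylesheet, and the sample input are assumptions made to keep the example self-contained; the real input structure is not shown in the question. Whether this actually lowers memory use depends on the processor, so treat it as something to measure rather than a guaranteed fix.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

class SaxFilterSketch {

    // The stylesheet from the question, inlined for the sketch.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='entries'><entries><xsl:apply-templates/></entries></xsl:template>"
        + "<xsl:template match='entry'><entry><xsl:attribute name='field'>"
        + "<xsl:apply-templates select='*'/></xsl:attribute></entry></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input (assumed structure).
    static final String XML =
        "<entries><entry><field>1</field></entry><entry><field>2</field></entry></entries>";

    static String transform(String xslt, String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Not every processor supports XMLFilter creation, so check first.
        if (!factory.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("Processor does not support XMLFilter");
        }
        SAXTransformerFactory saxFactory = (SAXTransformerFactory) factory;

        // The stylesheet becomes a SAX filter that transforms events from its parent.
        XMLFilter filter = saxFactory.newXMLFilter(new StreamSource(new StringReader(xslt)));

        // A namespace-aware SAX parser feeds the filter.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();
        filter.setParent(reader);

        // A default (identity) Transformer serializes the filter's output.
        Transformer serializer = saxFactory.newTransformer();
        StringWriter out = new StringWriter();
        serializer.transform(
            new SAXSource(filter, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XSLT, XML));
    }
}
```

For the files in the question, the `StringReader` and `StringWriter` would be replaced by an `InputSource` over the input file and a `StreamResult` over the output file. Note that serialization settings such as `indent` then have to be set on the identity `Transformer` with `setOutputProperty`, since the `xsl:output` in the stylesheet does not control it.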

XSLT 3.0 tries to address the memory problem with a new approach, streaming (https://www.w3.org/TR/xslt-30/#streaming-concepts); however, so far the only implementation is Saxon 9 EE, a commercial product. Also, a stylesheet is not necessarily streamable as written; you may have to rewrite it to make it streamable, if that is possible at all (sorting input nodes, for instance, is not possible with streaming).

For instance, your posted stylesheet converted to XSLT 3.0 with streaming (no rewrite was necessary; I only had to declare the default mode as streamable) is:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="entries">
        <entries>
            <xsl:apply-templates/>
        </entries>
    </xsl:template>

    <xsl:template match="entry">
        <entry>
            <xsl:attribute name="field">
                <xsl:apply-templates select="*"/>
            </xsl:attribute>
        </entry>
    </xsl:template>

</xsl:stylesheet>

Both Saxon 9.8 EE and the beta of Exselt assess this stylesheet as streamable.

2 Comments

I have added my XSLT file to the question, please take a look.
Note also that to make this work with Saxon-EE, you would need to make a slight change to your Java code so that the TransformerFactory you use is an instance of Saxon's StreamingTransformerFactory class.
