I have several XML files that have been merged into a single file. As a result, the merged file contains multiple root elements and multiple XML declarations.

I want to run a transform over the merged file that removes the duplicate declarations and root elements and wraps the content in a new root element.

Is this possible using XSLT?

  • However you merged the files, it was not with a conformant XML DOM library, as duplicate XML declarations should not have been rendered. Likely the invalid XML was built with text file manipulations. Please back up and describe the earlier process. Do note: XSLT can merge files with the document() function. Commented Feb 21, 2018 at 21:33
  • If it has multiple XML declarations then it isn't an XML file, so there's a contradiction inherent in your question. You don't want to be sorting out this mess after the event, you want to avoid creating the mess in the first place. There are ways to merge XML files to create well-formed XML output. Commented Feb 22, 2018 at 10:47

1 Answer

It depends on how you run XSLT and how you provide the input. The format you have is not an XML document, and because it contains multiple XML declarations it is not even a well-formed external entity or fragment. So even with XPath 3 and parse-xml-fragment you would first need to remove the XML declarations.

You could, however, load the file as plain text with unparsed-text, use replace with a regular expression to remove the XML declarations, and then call parse-xml-fragment to parse the remaining text into nodes. Those nodes can then be transformed further, e.g. by dropping the various root elements and wrapping their child nodes in a common one:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:param name="fragment-uri" as="xs:string" select="'fragment-input1.txt'"/>

    <xsl:param name="fragments" as="xs:string" select="unparsed-text($fragment-uri)"/>

    <xsl:param name="declaration-regex" as="xs:string"><![CDATA[<\?xml\s+[^>]*?\?>]]></xsl:param>

    <xsl:variable name="fragments-with-declarations-stripped" as="xs:string"
        select="replace($fragments, $declaration-regex, '')"/>

    <xsl:template match="/" name="xsl:initial-template">
        <root>
            <xsl:copy-of select="parse-xml-fragment($fragments-with-declarations-stripped)/*/node()"/>
        </root> 
    </xsl:template>

</xsl:stylesheet>

An input "fragment-input1.txt" of the form

    <?xml version='1.0'?>
    <root1>
      <foo1>...</foo1>
    </root1>
    <?xml version="1.0"?><root2><foo2>...</foo2></root2>
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="foo.xsl"?>
    <root3>
      <foo3>...</foo3>
    </root3>
    <?xml version="1.0" encoding='ISO-8859-1' standalone="yes"?>
    <root4>
      <foo4>...</foo4>
    </root4>

is transformed into this result:

<?xml version="1.0" encoding="UTF-8"?><root>
                  <foo1>...</foo1>
                <foo2>...</foo2>
                  <foo3>...</foo3>

                  <foo4>...</foo4>
                </root>

Note: I am not sure whether the regular expression used here is sufficient to strip every allowed form of an XML declaration.
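
One way to reduce that uncertainty is to derive the pattern from the XMLDecl production of the XML 1.0 specification (a mandatory version, then an optional encoding, then an optional standalone pseudo-attribute). The following is only a sketch of that idea: the parameter name strict-declaration-regex is made up for illustration, and the input file name is the same placeholder as above. It deliberately does not match text declarations that omit the version, and like any regular expression it cannot tell a real declaration from one quoted inside a comment or CDATA section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <!-- Same placeholder input file as in the stylesheet above. -->
    <xsl:param name="fragment-uri" as="xs:string" select="'fragment-input1.txt'"/>

    <!-- Hypothetical parameter: mirrors XMLDecl, i.e. version, then optional
         encoding, then optional standalone, each with single or double quotes. -->
    <xsl:param name="strict-declaration-regex" as="xs:string"><![CDATA[<\?xml\s+version\s*=\s*("1\.[0-9]+"|'1\.[0-9]+')(\s+encoding\s*=\s*("[A-Za-z][A-Za-z0-9._\-]*"|'[A-Za-z][A-Za-z0-9._\-]*'))?(\s+standalone\s*=\s*("(yes|no)"|'(yes|no)'))?\s*\?>]]></xsl:param>

    <xsl:template name="xsl:initial-template">
        <root>
            <!-- Read as text, strip the declarations, parse the rest as a fragment,
                 then copy the children of each former root element. -->
            <xsl:copy-of select="parse-xml-fragment(
                replace(unparsed-text($fragment-uri), $strict-declaration-regex, '')
                )/*/node()"/>
        </root>
    </xsl:template>

</xsl:stylesheet>
```

Because the pattern is anchored on the literal name xml followed by whitespace and version, other processing instructions such as xml-stylesheet are left untouched.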

The whole error-prone process can be avoided by using XSLT with document, doc, collection and/or xsl:merge to merge the separate files properly in the first place, instead of using XSLT to repair the result of a broken merge.
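
As a rough sketch of that alternative, assuming the original, unmerged files are still available, the stylesheet below reads each file with doc and copies the children of its root element into a new common root. The input-uris parameter and the file names file1.xml, file2.xml and file3.xml are hypothetical; relative URIs are resolved against the stylesheet's base URI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <!-- Hypothetical list of the original, still separate, well-formed files. -->
    <xsl:param name="input-uris" as="xs:string*"
        select="'file1.xml', 'file2.xml', 'file3.xml'"/>

    <xsl:template name="xsl:initial-template">
        <root>
            <!-- Each file is parsed on its own, so its XML declaration is
                 consumed by the parser and never reaches the output. -->
            <xsl:for-each select="$input-uris">
                <!-- Drop the file's root element, keep its child nodes. -->
                <xsl:copy-of select="doc(.)/*/node()"/>
            </xsl:for-each>
        </root>
    </xsl:template>

</xsl:stylesheet>
```

Since input-uris is a stylesheet parameter, the list of files can be supplied when the transformation is invoked rather than edited in the stylesheet, and no regular expression is needed at all.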

1 Comment

Thanks for your exceptional answer! I have never used xsl v3 before but I'll give it a go and report back. One question I had was regarding the fragment-uri param - is there a way to select the input document context as opposed to hard coding it in the transform? Cheers!
