
I'm trying to process two XML files (DocBook documents). Both documents contain repeating structures that I would like to extract, parameterize, and store in a separate document.

To simplify, here is an example:

file1.xml:

<?xml version="1.0" encoding="UTF-8"?>
<input>
    <structure>foo</structure>
    <structure>bar</structure>
    <structure>baz</structure>
</input>

file2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<input>
    <structure>abc</structure>
    <structure>xyz</structure>
    <structure>123</structure>
</input>

And this is the output I would like to generate, output.xml:

<?xml version="1.0" encoding="UTF-8"?>
<output>
    <structure origin="doc1">foo</structure>
    <structure origin="doc1">bar</structure>
    <structure origin="doc1">baz</structure>
    <structure origin="doc2">abc</structure>
    <structure origin="doc2">xyz</structure>
    <structure origin="doc2">123</structure>
</output>

What I don't know is how to process two or more documents (the URIs can be hard-coded) in XSLT and attach one additional parameter to each of them (doc1, doc2; these can also be hard-coded).

I would be very grateful for any hints.

1 Answer


Whether you transform file1.xml and read only file2.xml with fn:doc(), or pass both filenames as parameters and read both with fn:doc(), is a matter of choice; the concept is the same either way. Once both documents are loaded, select /input/structure in each of them and apply templates to the result.
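For the first option, a minimal XSLT 2.0 sketch might look like this. It assumes file1.xml is the principal input (for example, with Saxon: `-s:file1.xml -xsl:merge.xsl`, where merge.xsl is just a hypothetical name), that file2.xml is read with doc(), and that the labels doc1/doc2 are hard-coded and handed to the structure template as a parameter:

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <!-- URI of the second document; file1.xml is the principal input -->
    <xsl:param name="file2" select="'file2.xml'"/>

    <xsl:template match="/">
        <output>
            <!-- structures from the principal input (file1.xml), labelled doc1 -->
            <xsl:apply-templates select="input/structure">
                <xsl:with-param name="origin" select="'doc1'"/>
            </xsl:apply-templates>
            <!-- structures from the second document, loaded with doc(), labelled doc2 -->
            <xsl:apply-templates select="doc($file2)/input/structure">
                <xsl:with-param name="origin" select="'doc2'"/>
            </xsl:apply-templates>
        </output>
    </xsl:template>

    <xsl:template match="structure">
        <xsl:param name="origin"/>
        <xsl:copy>
            <!-- keep any existing attributes, then add (or overwrite) @origin -->
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="origin" select="$origin"/>
            <!-- the built-in template copies the text content -->
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Because the label is passed in rather than computed, it can be any string you like.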

With XSLT 2.0, you can call base-uri() on each structure element and extract the number from the filename to build the @origin value:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:fn="http://www.w3.org/2005/xpath-functions" 
    exclude-result-prefixes="fn">
    <xsl:output indent="yes" />
    
    <xsl:param name="file1" select="'file1.xml'" />
    <xsl:param name="file2" select="'file2.xml'" />
    
    <xsl:template match="/">
        <output>
            <!-- The comma keeps file1's structures before file2's. A union (|) sorts
                 nodes from different documents in an implementation-dependent order. -->
            <xsl:apply-templates select="fn:doc($file1)/input/structure, fn:doc($file2)/input/structure"/>
        </output>
    </xsl:template>
    
    <xsl:template match="structure">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- take the digits just before ".xml" in the document URI: .../file1.xml -> doc1 -->
            <xsl:attribute name="origin" select="concat('doc', replace(base-uri(), '^.*?(\d+)\.xml$', '$1'))"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
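If the @origin label is an arbitrary string of your own rather than something derived from the filename, one option is to keep a small lookup table in the stylesheet that pairs each URI with its label. This is a sketch under that assumption: the `sources` variable and its `source`/`uri`/`origin` names are hypothetical, and the label travels as a tunnel parameter so it would still reach the structure template if intermediate templates were applied in between (as may happen with deeper DocBook structures):

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <!-- hypothetical lookup table: each source URI with a hard-coded label;
         add more <source> elements to merge more documents -->
    <xsl:variable name="sources" as="element(source)*">
        <source uri="file1.xml" origin="doc1"/>
        <source uri="file2.xml" origin="doc2"/>
    </xsl:variable>

    <!-- can be started with any input, or with Saxon's -it:main (no input needed) -->
    <xsl:template name="main" match="/">
        <output>
            <!-- for-each processes the table in order, so doc1 comes before doc2 -->
            <xsl:for-each select="$sources">
                <xsl:apply-templates select="doc(@uri)/input/structure">
                    <xsl:with-param name="origin" select="string(@origin)" tunnel="yes"/>
                </xsl:apply-templates>
            </xsl:for-each>
        </output>
    </xsl:template>

    <xsl:template match="structure">
        <xsl:param name="origin" tunnel="yes"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="origin" select="$origin"/>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Relative URIs in doc() are resolved against the stylesheet's location, so the two input files are expected next to the stylesheet.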

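If you need XSLT 1.0, which has neither base-uri() nor replace() and loads files with document() rather than doc(), you could pass the filename as a parameter and derive the number from it: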

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes" />
    
    <xsl:param name="file1" select="'file1.xml'" />
    <xsl:param name="file2" select="'file2.xml'" />
    
    <xsl:template match="/">
        <output>
            <xsl:call-template name="load-file">
                <xsl:with-param name="file" select="$file1"/>
            </xsl:call-template>
            <xsl:call-template name="load-file">
                <xsl:with-param name="file" select="$file2"/>
            </xsl:call-template>
        </output>
    </xsl:template>
    
    <xsl:template name="load-file">
        <xsl:param name="file"/>
        <!-- XSLT 1.0 has no doc(); document() loads the file instead -->
        <xsl:apply-templates select="document($file)/input/structure">
            <xsl:with-param name="file" select="$file"/>
        </xsl:apply-templates>
    </xsl:template>
    
    <xsl:template match="structure">
        <xsl:param name="file"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <!-- "file1.xml" -> "1" -> "doc1"; assumes filenames of the form file<N>.xml -->
            <xsl:attribute name="origin">
                <xsl:value-of select="concat('doc', substring-after(substring-before($file, '.xml'), 'file'))"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
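The same XSLT 1.0 structure also works when the label is a hard-coded string unrelated to the filename: pass it as a second parameter instead of deriving it. A minimal sketch, assuming the same file1.xml/file2.xml and the labels doc1/doc2 from the question (with xsltproc, for example, some input document is still required, so file1.xml can simply be given as the input):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <output>
            <!-- each call pairs a file with its own hard-coded label -->
            <xsl:call-template name="load-file">
                <xsl:with-param name="file" select="'file1.xml'"/>
                <xsl:with-param name="origin" select="'doc1'"/>
            </xsl:call-template>
            <xsl:call-template name="load-file">
                <xsl:with-param name="file" select="'file2.xml'"/>
                <xsl:with-param name="origin" select="'doc2'"/>
            </xsl:call-template>
        </output>
    </xsl:template>

    <xsl:template name="load-file">
        <xsl:param name="file"/>
        <xsl:param name="origin"/>
        <xsl:apply-templates select="document($file)/input/structure">
            <xsl:with-param name="origin" select="$origin"/>
        </xsl:apply-templates>
    </xsl:template>

    <xsl:template match="structure">
        <xsl:param name="origin"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:attribute name="origin">
                <xsl:value-of select="$origin"/>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```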

5 Comments

Thanks @mads for your answer, and sorry for the delay because of the holidays. As far as I understand your answer, both files are now processed by xsl:copy-of one after the other and a single output is created. But I still do not understand how I can set the origin attribute to doc1 / doc2 (this should not be the filename). Sorry if my question was not clearly expressed.
whoops, looks like I glossed over that important detail in the output! I'll update the answer.
Thanks again for your fast reply. The origin string must be a separate string. It has nothing in common with the filename and is not derived from it. This is my main problem: the extracted structures should be enriched with a string I define myself, and this string (e.g. doc1/doc2) should be hard-coded. Sorry again if I did not communicate clearly. XSLT 2.0 is completely sufficient.
The same concept applies. You could add another param for your @origin value and either hard-code it or derive it from the filename.
I have updated with examples of how to snag the numeric value from the filenames.
