Similar questions have been asked; I've read them and worked through tutorials, but I still haven't been able to solve this. I'm sure it comes down to writing the correct XPath. I'm trying to take a list of files (basically everything in a folder) and combine them into a different schema format. The trick is that part of the information from the individual files needs to be used as a lookup table in the resulting XML. My solution needs to be pure XSLT 1.0. It probably goes without saying that everything below is fictional, except maybe the structure of the "manifest" XML file, which looks like the following:

<files>
    <file>request1.xml</file>
    <file>request2.xml</file>
    <file>request3.xml</file>
</files>

Request1.xml file might look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<ProductList xmlns:pl="http://products.produsor.com/pml" xmlns:pi="http://standards.product.produsor.com/pml" createDateTime="2014-05-06T18:13:51.0Z" version="5.0">
    <pl:Request requestId="ADF87A9DF7" quantity="1">
        <pl:SystemIdentifier name="GUID">38DDF5C1-A049-44DB-9EEA-3F5CB831228D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Dinning Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Dinning Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It's made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Dinning Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>2</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
    <pl:Request requestId="DA7FDAFDA9" quantity="1">
        <pl:SystemIdentifier name="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483269</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Coffee Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Coffee Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It is made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Living Room Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>4</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
</ProductList>

And Request2.xml would be something like this:

<?xml version="1.0" encoding="UTF-8"?>
<ProductList xmlns:pl="http://products.produsor.com/pml" xmlns:pi="http://standards.product.produsor.com/pml" createDateTime="2014-05-06T18:13:51.0Z" version="5.0">
    <pl:Request requestId="DFADF08D0A" quantity="10">
        <pl:SystemIdentifier name="GUID">38DDF5C1-A049-44DB-9EEA-3F5CB831228D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Dinning Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Dinning Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It's made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Dinning Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>2</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
    <pl:Request requestId="RER7689EQ9" quantity="10">
        <pl:SystemIdentifier name="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
        <pl:SystemIdentifier name="UPC">4236483269</pl:SystemIdentifier>
        <pl:Product>
            <pl:Names>
                <pi.ProductNameLongDescription>Classic Design Round Coffee Table</pi.ProductNameLongDescription>
                <pi.ProductNameShort>Coffee Table</pi.ProductNameShort>
            </pl:Names>
            <pl:Description>
                <pi.ProductLongDescription>This is a really awesome table.</pi.ProductLongDescription>
                <pi.ProductShortDescription>It is made of wood</pi.ProductShortDescription>
            </pl:Description>
            <pl:Category>
                <pl:Name>Table</pl:Name>
                <pl:Description>This category is for tables</pl:Description>
                <pl:Priority>1</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Living Room Furniture</pl:Name>
                <pl:Description>This category is for Dinning Furniture</pl:Description>
                <pl:Priority>4</pl:Priority>
            </pl:Category>
            <pl:Category>
                <pl:Name>Wood Furniture</pl:Name>
                <pl:Description>This category is for Wood Furniture</pl:Description>
                <pl:Priority>3</pl:Priority>
            </pl:Category>
        </pl:Product>
    </pl:Request>
</ProductList>

And what I want is the following:

<ProductList xmlns:pl="http://products.produsor.com/pml">
    <pl:Submission>
<!--********* This is the problem area *************-->
        <pl:Descriptions>
            <pl:Description id="1">This is a really awesome table.</pl:Description>
        </pl:Descriptions>
        <pl:Categories>
            <pl:Category id="1">Table</pl:Category>
            <pl:Category id="2">Dinning Furniture</pl:Category>
            <pl:Category id="3">Living Room Furniture</pl:Category>
            <pl:Category id="4">Wood Furniture</pl:Category>
        </pl:Categories>
<!--****************************************************-->
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="2"/>
            <cat catId="3"/>
        </pl:Product>
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="3"/>
            <cat catId="4"/>
        </pl:Product>       
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="2"/>
            <cat catId="3"/>
        </pl:Product>
        <pl:Product>
            <pl:SystemIdentifier type="GUID">DA7FDAFD-B049-45DB-9FFA-3F5CB834328D</pl:SystemIdentifier>
            <pl:SystemIdentifier name="UPC">4236483268</pl:SystemIdentifier>
            <pl:ProductName descriptionId="1">Dinning Table</pl:ProductName>
            <cat catId="1"/>
            <cat catId="3"/>
            <cat catId="4"/>
        </pl:Product>   
    </pl:Submission>
</ProductList>

The trick is that I can't have repeating values in the pl:Description or pl:Category elements, while the product elements must repeat if they are repeated in the files. I have XSLT templates built to construct everything, including the descriptions and categories, but they do so once per file. I need the stylesheet to build the descriptions and categories once, from the distinct data across all of the files, and then output all of the product elements. Here is what I have so far, which builds the product elements:

<xsl:template match="/">
    <xsl:for-each select="/files/file">
        <xsl:apply-templates select="document(.)/ProductList/pl:Request"/>
    </xsl:for-each>
</xsl:template>

Since this is pretty long already, I'll just say that the request template works to create the product elements and I have a "ProductList" template which will create the descriptions and categories element structure.

  • If you need to merge several files then show us two minimal samples and the merged result and explain on which criteria you want to merge. Commented May 20, 2016 at 18:56
  • I've edited the question to include another example file (they are nearly identical); it is representative of how the data looks in the actual XML files. I am merging all of the files and data. There are no criteria for filtering "products". Commented May 20, 2016 at 19:06
  • Can you use exsl:node-set or similar? Or is that beyond "pure" XSLT 1.0? Commented May 20, 2016 at 19:27
  • Also the <pl:Category id="1">Table</pl:Category> elements in the result, are they just a list of all pl:Category in the input documents or do you need to eliminate duplicates based on the value (e.g. Table)? Commented May 20, 2016 at 19:39
  • The pl:Category elements in the results need to be distinct. Unless I'm mistaken, it looks like you already accounted for that in the answer you posted anyway. Thanks. Commented May 20, 2016 at 21:32

1 Answer

Here is an example that copies all categories into a result tree fragment, uses exsl:node-set and then Muenchian grouping to identify unique categories and then references them when copying the request elements:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:pl="http://products.produsor.com/pml"
    version="1.0"
    exclude-result-prefixes="exsl">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

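    <!-- Load every document listed in the manifest; files/file is evaluated against the manifest's root node. -->
    <xsl:variable name="input-docs" select="document(files/file)"/>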

    <xsl:variable name="cats-rtf">
        <xsl:copy-of select="$input-docs//pl:Category"/>
    </xsl:variable>

    <!-- Muenchian grouping key: index pl:Category elements by their category name. -->
    <xsl:key name="group" match="pl:Category" use="pl:Name"/>

    <xsl:variable name="distinct-cats-rtf">
        <xsl:for-each select="exsl:node-set($cats-rtf)/pl:Category[generate-id() = generate-id(key('group', pl:Name)[1])]">
            <pl:Category id="{position()}">
                <xsl:value-of select="pl:Name"/>
            </pl:Category>
        </xsl:for-each>
    </xsl:variable>

    <xsl:variable name="distinct-cats" select="exsl:node-set($distinct-cats-rtf)/pl:Category"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <ProductList>
            <pl:Submission>
                <pl:Categories>
                    <xsl:copy-of select="$distinct-cats"/>
                </pl:Categories>
                <xsl:apply-templates select="$input-docs//pl:Request"/>
            </pl:Submission>
        </ProductList>
    </xsl:template>

    <!-- Replace each pl:Category in the copied requests with a reference to its distinct id. -->
    <xsl:template match="pl:Category">
        <cat catId="{$distinct-cats[. = current()/pl:Name]/@id}"/>
    </xsl:template>

</xsl:stylesheet>

You could use the same approach to identify the unique descriptions and reference them.
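
As a rough sketch of that, the stylesheet below applies the same steps to the descriptions only. It rests on a few assumptions that are not in the question: the long description text is what makes a description distinct, only pl:Product/pl:Description is collected (pl:Category/pl:Description is left out), and the names desc-group, descs-rtf, distinct-descs, desc and descId are made up for illustration. In the sample files pi.ProductLongDescription is an unprefixed element name containing a dot, so it is matched here without a namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:pl="http://products.produsor.com/pml"
    version="1.0"
    exclude-result-prefixes="exsl">

    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:variable name="input-docs" select="document(files/file)"/>

    <!-- Collect only the product-level descriptions from all input documents. -->
    <xsl:variable name="descs-rtf">
        <xsl:copy-of select="$input-docs//pl:Product/pl:Description"/>
    </xsl:variable>

    <!-- Assumption: the long description text identifies a description. -->
    <xsl:key name="desc-group" match="pl:Description" use="pi.ProductLongDescription"/>

    <!-- Keep the first description of each group and number it. -->
    <xsl:variable name="distinct-descs-rtf">
        <xsl:for-each select="exsl:node-set($descs-rtf)/pl:Description[generate-id() = generate-id(key('desc-group', pi.ProductLongDescription)[1])]">
            <pl:Description id="{position()}">
                <xsl:value-of select="pi.ProductLongDescription"/>
            </pl:Description>
        </xsl:for-each>
    </xsl:variable>

    <xsl:variable name="distinct-descs" select="exsl:node-set($distinct-descs-rtf)/pl:Description"/>

    <!-- Identity template: copy everything that has no more specific rule. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/">
        <ProductList>
            <pl:Submission>
                <pl:Descriptions>
                    <xsl:copy-of select="$distinct-descs"/>
                </pl:Descriptions>
                <xsl:apply-templates select="$input-docs//pl:Request"/>
            </pl:Submission>
        </ProductList>
    </xsl:template>

    <!-- Hypothetical reference element, mirroring the cat/catId pattern. -->
    <xsl:template match="pl:Product/pl:Description">
        <desc descId="{$distinct-descs[. = current()/pi.ProductLongDescription]/@id}"/>
    </xsl:template>

</xsl:stylesheet>
```

To produce both lookup tables in one pass, the description variables, key and template would be added next to the category ones in a single stylesheet, using a different key name for each.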

6 Comments

Thanks. That looks fairly simple. I'll give that a shot as soon as I can (not lucky enough to be able to work from home on this project).
Unfortunately I can't use exsl. I attempted and I get "Namespace 'exslt.org/common' does not contain any functions." I am guessing it's something related to our network security.
Well, find out which XSLT processor you use, for instance by running home.arcor.de/martin.honnen/xslt/processorTest2.xml through your processor, then check its documentation for an extension function to convert a result tree fragment into a node-set.
@Mark_Eng, have you figured out which XSLT processor you use? Not all processors support the exsl:node-set extension function (most notably Microsoft's MSXML and XslTransform do not), but there is usually support for a proprietary extension function in a custom namespace that does the same job.
I wasn't able to use your site. It is blocked by our network security (it's very strict). I have been using some of your suggestion above to work out a method for doing what I need. I haven't quite gotten it yet, but I think I'll be able to do it with what you gave me above. I'll go ahead and mark this answered. Thanks.
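
For reference, here is a minimal sketch of what the proprietary-function route mentioned in the comments could look like. The thread never establishes which processor is in use, so this assumes it is one of Microsoft's, where the node-set function lives in the urn:schemas-microsoft-com:xslt namespace (conventionally bound to the msxsl prefix). It only builds the distinct category list; the rest of the answer's stylesheet would change in the same way, with msxsl:node-set in place of exsl:node-set.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:pl="http://products.produsor.com/pml"
    version="1.0"
    exclude-result-prefixes="msxsl">

    <xsl:output indent="yes"/>

    <!-- Load every document listed in the manifest. -->
    <xsl:variable name="input-docs" select="document(files/file)"/>

    <!-- Copy all categories into one result tree fragment. -->
    <xsl:variable name="cats-rtf">
        <xsl:copy-of select="$input-docs//pl:Category"/>
    </xsl:variable>

    <xsl:key name="group" match="pl:Category" use="pl:Name"/>

    <xsl:template match="/">
        <pl:Categories>
            <!-- msxsl:node-set converts the fragment so it can be queried with XPath. -->
            <xsl:for-each select="msxsl:node-set($cats-rtf)/pl:Category[generate-id() = generate-id(key('group', pl:Name)[1])]">
                <pl:Category id="{position()}">
                    <xsl:value-of select="pl:Name"/>
                </pl:Category>
            </xsl:for-each>
        </pl:Categories>
    </xsl:template>

</xsl:stylesheet>
```

If the processor turns out to be something else, its documentation should name the equivalent function and namespace; a `function-available('msxsl:node-set')` test can guard the call.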