I'm using XSLT to transform XML files into delimited text that Excel can split into columns (sample code shown later). For example, when opened in Excel, the delimited version might look something like:

+---------------+---------------+----------+
|URL            |Title          | Version  |
+---------------+---------------+----------+
|dogs_are_cool  |Dogs are cool  | May 2013 |
+---------------+---------------+----------+

The problem is that every URL has the version appended at the end. In the previous example, dogs_are_cool is actually dogs_are_cool_may2013.html.

I'd like to do two things with that appended version:

  • Remove the version when printing the URL.
  • Reformat and print the version.

I'm guessing the best way to do this is to split the URL on the underscores, store the last piece (the version) in a variable, and print the remaining pieces in order, re-inserting the underscores between them.

I'm not sure how to go about that.

Sample XML:

<contents Url="toc_animals_may2013.html" Title="Animals">
    <contents Url="toc_apes_may2013.html" Title="Apes">
        <contents Url="chimps_may2013.html" Title="Some Stuff About Chimps" />
    </contents>
    <contents Url="toc_cats" Title="Cats">
        <contents Url="hairless_cats_may2013.html" Title="OMG Where Did the Hair Go?"/>
        <contents Url="wild_cats_may2013.html" Title="These Things Frighten Me"/>
    </contents>
    <contents Url="toc_dogs_may2013.html" Title="Dogs">
        <contents Url="toc_snorty_dogs_may2013.html" Title="Snorty Dogs">
            <contents Url="boston_terriers_may2013.html" Title="Boston Terriers" />
            <contents Url="french_bull_dogs_may2013.html" Title="Frenchies" />
        </contents>
    </contents>
</contents>

Sample XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" indent="no"/>

    <!-- This variable sets the delimiter symbol that Excel will use to separate the cells -->
    <xsl:variable name="delimiter">@</xsl:variable>

    <xsl:template match="contents">

        <!-- Prints the URL -->
        <xsl:value-of select="@Url"/>
        <xsl:copy-of select="$delimiter" />

        <!-- Prints the title -->
        <xsl:apply-templates select="@Title"/>
        <xsl:copy-of select="$delimiter" />

        <!-- I'd like to print the version here -->
        <xsl:copy-of select="$delimiter" />
    </xsl:template>

    <xsl:template match="/">
        <xsl:apply-templates select="//contents"/>
    </xsl:template>

</xsl:stylesheet>

2 Answers

If you can use XSLT 2.0, it becomes much simpler.

XML Input

<contents Url="toc_animals_may2013.html" Title="Animals">
    <contents Url="toc_apes_may2013.html" Title="Apes">
        <contents Url="chimps_may2013.html" Title="Some Stuff About Chimps" />
    </contents>
    <contents Url="toc_cats" Title="Cats">
        <contents Url="hairless_cats_may2013.html" Title="OMG Where Did the Hair Go?"/>
        <contents Url="wild_cats_may2013.html" Title="These Things Frighten Me"/>
    </contents>
    <contents Url="toc_dogs_may2013.html" Title="Dogs">
        <contents Url="toc_snorty_dogs_may2013.html" Title="Snorty Dogs">
            <contents Url="boston_terriers_may2013.html" Title="Boston Terriers" />
            <contents Url="french_bull_dogs_may2013.html" Title="Frenchies" />
        </contents>
    </contents>
</contents>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="delim" select="'@'"/>

    <xsl:template match="contents">
        <xsl:variable name="urlTokens" select="tokenize(@Url,'_')"/>
        <xsl:value-of select="$urlTokens[not(position() = last())]" separator="_"/>
        <xsl:value-of select="$delim"/>
        <xsl:value-of select="concat(@Title,$delim)"/>
        <xsl:analyze-string select="$urlTokens[last()]" regex="([a-z])([a-z]+)([0-9]+)">
            <xsl:matching-substring>
                <xsl:value-of select="concat(upper-case(regex-group(1)),regex-group(2),' ',regex-group(3))"/>               
            </xsl:matching-substring>
        </xsl:analyze-string>
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>

Output

toc_animals@Animals@May 2013
toc_apes@Apes@May 2013
chimps@Some Stuff About Chimps@May 2013
toc@Cats@
hairless_cats@OMG Where Did the Hair Go?@May 2013
wild_cats@These Things Frighten Me@May 2013
toc_dogs@Dogs@May 2013
toc_snorty_dogs@Snorty Dogs@May 2013
boston_terriers@Boston Terriers@May 2013
french_bull_dogs@Frenchies@May 2013
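
Note that toc_cats, which has no version suffix, comes out as toc because the last token is always dropped. One possible way to guard against that is sketched below. It assumes a version token is lowercase letters followed by digits, optionally with a file extension (may2013 or may2013.html); the $hasVersion variable and its pattern are illustrative additions, not part of the stylesheet above.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="delim" select="'@'"/>

    <xsl:template match="contents">
        <xsl:variable name="urlTokens" select="tokenize(@Url,'_')"/>
        <!-- Assumption: a version token looks like may2013 or may2013.html -->
        <xsl:variable name="hasVersion"
                      select="matches($urlTokens[last()], '^[a-z]+[0-9]+(\.[a-z]+)?$')"/>
        <!-- Drop the last token only when it really is a version -->
        <xsl:value-of select="if ($hasVersion)
                              then $urlTokens[position() lt last()]
                              else $urlTokens" separator="_"/>
        <xsl:value-of select="$delim"/>
        <xsl:value-of select="concat(@Title,$delim)"/>
        <!-- Unchanged: prints nothing when the last token holds no version -->
        <xsl:analyze-string select="$urlTokens[last()]" regex="([a-z])([a-z]+)([0-9]+)">
            <xsl:matching-substring>
                <xsl:value-of select="concat(upper-case(regex-group(1)),regex-group(2),' ',regex-group(3))"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
        <xsl:text>&#xA;</xsl:text>
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>
```

With this change the Cats row should keep its full URL (toc_cats@Cats@), while the versioned rows stay the same.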

1 Comment

Thanks Daniel. The 2.0 solution is certainly much simpler, but Excel doesn't seem to like 2.0.

Adding a few helper templates turns the stylesheet into a bit of an XSLT 1.0 beast, but it seems to do the trick:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="no"/>
  <!-- This variable sets the delimiter symbol that Excel will use to separate the cells -->
  <xsl:variable name="delimiter">@</xsl:variable>

  <xsl:template match="contents">
    <!-- Prints the URL -->
    <xsl:choose>
      <xsl:when test="contains(@Url, '.')">
        <xsl:call-template name="substring-before-last">
          <xsl:with-param name="list" select="@Url"/>
          <xsl:with-param name="delimiter" select="'_'"/>
        </xsl:call-template>            
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="@Url"/></xsl:otherwise>
    </xsl:choose>
    <xsl:copy-of select="$delimiter"/>

    <!-- Prints the title -->
    <xsl:apply-templates select="@Title"/>
    <xsl:copy-of select="$delimiter"/>

    <!-- Now do all the tricks to format the version -->
    <xsl:variable name="withExtension">
      <xsl:call-template name="substring-after-last">
        <xsl:with-param name="string" select="@Url"/>
        <xsl:with-param name="delimiter" select="'_'"/>
      </xsl:call-template>
    </xsl:variable>

    <xsl:variable name="withoutExtension">
      <xsl:call-template name="substring-before-last">
        <xsl:with-param name="list" select="$withExtension"/>
        <xsl:with-param name="delimiter" select="'.'"/>
      </xsl:call-template>
    </xsl:variable>

    <xsl:variable name="withoutSpace">
      <xsl:value-of select="concat(translate(substring($withoutExtension, 1, 1), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), substring($withoutExtension, 2))"/>
    </xsl:variable>

    <xsl:variable name="year">
      <xsl:value-of select="translate($withoutSpace,translate($withoutSpace, '0123456789', ''), '')"/>
    </xsl:variable>

    <xsl:value-of select="concat(substring-before($withoutSpace, $year), ' ', $year)"/>
    <xsl:copy-of select="$delimiter"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates select="//contents"/>
  </xsl:template>

  <xsl:template name="substring-before-last">
    <xsl:param name="list"/>
    <xsl:param name="delimiter"/>
    <xsl:choose>
      <xsl:when test="contains($list, $delimiter)">
        <xsl:value-of select="substring-before($list,$delimiter)"/>
        <xsl:choose>
          <xsl:when test="contains(substring-after($list,$delimiter),$delimiter)">
            <xsl:value-of select="$delimiter"/>
          </xsl:when>
        </xsl:choose>
        <xsl:call-template name="substring-before-last">
          <xsl:with-param name="list" select="substring-after($list,$delimiter)"/>
          <xsl:with-param name="delimiter" select="$delimiter"/>
        </xsl:call-template>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="substring-after-last">
    <xsl:param name="string"/>
    <xsl:param name="delimiter"/>
    <xsl:choose>
      <xsl:when test="contains($string, $delimiter)">
        <xsl:call-template name="substring-after-last">
          <xsl:with-param name="string" select="substring-after($string, $delimiter)"/>
          <xsl:with-param name="delimiter" select="$delimiter"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$string"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
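
The least obvious step above is probably the year variable, which uses a nested translate() call to keep only the digits. Here is that step in isolation, as a minimal sketch: the hard-coded 'May2013' is a hypothetical stand-in for $withoutSpace, and the $version and $nonDigits names are introduced only for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical sample value, standing in for $withoutSpace -->
    <xsl:variable name="version" select="'May2013'"/>
    <!-- Inner translate(): delete the digits, leaving the other characters ('May') -->
    <xsl:variable name="nonDigits" select="translate($version, '0123456789', '')"/>
    <!-- Outer translate(): delete those characters from the original, leaving '2013' -->
    <xsl:variable name="year" select="translate($version, $nonDigits, '')"/>
    <!-- Text before the year, a space, then the year -->
    <xsl:value-of select="concat(substring-before($version, $year), ' ', $year)"/>
  </xsl:template>

</xsl:stylesheet>
```

When the value contains no digits, $year is empty and substring-before() returns an empty string, so only the single space is printed. That is where the lone space in the Cats record comes from.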

Output:

toc_animals@Animals@May 2013@toc_apes@Apes@May 2013@chimps@Some Stuff About Chimps@May 2013@toc_cats@Cats@ @hairless_cats@OMG Where Did the Hair Go?@May 2013@wild_cats@These Things Frighten Me@May 2013@toc_dogs@Dogs@May 2013@toc_snorty_dogs@Snorty Dogs@May 2013@boston_terriers@Boston Terriers@May 2013@french_bull_dogs@Frenchies@May 2013@
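
Everything lands on one line because the stylesheet never emits a line break. If one record per line is wanted, a line feed can be written at the end of the contents template. The sketch below is deliberately stripped down (no version handling) and only shows where the xsl:text would go.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:variable name="delimiter">@</xsl:variable>

  <xsl:template match="/">
    <xsl:apply-templates select="//contents"/>
  </xsl:template>

  <xsl:template match="contents">
    <xsl:value-of select="@Url"/>
    <xsl:value-of select="$delimiter"/>
    <xsl:value-of select="@Title"/>
    <!-- Line feed: ends the current record so the next one starts on a new line -->
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

In the full stylesheet, the same xsl:text element could replace the final xsl:copy-of of the delimiter, assuming the trailing @ is not needed.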
