
I have a set of XML documents in which a single XML element contains a large list of values. I need to determine how large each list is and output the count only when the list is too large. I am required to use xsltproc, which only supports XSLT 1.0. I have tried the count() function, but it never seems to produce any value other than 1. An example stylesheet is:

<?xml version="1.0" encoding="US-ASCII"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                >

  <!-- NOTE: US-ASCII encoding is not compatible with Java HTML text -->
  <xsl:output method="html" indent="yes" encoding="ASCII"/>

  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title><xsl:value-of select="'Test Case for count()'"/></title>
      </head>
      <body>
        <xsl:element name="table">
          <xsl:attribute name="border">1</xsl:attribute>
          <xsl:attribute name="align">center</xsl:attribute>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'count'"/>
            <xsl:with-param name="DataValue" select="function-available('count')"/>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'normalize-space'"/>
            <xsl:with-param name="DataValue" select="function-available('normalize-space')"/>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'string-length'"/>
            <xsl:with-param name="DataValue" select="function-available('string-length')"/>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'replace'"/>
            <xsl:with-param name="DataValue" select="function-available('replace')"/>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'tokenize'"/>
            <xsl:with-param name="DataValue" select="function-available('tokenize')"/>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel" select="'contains'"/>
            <xsl:with-param name="DataValue" select="function-available('contains')"/>
          </xsl:call-template>

          <xsl:variable name="DataIn" select="' A B C '"/>
          <xsl:variable name="DataList">
            <xsl:call-template name="Tokenize-Str">
              <xsl:with-param name="Data" select="$DataIn"/>
            </xsl:call-template>
          </xsl:variable>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel"
                            select="concat('tokenize(',$DataIn,')')"/>
            <xsl:with-param name="DataValue">
              <xsl:call-template name="Tokenize-Str">
                <xsl:with-param name="Data" select="$DataIn"/>
              </xsl:call-template>
            </xsl:with-param>
          </xsl:call-template>

          <xsl:call-template name="DblColTableDataRow">
            <xsl:with-param name="DataLabel"
                            select="concat('count(',$DataIn,')')"/>
            <xsl:with-param name="DataValue">
              <xsl:copy-of select="$DataList"/>
              <xsl:text>: </xsl:text>
              <xsl:value-of select="count(($DataList))"/>
            </xsl:with-param>
          </xsl:call-template>
        </xsl:element>
      </body>
    </html>
  </xsl:template>

  <xsl:template name="DblColTableDataRow">
    <xsl:param name="DataLabel" select="'?:'"/>
    <xsl:param name="DataValue" select="'???'"/>
    <xsl:element name="tr">
      <xsl:element name="td">
        <xsl:attribute name="style">text-align:right</xsl:attribute>
        <xsl:copy-of select="$DataLabel"/>
      </xsl:element>
      <xsl:element name="td">
        <xsl:copy-of select="$DataValue"/>
      </xsl:element>
    </xsl:element>
  </xsl:template>

  <!-- template needed because tokenize function not supported -->
  <xsl:template name="Tokenize-Str">
    <xsl:param name="Data"/>
    <xsl:variable name="DataStr">
      <xsl:value-of select="normalize-space($Data)"/>
    </xsl:variable>
    <xsl:if test="0 != string-length($DataStr)">
      <!--xsl:value-of select="concat('Tkn-Str(',$Data,')')"/-->
      <xsl:choose>
        <xsl:when test="contains($DataStr,' ')">
          <xsl:element name="tkn">
            <xsl:value-of select="substring-before($DataStr, ' ')"/>
          </xsl:element>
          <xsl:call-template name="Tokenize-Str">
            <xsl:with-param name="Data"
                            select="substring-after($DataStr, ' ')"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:element name="tkn">
            <xsl:value-of select="$DataStr"/>
          </xsl:element>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

This is set up so that the XML document content does not matter. The command:

xsltproc -o tst.html test_case.xsl whatever.xml

Produces:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=ASCII">
<title>Test Case for count()</title></head><body><table border="1" align="center"><tr xmlns="">
<td style="text-align:right">count</td>
<td>true</td>
</tr>
<tr xmlns="">
<td style="text-align:right">normalize-space</td>
<td>true</td>
</tr>
<tr xmlns="">
<td style="text-align:right">string-length</td>
<td>true</td>
</tr>
<tr xmlns="">
<td style="text-align:right">replace</td>
<td>false</td>
</tr>
<tr xmlns="">
<td style="text-align:right">tokenize</td>
<td>false</td>
</tr>
<tr xmlns="">
<td style="text-align:right">contains</td>
<td>true</td>
</tr>
<tr xmlns="">
<td style="text-align:right">tokenize( A B C )</td>
<td>
<tkn>A</tkn><tkn>B</tkn><tkn>C</tkn>
</td>
</tr>
<tr xmlns="">
<td style="text-align:right">count( A B C )</td>
<td>
<tkn>A</tkn><tkn>B</tkn><tkn>C</tkn>: 1</td>
</tr></table></body></html>

I am not sure why I am getting a count of 1 since my template is clearly returning 3 element nodes.

  • Wouldn't it be simpler to count the delimiters? Commented Sep 28, 2018 at 15:55
  • Yes, you are correct. I thought of that shortly after posting the question. Unfortunately, my recursion-call in that case is having some sort of parameter passing problem. Commented Oct 1, 2018 at 10:43
  • I am afraid I don't follow. You don't need recursion in order to count the delimiters; it's a simple calculation performed on the input string. Commented Oct 1, 2018 at 15:14
  • I'm afraid that I do not know what function I would use to "count delimiters". Commented Oct 2, 2018 at 12:35
  • I have posted an answer showing how. Commented Oct 2, 2018 at 13:43

2 Answers

In pure XSLT 1.0, any variable whose content is built with xsl:element or literal result elements holds a result tree fragment (https://www.w3.org/TR/xslt-10/#section-Result-Tree-Fragments), a data structure quite different from the node-sets you get from your input document(s).

So your variable $DataList is a result tree fragment: you can output it with xsl:copy-of, but you can't use XPath on its content. For that you need an extension function like exsl:node-set (http://exslt.org/exsl/index.html). For example, <xsl:value-of select="count(exsl:node-set($DataList)/*)" xmlns:exsl="http://exslt.org/common"/> would give you the count you are looking for, because exsl:node-set converts your result tree fragment into a root node containing your result element nodes.
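
To make that concrete, here is a minimal, self-contained sketch of the exsl:node-set approach. The three literal tkn elements stand in for the output of the question's Tokenize-Str template, and the function-available() guard and its fallback message are my own additions, not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                exclude-result-prefixes="exsl">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Result tree fragment: three tkn elements built by the stylesheet
         (stand-in for what Tokenize-Str would return) -->
    <xsl:variable name="DataList">
      <tkn>A</tkn>
      <tkn>B</tkn>
      <tkn>C</tkn>
    </xsl:variable>

    <!-- Guard: only call the extension if the processor reports it -->
    <xsl:choose>
      <xsl:when test="function-available('exsl:node-set')">
        <!-- Convert the fragment to a node-set, then count its tkn children -->
        <xsl:value-of select="count(exsl:node-set($DataList)/tkn)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>exsl:node-set not available</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Run with any input document, for example xsltproc sketch.xsl whatever.xml (the file name sketch.xsl is hypothetical). Where exsl:node-set is supported, this should print 3.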

Note that xsltproc should support str:tokenize (http://exslt.org/str/functions/tokenize/index.html), so you should simply be able to use e.g. <xsl:value-of select="count(str:tokenize('A B C', ' '))"/> with an appropriate namespace declaration of xmlns:str="http://exslt.org/strings" in your stylesheet.
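
As a sketch of the str:tokenize variant, assuming your xsltproc build includes the EXSLT strings module: the stylesheet below reuses the question's ' A B C ' test value and normalizes it first, so the count does not depend on how the processor treats leading, trailing, or repeated delimiters. The availability guard and fallback text are my additions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:str="http://exslt.org/strings"
                exclude-result-prefixes="str">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Same test value as in the question -->
    <xsl:variable name="DataIn" select="' A B C '"/>

    <xsl:choose>
      <xsl:when test="function-available('str:tokenize')">
        <!-- normalize-space() trims the ends and collapses whitespace runs,
             so a single space is the only delimiter left -->
        <xsl:value-of select="count(str:tokenize(normalize-space($DataIn), ' '))"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>str:tokenize not available</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

No result tree fragment is involved here, because str:tokenize returns a node-set directly, so count() can be applied to it without exsl:node-set.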

3 Comments

Actually, using exslt is what I thought of in the first place. However, the machine in question is not on-line and I have no way to add such extensions.
There is no need to "add" support for "exsl:node-set" to "xsltproc". It might be that you have to configure it to use "str:tokenize"; I can't really tell, although the installation I have on Windows supports it without any configuration. But I don't think anyone creates/installs/configures an XSLT 1 processor without support for exsl:node-set (or a corresponding function in a proprietary namespace, as some Microsoft XSLT processors like MSXML or XslTransform do). So use exsl:node-set, it has to be supported. Otherwise you would need to run two separate transformations.
Okay, that works on my test machine out here. I just need to check it on the isolated machine.

If you only need to count how many tokens are in a given string, without extracting the individual tokens, you can do so simply by calculating the number of delimiters contained in the string.

Here's a simple example:

XML

<input>alpha bravo charlie</input>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/">
    <count-tokens>
        <xsl:value-of select="string-length(input) - string-length(translate(input, ' ', '')) + 1"/>
    </count-tokens>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<count-tokens>3</count-tokens>
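
The calculation above assumes exactly one space between tokens and none at the ends. A hedged variant for the question's use case follows: it normalizes the string first (as the question's own Tokenize-Str template does) and writes the count only when it exceeds a limit. The MaxTokens parameter and its default of 2 are hypothetical, chosen only so the sample input triggers output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical threshold; can be overridden on the command line,
       e.g. xsltproc --param MaxTokens 100 ... -->
  <xsl:param name="MaxTokens" select="2"/>

  <xsl:template match="/">
    <!-- Trim the ends and collapse whitespace runs to single spaces -->
    <xsl:variable name="norm" select="normalize-space(input)"/>

    <!-- Tokens = single spaces + 1; the final factor turns the result
         into 0 when the normalized string is empty -->
    <xsl:variable name="count"
                  select="(string-length($norm)
                           - string-length(translate($norm, ' ', ''))
                           + 1) * number(boolean($norm))"/>

    <!-- Emit the count only when the list is too large -->
    <xsl:if test="$count &gt; $MaxTokens">
      <count-tokens>
        <xsl:value-of select="$count"/>
      </count-tokens>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input above and the default limit, the count is 3, which exceeds 2, so the count-tokens element is written; with a higher limit nothing is emitted for that document.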

1 Comment

This definitely looks like the better way to do the job.
