
I have an XML file in which everything is well structured except for ordered lists. Every list item is tagged as a paragraph <p>, with the enumeration added manually: (1). I want to create a valid HTML list from that source.

Using xsl:analyze-string with xsl:matching-substring and a regular expression, I was able to extract every list item, but I can't find a way to add the surrounding <ol> tags.

Here is an example:

XML source:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

What I have so far:

<xsl:variable name="text" select="/Content/*/text()"/>
<xsl:analyze-string select="$text" regex="(\(\d+\))([^(]*)">
    <xsl:matching-substring>    
        <![CDATA[<li>]]><xsl:value-of select="regex-group(2)"/><![CDATA[</li>]]>
    </xsl:matching-substring>
</xsl:analyze-string>

Output:

<li>blah</li>
<li>blah</li>
<li>blah</li>

In case you are wondering: output has to be plain text in general, only the contents of the $text variable have to be output in HTML. That's why I am using <![CDATA[]].

  • The provided code shouldn't produce any result, only the following error: "Saxon 9.1.0.5J from Saxonica Java version 1.6.0_31 Stylesheet compilation time: 586 milliseconds Processing file:/C:/Program%20Files/Java/jre6/bin/marrowtr.xml Building tree for file:/C:/Program%20Files/Java/jre6/bin/marrowtr.xml using net.sf.saxon.tinytree.TinyBuilder Tree built in 0 milliseconds Error on line 6 of marrowtr.xsl: XPTY0004: A sequence of more than one item is not allowed as the @select attribute of xsl:analyze-string ("(1) blah", "(2) blah", ...) Transformation failed: Run-time errors were reported " Commented Nov 18, 2012 at 22:24
  • @KelvinMackay, Understood. Do notice that the OP is creating tags as strings -- this is completely wrong, and chances are these strings will not be interpreted as HTML elements but just as strings. XSLT doesn't deal with "tags" but with nodes. As a result, a proper transformation creates elements, not strings that happen to be the serializations of these elements. Commented Nov 18, 2012 at 22:53

2 Answers

As simple as this:

I. XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <xsl:template match="P[matches(., '(^\(\d+\)\s*)(.*)')]">
    <li>
        <xsl:analyze-string select="." regex="(^\(\d+\)\s*)(.*)">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<Content>
    <P>(1) blah</P>
    <P>(2) blah</P>
    <P>(2) blah</P>
</Content>

the wanted, correct result is produced:

<ol>
    <li>blah</li>
    <li>blah</li>
    <li>blah</li>
</ol>
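
As a variant of the XSLT 2.0 solution above, the leading marker can also be stripped with the standard XPath 2.0 function replace() instead of xsl:analyze-string. This is only a sketch, not part of the original answer; like the stylesheet above, it assumes that every list item starts with a marker of the form (n).

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <!-- Match only paragraphs that start with a "(n)" marker -->
 <xsl:template match="P[matches(., '^\(\d+\)')]">
    <li>
        <!-- Remove the leading marker and any whitespace after it -->
        <xsl:value-of select="replace(., '^\(\d+\)\s*', '')"/>
    </li>
 </xsl:template>
</xsl:stylesheet>
```

For the sample document this should give the same three list items. One difference worth knowing: replace() keeps everything after the marker, whereas the xsl:analyze-string version outputs only what the (.*) group captures.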

II. XSLT 1.0 solution:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <xsl:template match=
  "P[starts-with(.,'(')
   and
     floor(substring-before(substring(.,2), ')'))
    =
     substring-before(substring(.,2), ')')
    ]">
    <li>
         <xsl:value-of select="substring-after(., ') ')"/>
    </li>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the same XML document (above), the same correct result is produced:

<ol>
   <li>blah</li>
   <li>blah</li>
   <li>blah</li>
</ol>

This is not really a solution, but a suggested slight improvement on Dimitre's solution.

(1) The template match condition for the XSLT 2.0 solution can be simplified to ...

<xsl:template match="P[matches(., '^\(\d+\)')]">

Having said that, the regex for the xsl:analyze-string should remain as it is.

(2) This may be outside the scope of the question, but the question reads as if HTML is the intended output, so the html output method of xsl:output should be suggested to the OP.
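
Putting both suggestions together, a minimal sketch could look like this. It reuses Dimitre's XSLT 2.0 stylesheet with the shorter match condition and method="html" on xsl:output. The assumption that the whole result is HTML is mine; the question says the output is plain text in general, so the output method may need to differ in the real stylesheet.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Assumption: the result document as a whole is HTML -->
 <xsl:output method="html" indent="yes"/>

 <xsl:template match="/*">
  <ol>
    <xsl:apply-templates/>
  </ol>
 </xsl:template>

 <!-- Simplified match condition: only test for the leading "(n)" marker -->
 <xsl:template match="P[matches(., '^\(\d+\)')]">
    <li>
        <!-- The full regex is still needed here to capture the item text -->
        <xsl:analyze-string select="." regex="(^\(\d+\)\s*)(.*)">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(2)"/>
            </xsl:matching-substring>
        </xsl:analyze-string>
    </li>
 </xsl:template>
</xsl:stylesheet>
```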
