I have XML data that was extracted from a legacy Lotus Notes application and that has embedded richtext formatting. I am having difficulty rendering the richtext lists as well-formed HTML.

The problem is that a list has no closing tag to indicate where it ends. Each list does, however, have an opening tag with a unique ID that marks the start of the list, and each list item has an attribute that matches the list ID. The richtext also contains a lot of noise (garbage paragraphs), often interspersed between legitimate list items, that needs to be disregarded.

I have XSLT inspired by this solution from @Tim-C, but it does not produce the desired output.

This is the XML:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="NoBullet6.xslt"?>
<document>
    <item name="Unordered list">
        <richtext>
            <pardef/>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the preamble.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>preamble.</run>
            </par>
            <pardef id="21" list="unordered"/>
            <par def="21">
                <run>This is the </run>
                <run>first bullet.</run>
            </par>
            <par def="20">
                <run/>
                <!-- This is an empty paragraph/garbage data -->
            </par>
            <par>
                <run>This is the second </run>
                <run>bullet.</run>
            </par>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the conclusion.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>conclusion.</run>
            </par>
        </richtext>
    </item>
    <item name="Ordered list">
        <richtext>
            <pardef/>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the preamble.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>preamble.</run>
            </par>
            <pardef id="46" list="ordered"/>
            <par def="46">
                <run>This is the </run>
                <run>first numbered item.</run>
            </par>
            <par def="47">
                <run/>
                <!-- This is an empty paragraph/garbage data -->
            </par>
            <par def="46">
                <run>This is the another </run>
                <run>numbered item.</run>
            </par>
            <par def="20">
                <run>This is the first </run>
                <run>paragraph of the conclusion.</run>
            </par>
            <par>
                <run>This is the second paragraph of the </run>
                <run>conclusion.</run>
            </par>
        </richtext>
    </item>
</document>

This is the desired output:

<html>
  <body>
     <table border="1">
        <tr>
           <td>Unordered list</td>
           <td>
              <p>This is the first paragraph of the preamble.</p>
              <p>This is the second paragraph of the preamble.</p>
              <ul>
                 <li>This is the first bullet.</li>
                 <li>This is the second bullet.</li>
              </ul>
              <p>This is the first paragraph of the conclusion.</p>
              <p>This is the second paragraph of the conclusion.</p>
           </td>
        </tr>
        <tr>
           <td>Ordered list</td>
           <td>
              <p>This is the first paragraph of the preamble.</p>
              <p>This is the second paragraph of the preamble.</p>
              <ol>
                 <li>This is the first numbered item.</li>
                 <li>This is the another numbered item.</li>
              </ol>
              <p>This is the first paragraph of the conclusion.</p>
              <p>This is the second paragraph of the conclusion.</p>
           </td>
        </tr>
     </table>
  </body>
</html>

This is the XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>


    <xsl:key name="pars" match="par[not(@def)]" use="generate-id(preceding-sibling::par[@def][1])" />


    <xsl:template match="/*">
        <html>
            <body>
                <table border="1">
                    <xsl:apply-templates />
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <tr>
            <td><xsl:value-of select="@name"/></td>
            <td>
                <xsl:apply-templates select="richtext/par[@def]" />
            </td>
        </tr>
    </xsl:template>

    <xsl:template match="par[@def]">
        <xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list" />
        <xsl:variable name="group" select="self::* | key('pars', generate-id())" />
        <xsl:choose>
            <xsl:when test="$listType = 'unordered'">    
                <ul>
                    <xsl:apply-templates select="$group" mode="list"/>
                </ul>
            </xsl:when>
            <xsl:when test="$listType = 'ordered'">    
                <ol>
                    <xsl:apply-templates select="$group"  mode="list"/>
                </ol>
            </xsl:when>
            <xsl:otherwise>
                <xsl:apply-templates select="$group" mode="para" />   
            </xsl:otherwise>     
        </xsl:choose>   
    </xsl:template>

    <xsl:template match="par" mode="list">
        <li>
            <xsl:value-of select="run" separator=""/>
        </li>  
    </xsl:template>

    <xsl:template match="par" mode="para">
        <p>
            <xsl:value-of select="run" separator=""/>
        </p>  
    </xsl:template>
</xsl:stylesheet>

  • How can we know which paragraphs need to be inserted as list items? Commented Oct 18, 2016 at 8:01

1 Answer

Since you are using XSLT 2.0, you can use xsl:for-each-group here, which simplifies things.

You can group the non-empty par elements by their def attribute or, where a par has no def attribute, by the def attribute of the nearest preceding non-empty sibling that has one. Because the "empty" par elements are excluded from the selection, they cannot split a list in two.

 <xsl:for-each-group select="par[run[normalize-space()]]" 
                     group-adjacent="if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def">

Instead of the group variable, you can use the current-group() function to get the items in the current group.

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output indent="yes"/>

    <xsl:template match="/*">
        <html>
            <body>
                <table border="1">
                    <xsl:apply-templates />
                </table>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="item">
        <tr>
            <td><xsl:value-of select="@name"/></td>
            <td>
                <xsl:apply-templates select="richtext" />
            </td>
        </tr>
    </xsl:template>

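    <xsl:template match="richtext">
        <!-- Select only non-empty paragraphs, so garbage paragraphs never interrupt a group.
             The grouping key is the paragraph's own @def, or the @def of the nearest
             preceding non-empty paragraph that has one. -->
        <xsl:for-each-group select="par[run[normalize-space()]]" group-adjacent="if (@def) then @def else preceding-sibling::par[run[normalize-space()]][@def][1]/@def">
            <!-- The context item is the first par of the current group; a pardef
                 immediately before it determines the list type, if any. -->
            <xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list" />
            <xsl:choose>
                <xsl:when test="$listType = 'unordered'">
                    <ul>
                        <xsl:apply-templates select="current-group()" mode="list"/>
                    </ul>
                </xsl:when>
                <xsl:when test="$listType = 'ordered'">
                    <ol>
                        <xsl:apply-templates select="current-group()" mode="list"/>
                    </ol>
                </xsl:when>
                <xsl:otherwise>
                    <!-- No list definition: output ordinary paragraphs. -->
                    <xsl:apply-templates select="current-group()" mode="para" />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>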

    <xsl:template match="par" mode="list">
        <li>
            <xsl:value-of select="run" separator=""/>
        </li>  
    </xsl:template>

    <xsl:template match="par" mode="para">
        <p>
            <xsl:value-of select="run" separator=""/>
        </p>  
    </xsl:template>
</xsl:stylesheet>
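
One edge case the stylesheet above does not cover: if a non-empty par without a def attribute appears before any non-empty par that has one, the group-adjacent expression evaluates to an empty sequence, which XSLT 2.0 reports as a type error (XTTE1100). The sample data never hits this case, so the following is only a sketch of a possible guard. It assumes the stylesheet above has been saved as lists.xslt (a hypothetical file name) and overrides just the richtext template, falling back to an empty-string key so that such paragraphs form their own group instead of aborting the transformation. It has not been run against the original Notes export.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Assumption: lists.xslt is the answer's stylesheet; the name is hypothetical.
         Templates declared here take precedence over the imported ones. -->
    <xsl:import href="lists.xslt"/>

    <xsl:template match="richtext">
        <!-- The key is the first available item of: own @def, the @def of the nearest
             preceding non-empty par that has one, or '' as a last resort. -->
        <xsl:for-each-group select="par[run[normalize-space()]]"
                            group-adjacent="(@def, preceding-sibling::par[run[normalize-space()]][@def][1]/@def, '')[1]">
            <xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list"/>
            <xsl:choose>
                <xsl:when test="$listType = 'unordered'">
                    <ul>
                        <xsl:apply-templates select="current-group()" mode="list"/>
                    </ul>
                </xsl:when>
                <xsl:when test="$listType = 'ordered'">
                    <ol>
                        <xsl:apply-templates select="current-group()" mode="list"/>
                    </ol>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()" mode="para"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

The mode="list" and mode="para" templates are picked up from the imported stylesheet, so only the grouping key differs from the original answer.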
1 Comment

Thanks again! It worked this time. Apologies for being unclear the first time. :)