I have XML data that was extracted from a legacy Lotus Notes application and contains embedded richtext formatting. I am having difficulty rendering the richtext lists as well-formed HTML.
The problem is that a list has no closing tag to indicate where it ends. Each list does, however, have an opening tag (a pardef element) with a unique ID that marks the start of the list, and each list item has an attribute (def) that matches the list ID. The richtext also has lots of noise (garbage paragraphs), often interspersed between legitimate list items, that needs to be disregarded.
I have XSLT inspired by this solution from @Tim-C but it's not working.
This is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="NoBullet6.xslt"?>
<document>
<item name="Unordered list">
<richtext>
<pardef/>
<par def="20">
<run>This is the first </run>
<run>paragraph of the preamble.</run>
</par>
<par>
<run>This is the second paragraph of the </run>
<run>preamble.</run>
</par>
<pardef id="21" list="unordered"/>
<par def="21">
<run>This is the </run>
<run>first bullet.</run>
</par>
<par def="20">
<run/>
<!-- This is an empty paragraph/garbage data -->
</par>
<par>
<run>This is the second </run>
<run>bullet.</run>
</par>
<par def="20">
<run>This is the first </run>
<run>paragraph of the conclusion.</run>
</par>
<par>
<run>This is the second paragraph of the </run>
<run>conclusion.</run>
</par>
</richtext>
</item>
<item name="Ordered list">
<richtext>
<pardef/>
<par def="20">
<run>This is the first </run>
<run>paragraph of the preamble.</run>
</par>
<par>
<run>This is the second paragraph of the </run>
<run>preamble.</run>
</par>
<pardef id="46" list="ordered"/>
<par def="46">
<run>This is the </run>
<run>first numbered item.</run>
</par>
<par def="47">
<run/>
<!-- This is an empty paragraph/garbage data -->
</par>
<par def="46">
<run>This is the another </run>
<run>numbered item.</run>
</par>
<par def="20">
<run>This is the first </run>
<run>paragraph of the conclusion.</run>
</par>
<par>
<run>This is the second paragraph of the </run>
<run>conclusion.</run>
</par>
</richtext>
</item>
</document>
This is the desired output:
<html>
<body>
<table border="1">
<tr>
<td>Unordered list</td>
<td>
<p>This is the first paragraph of the preamble.</p>
<p>This is the second paragraph of the preamble.</p>
<ul>
<li>This is the first bullet.</li>
<li>This is the second bullet.</li>
</ul>
<p>This is the first paragraph of the conclusion.</p>
<p>This is the second paragraph of the conclusion.</p>
</td>
</tr>
<tr>
<td>Ordered list</td>
<td>
<p>This is the first paragraph of the preamble.</p>
<p>This is the second paragraph of the preamble.</p>
<ol>
<li>This is the first numbered item.</li>
<li>This is the another numbered item.</li>
</ol>
<p>This is the first paragraph of the conclusion.</p>
<p>This is the second paragraph of the conclusion.</p>
</td>
</tr>
</table>
</body>
</html>
This is the XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes"/>
<xsl:key name="pars" match="par[not(@def)]" use="generate-id(preceding-sibling::par[@def][1])" />
<xsl:template match="/*">
<html>
<body>
<table border="1">
<xsl:apply-templates />
</table>
</body>
</html>
</xsl:template>
<xsl:template match="item">
<tr>
<td><xsl:value-of select="@name"/></td>
<td>
<xsl:apply-templates select="richtext/par[@def]" />
</td>
</tr>
</xsl:template>
<xsl:template match="par[@def]">
<xsl:variable name="listType" select="preceding-sibling::*[1][self::pardef]/@list" />
<xsl:variable name="group" select="self::* | key('pars', generate-id())" />
<xsl:choose>
<xsl:when test="$listType = 'unordered'">
<ul>
<xsl:apply-templates select="$group" mode="list"/>
</ul>
</xsl:when>
<xsl:when test="$listType = 'ordered'">
<ol>
<xsl:apply-templates select="$group" mode="list"/>
</ol>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="$group" mode="para" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="par" mode="list">
<li>
<xsl:value-of select="run" separator=""/>
</li>
</xsl:template>
<xsl:template match="par" mode="para">
<p>
<xsl:value-of select="run" separator=""/>
</p>
</xsl:template>
</xsl:stylesheet>
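For comparison, here is a minimal XSLT 2.0 sketch of one possible grouping approach. It is not a confirmed fix and it rests on assumptions the sample data suggests but does not guarantee: a par with no text at all is treated as noise, a par without a def attribute inherits the def of the nearest preceding non-empty par that has one, and a pardef is looked up only within the same richtext. The key name of the grouping and the variable names are illustrative.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/document">
    <html>
      <body>
        <table border="1">
          <xsl:apply-templates select="item"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="item">
    <tr>
      <td><xsl:value-of select="@name"/></td>
      <td><xsl:apply-templates select="richtext"/></td>
    </tr>
  </xsl:template>

  <xsl:template match="richtext">
    <!-- Assumption: a par whose string value is empty or whitespace is noise,
         so it is dropped before grouping. -->
    <!-- Grouping key: the par's own @def or, failing that, the @def of the
         nearest preceding non-empty par that carries one. string() turns a
         missing value into '' so the key is always a single value. -->
    <xsl:for-each-group select="par[normalize-space()]"
        group-adjacent="string((self::par | preceding-sibling::par[normalize-space()])[@def][last()]/@def)">
      <xsl:variable name="def" select="current-grouping-key()"/>
      <!-- Assumption: the matching pardef is a sibling inside the same richtext. -->
      <xsl:variable name="listType" select="../pardef[@id = $def][1]/@list"/>
      <xsl:choose>
        <xsl:when test="$listType = 'unordered'">
          <ul>
            <xsl:apply-templates select="current-group()" mode="list"/>
          </ul>
        </xsl:when>
        <xsl:when test="$listType = 'ordered'">
          <ol>
            <xsl:apply-templates select="current-group()" mode="list"/>
          </ol>
        </xsl:when>
        <xsl:otherwise>
          <!-- No pardef with a list attribute for this def: plain paragraphs. -->
          <xsl:apply-templates select="current-group()" mode="para"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="par" mode="list">
    <li><xsl:value-of select="run" separator=""/></li>
  </xsl:template>

  <xsl:template match="par" mode="para">
    <p><xsl:value-of select="run" separator=""/></p>
  </xsl:template>
</xsl:stylesheet>
```

Because the empty paragraphs are filtered out before the grouping key is computed, the def-less "second bullet" paragraph inherits def 21 from the first bullet rather than def 20 from the garbage paragraph in between. If real noise paragraphs can contain text, the `normalize-space()` filter would need a different test.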