
Given an input XML document like this:

<?xml version="1.0" encoding="utf-8"?>
<root>
<title> This contains an 'embedded' HTML document </title>
<document>
<html>
<head><title>HTML DOC</title></head>
<body>
Hello World
</body>
</html>
</document>
</root>

How can I extract that 'inner' HTML document, render it as CDATA, and include it in my output document?

The output document will be an HTML document containing a text box that shows the elements as text (that is, it will display the 'source view' of the inner document).

I have tried this:

<xsl:template match="document">
<xsl:value-of select="*"/>
</xsl:template>

But this only renders the text nodes.

I have tried this:

<xsl:template match="document">
<![CDATA[
<xsl:value-of select="*"/>
]]>
</xsl:template>

But this escapes the actual XSLT and I get:

&lt;xsl:value-of select="*"/&gt;

I have tried this:

<xsl:output method="xml" indent="yes" cdata-section-elements="document"/>
[...]
<xsl:template match="document">
<document>
<xsl:value-of select="*"/>
</document>
</xsl:template>

This does insert a CDATA section, but the output still contains just text (stripped elements):

<?xml version="1.0" encoding="UTF-8"?>
<html>
   <head>
      <title>My doc</title>
   </head>
   <body>
      <h1>Title: This contains an 'embedded' HTML document </h1>
      <document><![CDATA[
                                                HTML DOC

                                                                Hello World

                                ]]></document>
   </body>
</html>
  • Can you show your expected output please? Commented Sep 12, 2012 at 14:04

1 Answer


There are two confusions you need to clear up here.

First, you probably want xsl:copy-of rather than xsl:value-of. xsl:value-of returns the string value of an element (the concatenation of its descendant text nodes), whereas xsl:copy-of returns a deep copy of the element itself.
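To make the difference concrete, here is a minimal sketch of a stylesheet that applies both instructions to the embedded html element. The wrapper element names (result, as-value, as-copy) are hypothetical and chosen only for illustration, and the sketch assumes the input is well-formed, with document nested somewhere below the root.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- result, as-value and as-copy are illustrative names only -->
    <result>
      <!-- String value: only the concatenated text nodes, markup is lost -->
      <as-value><xsl:value-of select="//document/html"/></as-value>
      <!-- Deep copy: the html element with all its children and text -->
      <as-copy><xsl:copy-of select="//document/html"/></as-copy>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, as-value should hold only the text ("HTML DOC", "Hello World" and the surrounding whitespace), while as-copy holds the html element as real markup. Note that the copy consists of element nodes, not text, which is why it is not wrapped in a CDATA section on its own.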

Second, the cdata-section-elements attribute on xsl:output affects the serialization of text nodes, but not of elements and attributes. One way to get what you want would be to serialize the HTML yourself, along the lines of the following (not tested):

<xsl:template match="document//*">
  <xsl:value-of select="concat('&lt;', name())"/>
  <!--* attributes are left as an exercise for the reader ... *-->
  <xsl:text>&gt;</xsl:text>
  <xsl:apply-templates/>
  <xsl:value-of select="concat('&lt;/', name(), '>')"/>
</xsl:template>
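For readers who want the attribute handling filled in, the following is one possible sketch rather than a tested solution. It assumes XSLT 1.0, uses a hypothetical mode named serialize so that it does not interfere with other templates, and keeps cdata-section-elements="document" so that the hand-built text ends up in a CDATA section.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Text children of <document> are serialized as CDATA sections -->
  <xsl:output method="xml" cdata-section-elements="document"/>

  <xsl:template match="/">
    <document>
      <xsl:apply-templates select="//document/*" mode="serialize"/>
    </document>
  </xsl:template>

  <!-- Write each element out as text: start tag, attributes, content, end tag -->
  <xsl:template match="*" mode="serialize">
    <xsl:value-of select="concat('&lt;', name())"/>
    <xsl:for-each select="@*">
      <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates mode="serialize"/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

  <!-- Pass text through unchanged (stated explicitly for this mode) -->
  <xsl:template match="text()" mode="serialize">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

This sketch has known limitations: comments and processing instructions in the embedded document are dropped, empty elements are written as a start tag followed by an end tag, and special characters inside text or attribute values (such as &amp; or a double quote) are not re-escaped, so the result is only a faithful source view for simple content like the sample above.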

But the quicker way would be something like the following solution (squeamish readers, stop reading now), pointed out to me by my friend Tommie Usdin. Drop the cdata-section-elements attribute from xsl:output and replace your template for the document element with:

<xsl:template match="document">
  <document>
    <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
    <xsl:copy-of select="./html"/>
    <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
  </document>
</xsl:template> 

4 Comments

Nice one - I was starting to realize that I needed to code the &lt; &gt; here, but I hadn't seen the disable-output-escaping option before. In fact, I had resorted to using different characters (French quotes!) to represent angle brackets. Thanks for the additional tips on value-of / copy-of as well. Cheers.
Thanks for that. Definitely not for squeamish readers, or should I say, those into elegance. Hehe.
For some reason Saxon 9 insists on escaping the &lt;![CDATA anyhow. At least this is what it does in my XSLT: github.com/gioele/rng-doc/blob/…
Leaving this note for anyone who runs into this: if you use XmlDocument.CreateNavigator().AppendChild to create the XmlWriter (in C#), the CDATA markers are written escaped, i.e. the output contains &lt;![CDATA instead of <![CDATA.
