Transforming xhtml using xslt - issue with displaying in web browser

Question

I am trying to transform an XHTML web page with XSLT by extracting some of its parts. For example, I'd like to extract the HEAD and BODY parts separately (this is only the first step; the next will be extracting some divs) and use them in my output XHTML document. Here is the XSLT code:

<xsl:stylesheet version="2.0"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xhtml xsl xs">

<xsl:output
  method="html"
  omit-xml-declaration="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
  indent="yes"/>


<xsl:template match="/">
  <HTML>
      <xsl:apply-templates/>
  </HTML>
</xsl:template>

<xsl:template match="xhtml:HTML/xhtml:BODY">
 <xsl:copy-of select="." disable-output-escaping="yes" />
</xsl:template>


<xsl:template match="xhtml:HTML/xhtml:HEAD">
  <xsl:copy-of select="." disable-output-escaping="yes"/>
</xsl:template>

</xsl:stylesheet>

As input XHTML I use the source code of www.wordpress.org/about (which validates). First the neko purifier runs (HTML->XHTML), and then my XSLT transformation. When I look at the output code, everything looks similar:

Original code: codepad.org/5D7MCXSk
Code after transformation: http://codepad.org/fGzyAwF2

However, when I open it in a web browser I get a "white wall": nothing appears. I noticed that in the source view of the transformed page (in both Chrome and Firefox) the syntax is highlighted only up to the closing HEAD tag. That is very odd, and I think it is causing the problem.

Any help would be much appreciated. Thanks in advance.

Well, it is not clear what you want to achieve. The root element of your stylesheet has xmlns="http://www.w3.org/1999/xhtml", which suggests you want to output XHTML elements. Your xsl:output also suggests you want to output an XHTML document. However, XHTML is case-sensitive and all its elements and attributes are defined in lower case, so I don't understand why you have a literal result element named HTML. Using lower-case element and attribute names for all result elements is a first step towards a meaningful XHTML result document.
– Martin Honnen, Commented Jan 31, 2011 at 16:37
(Second comment, as the first got too long.) If the input is XHTML and you want to match XHTML elements in your patterns, then you need lower-case names there too, e.g. match="xhtml:html/xhtml:head". If you still have problems, tell us two things: first, whether you serve the transformation result as text/html or with an XML MIME type like application/xhtml+xml or application/xml; and second, what result document you want to create from your input.
– Martin Honnen, Commented Jan 31, 2011 at 16:51
Are you performing the transformation client side or server side? What are your Content-Type headers?
– user357812, Commented Jan 31, 2011 at 16:51
I am sorry, maybe the question was not 100% clear. What I am trying to achieve is to extract some parts of the input XHTML document (let's say the div with id=main and the div with id=bottom) along with all their sub-content, and display them in the output XHTML document, all using an XSL transformation. It transforms one XHTML document into another. But I got stuck at the very beginning: I could not move HEAD and BODY separately, and that is the first step. Extracting other parts is the second. Thanks!
– omnomnom, Commented Jan 31, 2011 at 18:43

Anthony Pegram · Accepted Answer · 2011-12-23 23:53:06Z

1

So it seems that http://codepad.org/5D7MCXSk (code 1) is the same as the source code of http://wordpress.org/about/ (code 2), and you process this code with "neko purifier" (is it this one: http://nekohtml.sourceforge.net/ ?), resulting in the document at http://codepad.org/fGzyAwF2 (code 3). Correct me if I'm wrong.

The reason code 3 doesn't show anything in the browser seems to be a self-closing <SCRIPT/> at the end of the <HEAD>. YMMV, but in my tests the browsers didn't like it, most likely because a parser in text/html mode ignores the trailing slash and reads the rest of the page as script content.

Your XSLT code is slightly flawed, but if you feed it code 3 as input, it produces output. The quirk of the input file, that self-closing script element, is preserved in the transformation.
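
One possible way to keep the self-closing form out of the result is to give every empty script element some content during the transformation. The stylesheet below is only a sketch under assumptions that are not confirmed by the question: it assumes the input uses lower-case element names in the XHTML namespace, so the match pattern has to be adjusted to whatever names the purifier really emits (code 3 shows upper-case SCRIPT).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: script elements are lower-case and in the XHTML namespace.
       An empty script gets a single space as content, so it can no longer
       be serialized with the empty-element syntax. -->
  <xsl:template match="xhtml:script[not(node())]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:text> </xsl:text>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The identity template is needed here because xsl:copy-of copies a subtree without applying templates, so a script-specific template would never fire inside a copied HEAD.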

Some random notes:

The original input (code 1) is well-formed XML, so you don't need to "purify" it
<xsl:copy-of> has no disable-output-escaping attribute
There is no point in defining a default namespace for the output document when using method="html", because HTML doesn't use namespaces (unlike XHTML)
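
Taken together with the comments on the question, these notes suggest a stylesheet along the following lines. This is a hedged sketch rather than a tested fix for this exact input: it assumes an XSLT 2.0 processor (the original declares version="2.0"), assumes the input elements are lower-case and in the XHTML namespace, and swaps method="html" for the XSLT 2.0 method="xhtml" so that the default namespace makes sense.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xhtml">

  <!-- Assumption: an XSLT 2.0 processor. The xhtml output method writes an
       empty element that is not declared EMPTY as a start-tag/end-tag pair. -->
  <xsl:output
    method="xhtml"
    omit-xml-declaration="yes"
    doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
    indent="yes"/>

  <!-- Lower-case literal result element, since XHTML is case-sensitive. -->
  <xsl:template match="/">
    <html>
      <xsl:apply-templates select="xhtml:html/xhtml:head"/>
      <xsl:apply-templates select="xhtml:html/xhtml:body"/>
    </html>
  </xsl:template>

  <!-- xsl:copy-of takes only a select attribute. -->
  <xsl:template match="xhtml:head | xhtml:body">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

If the result is served as text/html and the processor only supports XSLT 1.0, the alternative reading of the last note applies instead: keep method="html" and drop the default namespace from the result elements.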

edited Dec 23, 2011 at 23:53

Anthony Pegram

answered Jan 31, 2011 at 23:37

jasso

2 Comments

omnomnom Over a year ago

First I run the neko purifier (the same one as in your link), then I run the XSLT transformation. I know that WordPress is a valid XHTML site, but the whole mechanism should also work on other sites. This one is just a starting point. Thanks

omnomnom Over a year ago

You are right: the problem is the self-closing <SCRIPT /> tag at the end of the HEAD section. Thanks.
