3

I am writing a very simple XSLT stylesheet to convert an HTML page to an XML file.

However, even the starting point is not as straightforward as I expected. My first goal is to convert the <html> tag into a <topic> tag.

I wrote the following XSLT:

 <xsl:template match="@*|node()">
   <xsl:copy>
    <xsl:apply-templates select="@*|node()"/> 
  </xsl:copy>  
 </xsl:template>

 <xsl:template match="/">
   <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match="html">
  <topic>
    <xsl:text> Conversion Test</xsl:text>
  </topic>
 </xsl:template>

However, when I run this XSLT, the resulting XML has exactly the same content as the original HTML page. It seems that the third template (the one matching the <html> tag) never gets hit.

The source html looks like:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>..</head>
   <body>...</body>
 </html>

Could experts help me a little here?

2
  • Can you give an example of your html? (especially any namespaces like xmlns="http://www.w3.org/1999/xhtml") Commented Oct 27, 2011 at 18:24
  • @DevNull, I updated my question with the source html Commented Oct 27, 2011 at 18:29

3 Answers

5

XSLT 1.0:

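The html element in your source is in the XHTML namespace, and in XSLT 1.0 an unprefixed name in a match pattern only matches elements in no namespace, so match="html" never fires. Try adding xmlns:x="http://www.w3.org/1999/xhtml" to your xsl:stylesheet and changing your match to match="x:html". (Note: you don't have to use "x"; you can choose any prefix you want.)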
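As a minimal sketch of that change, assuming the stylesheet from the question is otherwise kept as-is: the x prefix is an arbitrary choice, and exclude-result-prefixes="x" is an addition not in the question, included only so that the xmlns:x declaration is not copied onto the output element. The match="/" template from the question is left out because the built-in rule for the root node already does the same thing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

  <!-- Identity transform: copies any node that has no more specific template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- x:html matches the html element in the XHTML namespace;
       a plain "html" would only match an html element in no namespace -->
  <xsl:template match="x:html">
    <topic>
      <xsl:text>Conversion Test</xsl:text>
    </topic>
  </xsl:template>

</xsl:stylesheet>
```

Because the x:html template does not call xsl:apply-templates, the head and body content is dropped and the result should be a single topic element containing the test text.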

XSLT 2.0:

Either use the above method or replace the namespace prefix in your match(es) with "*" (match="*:html"). You could also add xpath-default-namespace="http://www.w3.org/1999/xhtml" to the xsl:stylesheet, so that unprefixed element names in patterns refer to the XHTML namespace.
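A sketch of the xpath-default-namespace variant, assuming an XSLT 2.0 processor is available (an XSLT 1.0 processor will not honor this attribute). The templates are the ones from the question; only the xsl:stylesheet element changes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <!-- Identity transform: copies any node that has no more specific template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- With xpath-default-namespace set, the unprefixed name "html"
       refers to html in the XHTML namespace.
       Alternative: match="*:html" matches html in any namespace. -->
  <xsl:template match="html">
    <topic>
      <xsl:text>Conversion Test</xsl:text>
    </topic>
  </xsl:template>

</xsl:stylesheet>
```

This keeps the match patterns free of prefixes, which may be more convenient when many XHTML elements need their own templates.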


3 Comments

Thank you, it worked! Yes, the HTML is actually XHTML and I am using XSLT 1.0. After adding the namespace you suggested, it worked great :)
I updated the title to reflect the nature of the source document too.
@Kevin - You're very welcome. Also, if you don't want the namespace declaration in your XML output, add exclude-result-prefixes="x" to xsl:stylesheet. (Note: in XSLT 2.0 you can use #all instead of x to exclude all prefixes.)
0

You may want to try to remove the first template or make it more specific than matching every node with node().

3 Comments

Are you saying remove the identity transform?
@lkuty, I did try removing the first template. Now the resulting xml is simply a big node of text without any markup. It contains all the text from the original html page.
I was wrong. I thought the first rule could be chosen instead of the third, but in fact the default priority of a pattern matching an element name is higher than that of node(), so that could not be the problem. I just didn't think about namespaces.
0

The purpose of XSLT is to transform XML documents into other XML documents. HTML is not an XML document. While XHTML is XML, it is actually HTML reformulated, so I'm not sure that what you want to do is easy or even possible with XSLT.

1 Comment

So NOW you update the title :). That reflects my problem with this.
