2

I have an HTML document stored in memory as a LINQ to XML object tree. How can I serialize an XDocument as HTML, taking into account the idiosyncrasies of HTML?

For example, empty tags such as <br/> should be serialized as <br>, whereas an empty <div/> should be serialized as <div></div>.

HTML output is possible from an XSLT stylesheet, and XmlWriterSettings has an OutputMethod property which can be set to HTML - but the setter is internal, for use by XSLT or Visual Studio, and I can't seem to find a way to serialize arbitrary XML as HTML.

So, short of using XSLT solely for its HTML output capability (i.e. running the document through an otherwise pointless XDocument -> XmlReader -> XSLT -> HTML chain), is there a way to serialize a .NET XDocument to HTML?

7
  • Serialize to XHTML, problem solved. Is there any particular reason you want to serialize to old HTML? Also, <div /> should be equivalent to <div></div>... Commented Aug 25, 2009 at 12:32
  • @Matt: A browser that accepts application/xhtml+xml might understand <div /> correctly; otherwise it won't. IE won't, and it doesn't understand application/xhtml+xml either. Commented Aug 25, 2009 at 12:36
  • What Anthony says goes to the heart of the problem: XML (really: just a valid tree) is great for processing, but HTTP support is poor, so I'd rather not use any form of XML when communicating with a browser just yet. It's not just browsers; using XML would mean a different content type, which impacts SEO, caching and proxies, so I'd rather avoid these potential pitfalls entirely and use plain old HTML. Commented Aug 25, 2009 at 12:47
  • I still stand by what I said originally. I firmly believe that XHTML is the way forward for HTML. HTML is a mess (as evidenced by the very need for this question). Do the right thing, XHTML, the proper content type and let the browsers worry about their broken implementations. That's the best you can do in any case. IE is just broken in general in my experience. Commented Aug 25, 2009 at 12:54
  • Oh, I like XHTML in principle (heck, in my short attempt at blogging I once wrote eamon.nerbonne.org/2006/12/why-xhtml-still-serves-purpose.html) - but I see it as a development tool rather than a realistic deployment option. I don't like IE6, but I'd rather IE6 users see an imperfect but functional layout than a "save as" dialog box. Commented Aug 25, 2009 at 13:41

3 Answers

2

No. The XDocument->XmlReader->XSLT chain is the approach you need.

What you are looking for is a specialised serialiser that arbitrarily attaches meaning to tag names like br and div and renders each differently. One would also expect such a serialiser to work in both directions, in other words to be able to read HTML tag soup and generate an XDocument. Such a thing does not exist out of the box.

The XmlReader-to-XSLT route seems simple enough for the job; ultimately it is just a chain of streams.
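A minimal sketch of that chain, assuming the elements are in no namespace (the HTML output method only applies its special rules to those). The class name HtmlSerializer and the method ToHtml are made up for illustration; the stylesheet is a plain identity transform whose only job is to switch on method="html".

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

// Hypothetical helper, not part of the framework.
public static class HtmlSerializer
{
    // Identity transform: copies every node unchanged and only selects HTML output.
    private const string IdentityXslt = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string ToHtml(XDocument document)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(IdentityXslt)))
        {
            transform.Load(xsltReader);
        }

        using (var output = new StringWriter())
        using (var input = document.CreateReader())
        {
            // Writing to a TextWriter lets the stylesheet's xsl:output settings take effect.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling HtmlSerializer.ToHtml(XDocument.Parse("<div><br/><p/></div>")) should give a br without an end tag and a p with an explicit one. In real use the compiled transform would be cached rather than rebuilt on every call.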


1 Comment

The infuriating thing is that there obviously is support in the box somewhere for serializing using html rules - after all, it works from Xslt, and there is that internal property. Also, using XSLT's html output also adds a (generally useless and incorrect) META tag to the document's head. I'll leave the question open a while longer, but if no one can come up with a better solution, I fear you're correct.
2

Like you, I'm really surprised that the HTML output method isn't exposed, and I don't know of any way round it other than the XSLT route you've already identified. When I faced the same problem a couple of years ago, I wrote an XmlWriter wrapper class that redirected calls to WriteEndElement to WriteFullEndElement on the underlying XmlWriter whenever the tag being processed wasn't in the list {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr", "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr" }.

This fixed the <div/> problem and was sufficient for me as what I wanted to write was polyglot documents. I didn't find a method to make <br/> appear as <br> but apart from not being able to validate as HTML 4.01 this doesn't cause a real problem. I guess that if you really need this, and don't want to use the XSLT method, you'll have to write your own XmlWriter implementation.
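The wrapper described above might look roughly like the sketch below. This is a reconstruction under assumptions, not the original class: the name HtmlFriendlyXmlWriter, the case-insensitive tag comparison, and the stack used to track the current element are all illustrative choices. It uses expression-bodied members, so it needs C# 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical wrapper: forwards everything to an inner XmlWriter, but closes
// non-void elements with WriteFullEndElement so <div/> becomes <div></div>.
public sealed class HtmlFriendlyXmlWriter : XmlWriter
{
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame", "hr",
        "isindex", "image", "img", "input", "link", "meta", "param", "spacer", "wbr"
    };

    private readonly XmlWriter inner;
    private readonly Stack<string> open = new Stack<string>(); // names of open elements

    public HtmlFriendlyXmlWriter(XmlWriter inner) { this.inner = inner; }

    public override void WriteStartElement(string prefix, string localName, string ns)
    {
        open.Push(localName);
        inner.WriteStartElement(prefix, localName, ns);
    }

    public override void WriteEndElement()
    {
        // Only tags in the list may keep the short <tag /> form.
        if (VoidTags.Contains(open.Pop())) inner.WriteEndElement();
        else inner.WriteFullEndElement();
    }

    public override void WriteFullEndElement()
    {
        open.Pop();
        inner.WriteFullEndElement();
    }

    // Everything below is plain delegation to the wrapped writer.
    public override WriteState WriteState => inner.WriteState;
    public override void Close() => inner.Close();
    public override void Flush() => inner.Flush();
    public override string LookupPrefix(string ns) => inner.LookupPrefix(ns);
    public override void WriteStartDocument() => inner.WriteStartDocument();
    public override void WriteStartDocument(bool standalone) => inner.WriteStartDocument(standalone);
    public override void WriteEndDocument() => inner.WriteEndDocument();
    public override void WriteDocType(string name, string pubid, string sysid, string subset) => inner.WriteDocType(name, pubid, sysid, subset);
    public override void WriteStartAttribute(string prefix, string localName, string ns) => inner.WriteStartAttribute(prefix, localName, ns);
    public override void WriteEndAttribute() => inner.WriteEndAttribute();
    public override void WriteCData(string text) => inner.WriteCData(text);
    public override void WriteComment(string text) => inner.WriteComment(text);
    public override void WriteProcessingInstruction(string name, string text) => inner.WriteProcessingInstruction(name, text);
    public override void WriteEntityRef(string name) => inner.WriteEntityRef(name);
    public override void WriteCharEntity(char ch) => inner.WriteCharEntity(ch);
    public override void WriteWhitespace(string ws) => inner.WriteWhitespace(ws);
    public override void WriteString(string text) => inner.WriteString(text);
    public override void WriteSurrogateCharEntity(char lowChar, char highChar) => inner.WriteSurrogateCharEntity(lowChar, highChar);
    public override void WriteChars(char[] buffer, int index, int count) => inner.WriteChars(buffer, index, count);
    public override void WriteRaw(char[] buffer, int index, int count) => inner.WriteRaw(buffer, index, count);
    public override void WriteRaw(string data) => inner.WriteRaw(data);
    public override void WriteBase64(byte[] buffer, int index, int count) => inner.WriteBase64(buffer, index, count);
}
```

To use it, wrap a normal writer and save the document through the wrapper, for example document.Save(new HtmlFriendlyXmlWriter(XmlWriter.Create(filename, settings))) inside a using block. As noted above, tags in the list still come out in the self-closing form, since the underlying XmlWriter has no way to emit a bare <br>.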


1

Of course there is!

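using System.Reflection;
using System.Xml;
using System.Xml.Linq;

// XDocument document; string filename;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
// Sets a private field via reflection. The field name "outputMethod" is an
// implementation detail and may differ between framework versions.
typeof(XmlWriterSettings).GetField("outputMethod", BindingFlags.NonPublic|BindingFlags.Instance).SetValue(settings, XmlOutputMethod.Html);
using(XmlWriter xw = XmlWriter.Create(filename, settings))
{
    document.Save(xw);
}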
