Duplicate: Looking for C# HTML parser. Please close.

Can you recommend a library for reading HTML files as XML in .NET? I'd prefer to work with XML objects rather than raw text. Ideally, the library should also fix HTML formatting errors.

  • I know this. Otherwise I'd use regular XLINQ. Commented Jul 16, 2009 at 17:22

1 Answer

You may want to rethink this. HTML and XML are not equivalent.

A good example of this is self-closing tags.

The XML standard requires an element with no content to be written as a self-closing tag, like this:

<br/>

while the HTML standard writes elements that have no content as single tags, with no closing slash:

<br>
<link rel="...">

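In HTML 4, using the XML syntax is technically incorrect, because under SGML rules /> has a different meaning. HTML5 permits the trailing slash on void elements such as <br>, but ignores it.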
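To make the mismatch concrete, here is a minimal sketch that feeds both spellings to a strict XML parser. It uses Python's standard library purely for illustration (the question is about .NET, and this is not a library recommendation); the helper name `is_well_formed_xml` and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style void element: <br> is never closed, so XML sees a mismatched tag.
html_style = "<p>line one<br>line two</p>"

# XML-style empty element: <br/> opens and closes in one tag.
xml_style = "<p>line one<br/>line two</p>"

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xml_style))
```

The strict parser rejects the HTML-style fragment and accepts the XML-style one, which is why ordinary XML tooling cannot be pointed directly at real-world HTML.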

There are more examples of these issues in the following article.


1 Comment

That's precisely the point of the question - he wants a library that would read HTML, with all its quirks, and expose it as well-formed XHTML. So <br> gets translated to <br/>, implicitly-closed <p> becomes explicitly closed, etc.
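The translation described in this comment can be sketched in a few lines. The following is only an illustration of the idea, written with Python's standard `html.parser` rather than any .NET library, and it handles just the two cases mentioned: void elements such as <br> and a <p> that is implicitly closed by the next <p>. The names `XhtmlWriter`, `html_to_xhtml`, and `VOID_ELEMENTS` are hypothetical, and a real HTML-to-XHTML converter would need to cover far more of HTML's parsing rules.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content or an end tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class XhtmlWriter(HTMLParser):
    """Re-serialize lenient HTML as well-formed XML (simplified sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the output document
        self.open_tags = []  # stack of elements still waiting for an end tag

    def _close_top(self):
        self.out.append(f"</{self.open_tags.pop()}>")

    def handle_starttag(self, tag, attrs):
        # Simplification: a new <p> closes a <p> that is the innermost open element.
        if tag == "p" and self.open_tags and self.open_tags[-1] == "p":
            self._close_top()
        # Valueless attributes (e.g. "disabled") get their own name as the value.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")  # <br> becomes <br/>
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return  # ignore stray end tags and end tags for void elements
        # Close everything left open inside this element, then the element itself.
        while self.open_tags:
            closed = self.open_tags[-1]
            self._close_top()
            if closed == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()
        while self.open_tags:  # close whatever is still open at end of input
            self._close_top()


def html_to_xhtml(html: str) -> str:
    writer = XhtmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)


print(html_to_xhtml("<div><p>one<br>two<p>three</div>"))
# prints: <div><p>one<br/>two</p><p>three</p></div>
```

Comments, doctypes, and most of HTML's implicit-closing rules are deliberately left out here, so treat this as a picture of what such a library does rather than a substitute for one.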
