xml.etree.ElementTree.parse is choking on my XHTML file. I saw somewhere that lxml can handle HTML. Can someone tell me the documented way to parse, and then alter, XHTML? I want to add some JavaScript to XHTML on the fly.
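If the file is already well-formed XML, the standard library alone can handle the "parse, then alter" part. The following is a minimal Python 3 sketch that assumes the XHTML namespace is declared on the root element. The sample document and the inline script are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize XHTML elements as <html>, <head>, ... rather than <ns0:html>, ...
ET.register_namespace("", XHTML_NS)

# Hypothetical input; for a real file use ET.parse("page.xhtml").getroot().
source = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Demo</title></head>
<body><p>Hello</p></body>
</html>"""

root = ET.fromstring(source)

# XHTML elements live in a namespace, so the tag name must be qualified.
head = root.find("{%s}head" % XHTML_NS)

# Append a <script> element as the last child of <head>.
script = ET.SubElement(head, "{%s}script" % XHTML_NS, {"type": "text/javascript"})
script.text = "alert('injected');"

result = ET.tostring(root, encoding="unicode")
print(result)
```

Without the register_namespace call, ElementTree writes the elements with generated prefixes such as ns0:. This only works once the parser accepts the file; HTML-only entities will still make it fail.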
What is 'choking'? Is the document not well-formed XML? Is it using the HTML-specific entities that a non-DTD-reading parser will fail to resolve? (bobince, Feb 26, 2010 at 0:56)
By 'choking' I mean that when I try to parse my XHTML file like this: html = myElementTree.parse(myXHTMLFile), the application throws the following exception: undefined entity : line 16, column 164. I've run into this before in other languages. That entity is valid in HTML but not in XML, as you suggest. (Alex, Mar 2, 2010 at 23:15)
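A common workaround, though not an official ElementTree feature, is to rewrite HTML named entities as numeric character references before parsing, because XML accepts numeric references without a DTD. Here is a minimal Python 3 sketch. It assumes the offending entity is one of the HTML 4 named entities listed in html.entities.name2codepoint, such as &nbsp;. The helper name html_entities_to_numeric is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines these five entities, so they are left untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(text):
    """Hypothetical helper: rewrite HTML named entities (&nbsp;) as numeric ones (&#160;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave it; an unknown name will still fail to parse
        return "&#%d;" % name2codepoint[name]
    return NAMED_ENTITY.sub(replace, text)


source = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a&nbsp;b &amp; c</p></body></html>'

try:
    ET.fromstring(source)
except ET.ParseError as err:
    print("ElementTree alone:", err)  # reports an undefined entity

root = ET.fromstring(html_entities_to_numeric(source))
para = root.find(".//{http://www.w3.org/1999/xhtml}p")
print(repr(para.text))  # the &nbsp; is now a real U+00A0 character
```

The regex is purely textual, so it also rewrites entity-like text inside comments or CDATA sections. If your files contain such text, a real HTML parser is the safer choice.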
1 Answer
Have you tried BeautifulSoup? It handles documents that aren't well-formed, and I've found it pretty good.
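Here is a minimal sketch of the BeautifulSoup route. It assumes the bs4 package (BeautifulSoup 4) is installed and uses Python's built-in "html.parser" backend. The page and the script text are placeholders.

```python
from bs4 import BeautifulSoup

# Placeholder page containing an HTML-only entity that ElementTree rejects.
page = """<html><head><title>Demo</title></head>
<body><p>a&nbsp;b</p></body></html>"""

# "html.parser" is the standard-library backend, so no lxml is required.
soup = BeautifulSoup(page, "html.parser")

# Build a <script> tag and append it as the last child of <head>.
script = soup.new_tag("script", type="text/javascript")
script.string = "alert('injected');"
soup.head.append(script)

result = str(soup)
print(result)
```

The HTML parser decodes &nbsp; into a literal non-breaking space rather than raising an error, which is why this route avoids the undefined-entity problem.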