Parse XHTML in Python 2.6

Question

xml.etree.ElementTree.parse is choking on my XHTML file. I saw somewhere that lxml can handle HTML. Can someone tell me the documented way to parse, and then alter, XHTML? I want to add some JavaScript to XHTML on the fly.

What is ‘choking’? Is the document not well-formed XML? Is it using HTML-specific entities that a non-DTD-reading parser will fail to resolve?
– bobince, Commented Feb 26, 2010 at 0:56
By 'choking' I mean that when I try to parse my XHTML file like this: html = myElementTree.parse(myXHTMLFile), the application throws the following exception: "undefined entity: line 16, column 164". I've run into this before in other languages. The entity is valid in HTML, but not in XML, as you suggest.
– Alex, Commented Mar 2, 2010 at 23:15

user257111 · Accepted Answer · 2010-02-26 00:01:18Z

3

Have you tried BeautifulSoup? It handles documents that aren't well-formed, and I've found it pretty good.



1 Comment

user257111 Over a year ago

Yes - I used it in an extractor for data from an XHTML website and it seemed to manage fine. I'm not sure how easy it is to use BeautifulSoup to then edit the document, as I've only ever been interested in extraction, but it will handle the extraction part.
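
For the editing step, here is a rough stdlib-only sketch of a different approach from BeautifulSoup. It assumes the file is otherwise well-formed and only fails on HTML named entities, as the "undefined entity" error in the question comments suggests. The idea is to rewrite named entities as numeric character references before handing the text to ElementTree, then append a script element to the head. The helper names expand_html_entities and add_script are made up for illustration, and the code targets Python 3 rather than the 2.6 in the question. Note that the regex would also touch entity-like text inside CDATA sections, and that ElementTree drops the DOCTYPE when serializing.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Entities every XML parser already understands; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def expand_html_entities(text):
    """Rewrite HTML named entities as numeric character references.

    Unknown names are left as-is, so the parser still reports them.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def add_script(xhtml_text, js_source):
    """Parse XHTML text, append a <script> to <head>, return the new text."""
    # Serialize XHTML elements with a default namespace, not an ns0: prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(expand_html_entities(xhtml_text))
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        raise ValueError("document has no <head> element")
    script = ET.SubElement(
        head, "{%s}script" % XHTML_NS, {"type": "text/javascript"}
    )
    script.text = js_source
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        "<head><title>demo</title></head>"
        "<body><p>a&nbsp;b</p></body></html>"
    )
    print(add_script(sample, "alert('hello');"))
```

If the input is not well-formed at all (unclosed tags, for example), this will still raise a parse error, and a forgiving parser such as BeautifulSoup is the better fit.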
