xml.etree.ElementTree.parse is choking on my XHTML file. I saw somewhere that lxml can handle HTML. Can someone tell me the documented way to parse, and then alter, XHTML? I want to add some JavaScript to the XHTML on the fly.

  • What is ‘choking’? Is the document not well-formed XML? Or is it using HTML-specific entities that a non-DTD-reading parser will fail to resolve? Commented Feb 26, 2010 at 0:56
  • By 'choking' I mean that when I try to parse my XHTML file like this: html = myElementTree.parse(myXHTMLFile) the application throws the following exception: undefined entity &nbsp;: line 16, column 164. I've run into this before in other languages. &nbsp; is a valid entity in HTML, but it is not predefined in XML, as you suggest. Commented Mar 2, 2010 at 23:15
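
For the undefined entity error described in the comment above, one possible workaround that stays inside the standard library is to rewrite HTML-only named entities as numeric character references before handing the text to ElementTree, then append a script element and serialize. This is only a sketch under assumptions: the helper names replace_html_entities and add_script, the sample document, and the expectation that the file has an XHTML-namespaced head element are illustrative and not from the question.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Entities every XML parser already understands; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def replace_html_entities(text):
    """Rewrite HTML-only named entities (e.g. &nbsp;) as numeric references."""
    def to_numeric(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # unknown or XML-native: leave for the parser
        return "&#%d;" % name2codepoint[name]

    # Caveat: this is a plain text substitution, so it also touches
    # entity-like text inside CDATA sections and comments.
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", to_numeric, text)


def add_script(xhtml_text, js_source):
    """Parse XHTML text, append a <script> to <head>, return the new markup."""
    # Serialize XHTML elements without an "ns0:" prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(replace_html_entities(xhtml_text))
    # Assumes the document has a namespaced <head>; find() returns None otherwise.
    head = root.find("{%s}head" % XHTML_NS)
    script = ET.SubElement(
        head, "{%s}script" % XHTML_NS, {"type": "text/javascript"}
    )
    script.text = js_source
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document for illustration.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>one&nbsp;two</p></body></html>"
)
result = add_script(sample, "document.title = 'patched';")
print(result)
```

Two things to keep in mind with this approach: ElementTree does not keep the DOCTYPE when it serializes, and it escapes any < or & in the script text, which is valid XHTML but may matter if the page is served as text/html.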

1 Answer

Have you tried BeautifulSoup? It handles documents that aren't well-formed, and I've found it pretty good.

1 Comment

Yes - I used it in an extractor for data from an XHTML website and it seemed to manage fine. I'm not sure how easy it is to use BeautifulSoup to then edit the document, as I've only ever been interested in extraction, but it will handle the extraction part.
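
On the editing question, a minimal sketch of how adding a script element might look, assuming the current bs4 package (Beautiful Soup 4) rather than whatever version was in use when this was written, and using Python's built-in html.parser backend. The sample document and the injected JavaScript are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document containing the entity that broke ElementTree.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>one&nbsp;two</p></body></html>"
)

# html.parser is lenient, so &nbsp; is resolved instead of raising an error.
soup = BeautifulSoup(sample, "html.parser")

# Build a new <script> element and attach it to the end of <head>.
script = soup.new_tag("script", type="text/javascript")
script.string = "document.title = 'patched';"
soup.head.append(script)

patched = str(soup)
print(patched)
```

Because html.parser treats the input as HTML rather than XML, the output is not guaranteed to be well-formed XHTML in every case, so it is worth checking the result if strict XML consumers will read it.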
