Parse XHTML with Python 3.2

Question

I'm trying to parse a malformed XHTML page in Python. I just want to get a few tags of the same type from it, but it seems impossible. Normal XHTML parsers don't tolerate the malformed markup, and BeautifulSoup won't work because of syntax errors in its code. What would be the best way to parse malformed XHTML and get the content of a couple of tags of the same type?

Lennart Regebro · Accepted Answer · 2011-12-12 13:00:03Z

2

"Normal" parsers? lxml usually copes fine with malformed HTML, even though it's quite "normal". :-)
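As a rough illustration of that suggestion, here is a minimal sketch that assumes lxml is installed and uses its `lxml.html` module. The sample markup and the choice of `<p>` as the wanted tag are made up for the example; they are not from the question.

```python
from lxml import html  # third-party package, assumed to be installed

# Hypothetical broken markup: unclosed <p> tags and missing closing tags.
broken = "<html><body><div><p>first<p>second</div><p>third</body>"

# lxml.html uses a recovering HTML parser, so malformed input like this
# is repaired into a tree instead of raising an error.
tree = html.fromstring(broken)

# Collect the text of every <p> element, wherever it ended up in the tree.
paragraphs = [p.text_content().strip() for p in tree.iter("p")]
print(paragraphs)
```

The same pattern should work for any other tag name passed to `iter()`.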


ukessi · Accepted Answer · 2011-12-12 10:46:57Z

0

You can try pyquery.

I'm not sure how malformed your XHTML is, but it's worth a try.


user1049697 · Accepted Answer · 2011-12-13 08:33:12Z

0

Thanks for the help! "Unfortunately" I solved it myself by using the standard library's html.parser module and creating the parser as html.parser.HTMLParser(strict=False). That made it read malformed XHTML quite well.
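For anyone who wants to see what that looks like, here is a minimal sketch of the approach. The class name `TagTextCollector`, the sample markup, and the `<h2>` tag are hypothetical, chosen only for illustration. It assumes the wanted tags themselves are properly closed, even if the markup around them is not.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag name."""

    def __init__(self, tag):
        # On Python 3.2, pass strict=False here as described above.
        # Later versions parse leniently by default, so it is left out.
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the wanted tag
        self.results = []   # one string per occurrence of the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that appears inside the wanted tag.
        if self.depth:
            self.results[-1] += data


# Hypothetical malformed input: unclosed <p>, unquoted attribute, no </html>.
page = "<html><body><h2>One</h2><p>unclosed paragraph<h2 class=x>Two</h2><br></body>"

collector = TagTextCollector("h2")
collector.feed(page)
collector.close()
print(collector.results)
```

Text inside nested occurrences of the same tag is appended to the innermost entry, which is good enough for flat tags such as headings or table cells.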


1 Comment

Francesco Frassinelli Over a year ago

Keep in mind that strict=False is the default value; the argument is deprecated since Python 3.3 and will be removed in Python 3.5.
