How to parse HTML, XHTML, or XML with Python in an efficient way?

Question

My Python environment is 2.7.

I know this is an old question, but I've been losing my mind searching and reading other people's questions and answers. Some of them are really out of date, like the code below:

import lxml #wrong
import xml #correct

So, since I'm new to Python and know nothing about its history, I want to get a few things clear. What is the standard XML parser module in Python now? What can I do when I need to parse some HTML using XPath syntax? If I have malformed HTML source, how can I handle it without using BeautifulSoup or something similar? If you can brief me on any of this, I'd much appreciate it.

OK, all in all, I really have just one question: how can I parse malformed HTML using a standard Python module with Python 2.7?

Is there any reason why you don't want to use BeautifulSoup? It's really the canonical answer to parsing malformed HTML in Python.
– Simeon Visser, Commented May 15, 2012 at 6:41

Francis Avila · Accepted Answer · 2012-05-15 06:33:33Z

3

Read the Python library documentation (for HTML, the HTMLParser module) if you need to stick to the standard library.
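
As a rough illustration of the standard-library route, here is a minimal sketch built on the HTMLParser class. The `LinkCollector` class, the `collect_links` helper, and the sample markup are hypothetical names made up for this example. The import fallback is there because the module is called `HTMLParser` on Python 2.7 and `html.parser` on Python 3.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> start tags."""

    def __init__(self):
        # Call the base initializer directly; on Python 2.7 HTMLParser
        # is an old-style class, so super() would not work there.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Sample input with unclosed <p>, <b>, <ul>, and <li> tags.
malformed = (
    "<html><body><p>Intro <b>bold<ul>"
    "<li><a href='/a'>A</a>"
    "<li><a href='/b'>B</a></body>"
)
print(collect_links(malformed))
```

HTMLParser is event-based: it reports tags as it meets them and does not build or repair a tree, so unclosed tags do not stop it, but there is also no XPath support on this route.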

If you don't, definitely look at lxml, which does much more.
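
For the XPath part of the question, a minimal sketch with lxml might look like the following. It assumes the third-party lxml package is installed (it is not part of the standard library); the `links_via_xpath` helper and the sample markup are hypothetical names made up for this example.

```python
import lxml.html  # third-party package, assumed installed: pip install lxml


def links_via_xpath(html_text):
    # fromstring() parses with a forgiving HTML parser and returns the
    # root element of a repaired tree, which can then be queried.
    root = lxml.html.fromstring(html_text)
    # The XPath expression selects the href attribute of every <a> element.
    return [str(href) for href in root.xpath("//a/@href")]


# Sample input with unclosed <p>, <b>, <ul>, and <li> tags.
malformed = (
    "<html><body><p>Intro <b>bold<ul>"
    "<li><a href='/a'>A</a>"
    "<li><a href='/b'>B</a></body>"
)
print(links_via_xpath(malformed))
```

Unlike the event-based standard-library parser, this gives a full element tree, so the same `root` can be reused for further XPath queries.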



2 Comments

castiel Over a year ago

The Python library documentation, you know, doesn't contain many examples, unlike the PHP manual.

Francis Avila Over a year ago

HTMLParser contains two whole sections which contain the word "Example" in their titles!
