My Python environment is 2.7.

I know this is an old question, but I've been losing my mind searching for and reading other people's questions and answers. Some of them are really out of date, like the code below:

import lxml  # wrong
import xml  # correct

So, since I'm new to Python and know nothing about its history, I want to make things clearer for myself. For example, what is the standard XML parser module in Python now? What can I do when I need to parse some HTML using XPath syntax? If I have malformed HTML source, how can I handle it without using BeautifulSoup or something similar? If you can brief me on any of this, I'd much appreciate it.

OK, all in all, I have just one question: how can I parse malformed HTML using a standard Python module with Python 2.7?

  • Is there any reason why you don't want to use BeautifulSoup? It's the canonical answer to parsing malformed HTML in Python really. Commented May 15, 2012 at 6:41
  • Alright, I think I need to do more research. Commented May 15, 2012 at 6:42
  • It seems BeautifulSoup does not support XPath? Commented May 15, 2012 at 6:43

1 Answer

Read the Python library documentation if you need to stick to the standard library.
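
For the standard-library route, here is a minimal sketch of what that could look like with HTMLParser, the module that ships with Python 2.7. The class name LinkCollector and the sample markup are made up for illustration. Note that HTMLParser is event-based: it tolerates broken markup but builds no tree and offers no XPath, so anything you want to extract has to be picked up in the handler methods.

```python
# HTMLParser lives in different modules on Python 2 and Python 3.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for this sketch)."""

    def __init__(self):
        # Explicit base-class call: HTMLParser is an old-style class on Python 2.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag arrives lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Deliberately malformed sample: unclosed <p>, <b> and <a>, no </body> or </html>.
broken_html = (
    '<html><body><p>First <a href="/one">one'
    '<p>Second <b>bold <a href="/two">two</a>'
)

collector = LinkCollector()
collector.feed(broken_html)
collector.close()
print(collector.links)  # ['/one', '/two']
```

The parser does not complain about the missing end tags; it simply reports the start tags it sees. If you need real XPath queries over a repaired tree, that is beyond what this module does.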

If you don't, definitely look at lxml, which does much more.

2 Comments

The Python library documentation, you know, doesn't contain many examples, unlike the PHP manual.
HTMLParser contains two whole sections which contain the word "Example" in their titles!
