Python HTML parsing

Question

I am writing a program that, given a word, looks up its definition and returns it. It works, but I had to resort to regular expressions to extract the text between the tags where the definitions are stored. What is a more efficient way to do this in Python 3.x?

Try searching first: stackoverflow.com/search?q=%5Bpython%5D+html+parse. All of these questions are applicable to your problem.
– S.Lott, Commented Feb 4, 2011 at 11:13
Possible duplicate of How to get the content of a Html page in Python
– S.Lott, Commented Feb 4, 2011 at 11:14

Lennart Regebro · Accepted Answer · 2011-02-04 08:46:34Z

5

lxml works for Python 3. It has an ElementTree-compatible API but uses C libraries behind the scenes, so it's fast, and it supports XPath, which is (sometimes) a nice way of picking elements out of a document.
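
A minimal sketch of the XPath approach, assuming lxml is installed. The markup in `PAGE`, the `definition` class and the `definitions` helper are made up for illustration; a real dictionary page will need a different XPath expression.

```python
from lxml import html

# Hypothetical markup standing in for a downloaded dictionary page.
PAGE = """
<html><body>
  <h1>parse</h1>
  <ol>
    <li class="definition">to analyze a sentence grammatically</li>
    <li class="definition">to examine closely</li>
  </ol>
</body></html>
"""


def definitions(page_source):
    """Return the text of every <li class="definition"> element."""
    tree = html.fromstring(page_source)
    # XPath: any <li> in the document whose class attribute is exactly "definition".
    items = tree.xpath('//li[@class="definition"]')
    return [item.text_content().strip() for item in items]


print(definitions(PAGE))
```

The XPath string is the only part that depends on the page layout, so adapting this to a real site mostly means inspecting its HTML and changing that one expression.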


ocodo · Accepted Answer · 2011-02-04 06:16:47Z

4

Try BeautifulSoup, a good HTML parser for Python. (It works with Python 3.x too, although unless you are deep into a Python 3.0 project, consider using 2.7.)
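
A rough sketch of what this could look like, assuming the current `bs4` package (Beautiful Soup 4) rather than the release that was available when this answer was written. The markup in `PAGE`, the `definition` class and the `definitions` helper are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded dictionary page.
PAGE = """
<html><body>
  <h1>parse</h1>
  <ol>
    <li class="definition">to analyze a sentence grammatically</li>
    <li class="definition">to examine closely</li>
  </ol>
</body></html>
"""


def definitions(page_source):
    """Return the text of every <li class="definition"> element."""
    # "html.parser" is the standard-library backend, so no extra parser is needed.
    soup = BeautifulSoup(page_source, "html.parser")
    return [li.get_text(strip=True) for li in soup.find_all("li", class_="definition")]


print(definitions(PAGE))
```

Searching by tag name and class replaces the regular expression, and it keeps working if the site adds attributes or whitespace inside the tags.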


1 Comment

Daren Thomas Over a year ago

yep, BeautifulSoup is the secret sauce!

Senthil Kumaran · Accepted Answer · 2011-02-04 06:27:43Z

2

Yours is a pretty simple requirement when it comes to HTML parsing. The Python standard library includes the ElementTree module, which should be enough for the task you have in mind. Look for the example snippet given on that page.

Also, never make the mistake of parsing HTML/XML with regex. You cannot know in advance when the markup will get insanely complicated, and it is a bad idea in any case.
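
The ElementTree suggestion might be sketched as follows. One caveat: ElementTree is an XML parser, so this only works when the page is well-formed (XHTML-style) markup; `ET.fromstring` raises `ParseError` on unclosed tags. The markup in `PAGE`, the `definition` class and the `definitions` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a dictionary page.
PAGE = """<html><body>
  <h1>parse</h1>
  <ol>
    <li class="definition">to analyze a sentence grammatically</li>
    <li class="definition">to examine closely</li>
  </ol>
</body></html>"""


def definitions(page_source):
    """Return the text of every <li class="definition"> element."""
    root = ET.fromstring(page_source)
    # findall() understands a limited XPath subset, including attribute tests.
    items = root.findall(".//li[@class='definition']")
    # itertext() also collects text from nested tags such as <b> or <i>.
    return ["".join(item.itertext()).strip() for item in items]


print(definitions(PAGE))
```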
