Python html parsing [duplicate]

Question

I need to parse HTML emails that will be similar but not exactly the same. I will be looking for things like dates, amounts, vendors, etc., but depending on who the email came from, the markup will be different.

How could I parse out those common things from lots of different HTML markup in Python?

Thanks for your suggestions.

Just don't use regular expressions :)

– Andrea Spadaccini, Commented Feb 25, 2011 at 17:10

bioffe · Accepted Answer · 2011-02-25 16:58:04Z

7

You absolutely need to consider the Beautiful Soup library.
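As a rough sketch of how that could look for the emails described in the question: one approach is to let Beautiful Soup reduce each message to plain text and then search that text, so the extraction does not depend on any one sender's markup. The sample emails, the `extract_fields` helper, and the date and amount formats below are all assumptions made for illustration. Real emails will likely need more patterns, and vendor names usually need sender-specific rules.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample emails: similar information, different markup per sender.
EMAIL_A = """
<html><body>
  <table>
    <tr><td>Vendor</td><td>Acme Corp</td></tr>
    <tr><td>Date</td><td>2011-02-25</td></tr>
    <tr><td>Amount</td><td>$19.99</td></tr>
  </table>
</body></html>
"""

EMAIL_B = """
<div class="receipt">
  <p>Thank you for your order from <b>Widgets Inc</b>.</p>
  <p>Charged <span>$1,250.00</span> on <i>2011-02-24</i></p>
</div>
"""

# Assumed formats: ISO dates and US dollar amounts. These patterns run on
# the extracted text, not on the HTML markup itself.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\$\d[\d,]*\.\d{2}")


def extract_fields(html):
    """Return the first date and amount found in the visible text of an email."""
    soup = BeautifulSoup(html, "html.parser")
    # Join all text nodes with spaces so values in adjacent tags stay separate.
    text = soup.get_text(" ", strip=True)
    date = DATE_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        "date": date.group() if date else None,
        "amount": amount.group() if amount else None,
    }


for email in (EMAIL_A, EMAIL_B):
    print(extract_fields(email))
```

When a particular sender uses stable markup, targeted lookups such as `soup.find(...)` or `soup.select(...)` on that sender's structure may be more reliable than searching the flattened text.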


2 Comments

Sam Over a year ago

Looks like a good way to parse the HTML. Will Beautiful Soup also clean up or fix malformed HTML?

bioffe Over a year ago

@user634529. The answer is YES.
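To illustrate the point about malformed HTML, here is a minimal sketch. The broken fragment is made up for this example, and the exact repair depends on which parser Beautiful Soup is given (the standard-library `html.parser` is used here; `lxml` and `html5lib` may build a different tree from the same input).

```python
from bs4 import BeautifulSoup

# Hypothetical malformed fragment: the <b> tag is never closed.
broken = "<p>Total: <b>$42.00</p>"

soup = BeautifulSoup(broken, "html.parser")

# The tree is still navigable even though the input was not well-formed.
amount = soup.b.get_text()

# Serializing the tree back to a string emits the missing closing tag.
repaired = str(soup)

print(amount)
print(repaired)
```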

nmichaels · Accepted Answer · 2011-02-25 16:57:20Z

2

You can use Beautiful Soup to parse HTML in Python.


1 Comment

nmichaels Over a year ago

@downvoter: Are you trying to get a badge for downvoting everything or something? The link's not dead and you didn't leave a comment. I'm assuming the same person downvoted all 3 answers here.

user2665694 · Accepted Answer · 2011-02-25 16:59:04Z

2

BeautifulSoup and lxml are both decent HTML parsers. BeautifulSoup is a bit handier to use but has some quirks.
