
I need to parse HTML emails that will be similar but not exactly the same. I will be looking for things like dates, amounts, vendors, etc., but depending on who the email came from, the markup will be different.

How could I parse out those common fields from lots of different HTML markup in Python?

Thanks for your suggestions.

1 Comment

Just don't use regular expressions :) Commented Feb 25, 2011 at 17:10
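The usual reading of that advice is "don't parse the markup itself with regexes". A common compromise is to let a real parser strip the tags first and only then match simple patterns against the plain text. The sketch below does that with the standard library's `html.parser`; the `TextExtractor` class, the `extract_fields` helper, and both patterns (US-style dates, dollar amounts) are illustrative assumptions, not a complete solution for real-world emails.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not script/style content.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical patterns: real emails will need more date/currency formats.
AMOUNT_RE = re.compile(r"\$\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_fields(html):
    """Regexes run on the extracted text, never on the raw markup."""
    text = html_to_text(html)
    return {"amounts": AMOUNT_RE.findall(text), "dates": DATE_RE.findall(text)}


email_html = """
<html><body>
  <table><tr><td>Order date:</td><td><b>02/25/2011</b></td></tr>
  <tr><td>Total</td><td>$1,234.50</td></tr></table>
  <style>.x { color: red }</style>
</body></html>
"""
result = extract_fields(email_html)
print(result)
```

Because the regexes only ever see text, the same patterns keep working whether a sender wraps the total in a `<td>`, a `<b>`, or a `<span>`.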

3 Answers


You should definitely consider the Beautiful Soup library.
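Since the markup differs per sender, one way to use Beautiful Soup here is a small table of per-sender CSS selectors. This is a minimal sketch: the sender addresses, selectors, and the `extract` helper are hypothetical, and `select_one` relies on the `soupsieve` package that current `bs4` releases install alongside it.

```python
from bs4 import BeautifulSoup

# Hypothetical per-sender rules: where each field lives in that sender's markup.
SENDER_RULES = {
    "billing@vendor-a.example": {"amount": "td.total", "date": "span#order-date"},
    "noreply@vendor-b.example": {"amount": "div.summary strong", "date": "p.date"},
}


def extract(html, sender):
    """Return {field: text or None} using the selectors registered for sender."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for field, selector in SENDER_RULES.get(sender, {}).items():
        node = soup.select_one(selector)
        result[field] = node.get_text(strip=True) if node else None
    return result


html_a = ('<table><tr><td class="total">$42.00</td></tr></table>'
          '<span id="order-date">2011-02-25</span>')
html_b = ('<div class="summary"><p>Total: <strong>$17.99</strong></p></div>'
          '<p class="date">Feb 25, 2011</p>')

print(extract(html_a, "billing@vendor-a.example"))
print(extract(html_b, "noreply@vendor-b.example"))
```

Adding a new sender then means adding one dictionary entry rather than new parsing code; unknown senders simply return an empty result.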


2 Comments

Looks like a good way to parse the HTML. Will BeautifulSoup also clean up or fix malformed HTML?
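@user634529: Yes. Beautiful Soup is lenient with malformed markup and still builds a navigable tree, although exactly how a broken document is repaired depends on the underlying parser (`html.parser`, `lxml`, or `html5lib`).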
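A small illustration of that leniency, assuming `bs4` with the standard-library `html.parser` backend: the `<b>` tag is never closed and the second `<p>` is cut off, yet both paragraphs come back as usable elements. Other backends may repair the same input slightly differently.

```python
from bs4 import BeautifulSoup

# Unclosed <b>, and the document ends inside a second <p>.
broken = "<p>Total: <b>$12.50</p><p>Vendor: ACME"
soup = BeautifulSoup(broken, "html.parser")

amount = soup.b.get_text()
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(amount, paragraphs)
```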

You can use Beautiful Soup to parse HTML in Python.

1 Comment

@downvoter: Are you trying to get a badge for downvoting everything or something? The link's not dead and you didn't leave a comment. I'm assuming the same person downvoted all 3 answers here.

BeautifulSoup and lxml are both decent HTML parsers. BeautifulSoup is a bit handier but has some quirks.
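For comparison, a minimal lxml sketch: it locates a value by its label with XPath, which tolerates layout differences as long as the sender uses a label/value table. The `value_for` helper and the `<th>`/`<td>` layout are assumptions for illustration; the `$label` XPath variable is passed as a keyword argument, which lxml's `xpath()` supports.

```python
import lxml.html

html = """<html><body>
<table>
  <tr><th>Vendor</th><td>ACME Corp</td></tr>
  <tr><th>Amount</th><td>$99.95</td></tr>
</table>
</body></html>"""

doc = lxml.html.fromstring(html)


def value_for(doc, label):
    """Text of the first <td> following a <th> whose text equals label, else None."""
    cells = doc.xpath(
        "//th[normalize-space()=$label]/following-sibling::td[1]", label=label
    )
    return cells[0].text_content().strip() if cells else None


print(value_for(doc, "Vendor"), value_for(doc, "Amount"))
```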
