I am trying to build a blog-mapping site that spots current trends across a given set of blogs. For convenience, I'm going to focus on a fixed list of WordPress blogs.

Is there a Python package for parsing WordPress HTML?

I'm looking for:

  • Identification (is the given HTML a WordPress blog?)
  • Blog properties (name, posts, RSS link, blogroll...)
  • Post properties (title, text, tags...)

If there is no such package, I can implement it myself as an open-source project, but an existing one would save me a lot of time.

  • If you write your own, look at Beautiful Soup. Commented Apr 7, 2011 at 21:13
  • I suggest lxml as a more modern option. Commented Apr 7, 2011 at 22:41

1 Answer

As far as I know, there are no libraries that parse WordPress HTML specifically, only general-purpose HTML parsing libraries such as html5lib and BeautifulSoup.

I recommend html5lib combined with lxml.html.
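
Since no WordPress-specific parser seems to exist, the identification and blog-property steps have to be built on a general-purpose parser. Here is a minimal sketch of that idea. To keep it dependency-free it uses the standard library's `html.parser` as a stand-in rather than html5lib or lxml.html; the same checks should carry over to those libraries. The names `WordPressSniffer` and `inspect_blog` and the sample page are made up for illustration. The detection rules (a `generator` meta tag mentioning WordPress, or URLs containing `/wp-content/` or `/wp-includes/`) are heuristics only: themes and plugins can remove or change them.

```python
from html.parser import HTMLParser


class WordPressSniffer(HTMLParser):
    """Hypothetical helper: collects hints that a page came from WordPress."""

    def __init__(self):
        super().__init__()
        self.generator = None       # content of <meta name="generator">
        self.feed_links = []        # hrefs of RSS/Atom <link> elements
        self.title = ""             # text inside <title>
        self.wp_asset_seen = False  # any attribute pointing into wp-content/wp-includes
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (e.g. bare attributes), so normalise them.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "generator":
            self.generator = attrs.get("content", "")
        elif tag == "link" and attrs.get("type", "") in (
            "application/rss+xml",
            "application/atom+xml",
        ):
            self.feed_links.append(attrs.get("href", ""))
        elif tag == "title":
            self._in_title = True
        # Heuristic: default WordPress installs serve assets from these paths.
        for value in attrs.values():
            if "/wp-content/" in value or "/wp-includes/" in value:
                self.wp_asset_seen = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_blog(html):
    """Return a dict with a WordPress guess, the page title, and feed URLs."""
    parser = WordPressSniffer()
    parser.feed(html)
    parser.close()
    has_wp_generator = bool(
        parser.generator and "wordpress" in parser.generator.lower()
    )
    return {
        "is_wordpress": has_wp_generator or parser.wp_asset_seen,
        "title": parser.title.strip(),
        "feeds": parser.feed_links,
    }


# Made-up sample page, for illustration only.
sample = """<html><head>
<title>My Blog</title>
<meta name="generator" content="WordPress 3.1" />
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed/" />
</head><body></body></html>"""

info = inspect_blog(sample)
print(info)
```

Post properties (title, text, tags) depend on the theme's markup, so they would need per-theme selectors; pulling them from the blog's RSS feed, once its URL is found as above, is likely to be more stable than scraping the HTML.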
