I am trying to build a blog-mapping site that would identify current trends across a given set of blogs. For convenience, I'm going to focus on a given list of WordPress blogs.
Is there a Python package for parsing WordPress HTML?
I'm looking for:
- Identification (is the given HTML a WordPress blog?)
- Blog properties (name, posts, RSS link, blogroll...)
- Post properties (title, text, tags...)
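For reference, a rough standard-library sketch of these three items might look like the following. It assumes two common WordPress fingerprints: the `<meta name="generator" content="WordPress ...">` tag (which some themes and plugins strip, hence the fallback check for `/wp-content/` or `/wp-includes/` URLs), and the RSS `<link rel="alternate">` that WordPress normally emits. The names `WordPressSniffer` and `parse_feed` are hypothetical. Post properties are read from the RSS feed rather than the HTML, because post markup varies from theme to theme. The `<title>` text is only a rough proxy for the blog name, and the blogroll is left out because its markup is also theme-dependent.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Namespace used by RSS feeds for full post content (<content:encoded>).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


class WordPressSniffer(HTMLParser):
    """Hypothetical helper: collects WordPress fingerprints and basic blog info."""

    def __init__(self):
        super().__init__()
        self.generator = None       # content of <meta name="generator">
        self.uses_wp_paths = False  # saw a /wp-content/ or /wp-includes/ URL
        self.title = ""             # text of <title>, a rough blog name
        self.rss_links = []         # hrefs of <link rel="alternate" type="application/rss+xml">
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Boolean attributes arrive with value None; normalise them to "".
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "generator":
            self.generator = a.get("content", "")
        elif (tag == "link" and a.get("rel", "").lower() == "alternate"
              and a.get("type", "") == "application/rss+xml"):
            self.rss_links.append(a.get("href", ""))
        elif tag == "title":
            self._in_title = True
        # Heuristic fallback: WordPress serves themes/scripts from these paths.
        for value in a.values():
            if "/wp-content/" in value or "/wp-includes/" in value:
                self.uses_wp_paths = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def looks_like_wordpress(self):
        """Heuristic only: either fingerprint counts as a match."""
        return (self.generator or "").startswith("WordPress") or self.uses_wp_paths


def parse_feed(rss_xml):
    """Hypothetical helper: extract title, link, tags and text per RSS <item>."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        content = item.find(CONTENT_NS + "encoded")
        text = content.text if content is not None else item.findtext("description", default="")
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # WordPress feeds list both categories and tags as <category> elements.
            "tags": [c.text for c in item.findall("category") if c.text],
            "text": text or "",
        })
    return posts
```

Typical use would be to feed a downloaded front page into `WordPressSniffer`, call `looks_like_wordpress()`, then fetch the first URL in `rss_links` and pass its body to `parse_feed`. Fetching and error handling are left out of this sketch.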
If there is no such package, I can implement it myself as an open-source project, but an existing one would save me a lot of time.