I have a project where I need to collect all the Wikipedia articles belonging to a particular category, extract them from the Wikipedia dump, and load them into our database.
So I need to parse the Wikipedia dump file. Is there an efficient parser for this job? I am a Python developer, so I would prefer a parser written in Python. If none exists, please suggest one in another language, and I will try to write a Python port and publish it so that other people can use it or at least try it.
So all I want is a Python parser for Wikipedia dump files. I have started writing a manual parser that walks each node and extracts what I need.
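
To make the node-by-node idea concrete, here is a minimal sketch of that kind of parser, not my actual code. It assumes the usual pages-articles XML layout (`page` elements containing `title` and `revision/text`) and streams the file with the standard library's `xml.etree.ElementTree.iterparse`. The names `local_name` and `iter_pages_in_category`, and the regex used to spot `[[Category:...]]` links, are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that the export format puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages_in_category(source, category):
    """Yield (title, wikitext) for pages whose wikitext links to `category`.

    `source` is a filename or a binary file object holding the XML dump.
    """
    # Matches [[Category:Name]] and [[Category:Name|sort key]] (assumed form).
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + re.escape(category) + r"\s*(\|[^\]]*)?\]\]"
    )
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if pattern.search(text):
            yield title, text
        # Drop the finished page's children to keep memory use small.
        elem.clear()
```

For a compressed dump, a file object from `bz2.open(path, "rb")` could be passed as `source`. This sketch only finds pages that carry the category link directly in their wikitext, so it would miss subcategories and categories added through templates.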