
I have a 30 GB XML file that contains two classes of data. Each class 1 record has a corresponding class 2 record:

<id="11" class="1" bestmatchingid="50" Body="abc"> </id>
.
.
.
<id="9999890" class="2" MatchingClass1Id="11" Body="xyz"></id>

The task is to extract each class 1 record's Body together with the Body of the corresponding class 2 record, where, for example:

class 1's id (11) == MatchingClass1Id of the class 2 record (whose id is 9999890)

I am currently doing this with string comparisons in Python. Is there a more efficient way to do it in Python, considering that the file is 30 GB?

2
  • Why have you tagged regex? XML Parsers are the best way to go... Commented Apr 5, 2012 at 9:53
  • I am attempting this with regex, which is why I tagged regex, but any efficient approach would work for me. Commented Apr 5, 2012 at 9:54

2 Answers

4

Use lxml's iterparse function. See the IBM developerWorks article about it for how to use it on very large files.
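As a rough illustration of the streaming approach, here is a minimal sketch. To stay self-contained it uses the standard library's `xml.etree.ElementTree.iterparse` rather than lxml; lxml's `etree.iterparse` is used in much the same way. Several things are assumptions, not facts from the question: the records are well-formed elements named `row` (a hypothetical name, since the sample as posted is not valid XML), they sit directly under a single root element, and every class 1 record appears before the class 2 record that refers to it. The function name `match_bodies` is also made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def match_bodies(source):
    """Yield (class1_body, class2_body) pairs while streaming the file."""
    class1_bodies = {}  # class 1 id -> Body, filled as class 1 rows stream past
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # Only act on fully parsed <row> elements (assumed element name).
        if event != "end" or elem.tag != "row":
            continue
        if elem.get("class") == "1":
            class1_bodies[elem.get("id")] = elem.get("Body")
        elif elem.get("class") == "2":
            body1 = class1_bodies.get(elem.get("MatchingClass1Id"))
            if body1 is not None:
                yield body1, elem.get("Body")
        # Drop already processed children so the tree does not grow with the file.
        root.clear()


# Tiny stand-in for the 30 GB file, using the assumed <row> layout.
sample = b"""<rows>
<row id="11" class="1" bestmatchingid="50" Body="abc"></row>
<row id="9999890" class="2" MatchingClass1Id="11" Body="xyz"></row>
</rows>"""

print(list(match_bodies(io.BytesIO(sample))))  # [('abc', 'xyz')]
```

For the real file you would pass the file path instead of the `BytesIO` object. Note that the dictionary of class 1 bodies still lives in memory; if that is too large, or if class 2 records can come before their class 1 match, a two-pass approach or an on-disk store would be needed instead.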


2 Comments

Thanks a ton for helping. Could you please point me to a tutorial that would help a beginner like me catch up on the basics of lxml? Also, the IBM developerWorks article does not work for my code, since I have "id, class, Body, MatchingClass1Id" and they are working with title. Maybe, since I am a novice, I did not understand the subtleties. Apologies for bothering you again, but despite your help I am still stuck trying to understand the basics.
@user869790: what exactly is the problem, lack of Python knowledge or lack of XML knowledge?
-1

lxml works well for your purpose. Since you are a beginner, refer to this tutorial to understand the basics:

http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html

Its iterparse method is an efficient way to solve your problem.

1 Comment

(Completely OT: Hah. I was just about to answer the CSV question when you deleted it. I'm surprised at how annoying that was. :-))
