I want to parse a large XML file (25 GB) in Python and change some of its elements.

I tried ElementTree from xml.etree, but the very first step (ElementTree.parse) takes too long.

I read somewhere that SAX is fast and does not load the entire file into memory, but that it is only for parsing, not for modifying.

As far as I can tell, 'iterparse' is also only for parsing, not for modifying.

Is there any other option that is fast and memory-efficient?

  • Try lxml, it has some options for that. Commented Apr 24, 2015 at 17:53
  • After you modify it, do you want to write it back to disk? Or do you want to perform operations on the modified tree? Commented Apr 24, 2015 at 18:15
  • I want to find some elements of interest, change their attributes' values, and write the file back to the hard disk. Commented Apr 24, 2015 at 18:31

1 Answer

What matters here is that you need a streaming parser, which is what SAX is. (Python has a built-in SAX implementation, and lxml provides one as well.) The catch is that, since you are trying to modify the XML file, you will have to rewrite the file as you read it.

An XML file is a text file. You can't change some data in the middle of a text file without rewriting everything after it, which in practice means rewriting the entire file (unless the new data is exactly the same size as the old, which is unlikely).

You can use SAX to read each element and register an event handler that writes the element back out after it has been read and modified. If your changes are really simple, it may be even faster to skip the XML parsing altogether and just match the text you are looking for.
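As a rough illustration of the read-modify-write idea, here is a minimal sketch using only the standard library: an `XMLFilterBase` subclass sits between the parser and an `XMLGenerator`, passes every event through, and swaps one attribute value on the way. The class name `AttributeRewriter`, the function `rewrite`, and the `item`/`status` names in the usage note are made up for the example; they are not from the question.

```python
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AttributeRewriter(XMLFilterBase):
    """Forward all SAX events unchanged, except for one attribute value.

    Hypothetical helper: `tag`, `attr` and `new_value` are illustrative.
    """

    def __init__(self, parent, tag, attr, new_value):
        super().__init__(parent)
        self._tag = tag
        self._attr = attr
        self._new_value = new_value

    def startElement(self, name, attrs):
        # Only touch the elements of interest that carry the attribute.
        if name == self._tag and self._attr in attrs:
            updated = dict(attrs.items())
            updated[self._attr] = self._new_value
            attrs = AttributesImpl(updated)
        # Hand the (possibly modified) event to the downstream writer.
        super().startElement(name, attrs)


def rewrite(source, sink, tag, attr, new_value):
    """Stream XML from `source` to `sink`, both binary file objects."""
    reader = AttributeRewriter(make_parser(), tag, attr, new_value)
    reader.setContentHandler(XMLGenerator(sink, encoding="utf-8"))
    reader.parse(source)
```

With real files this would be called along the lines of `rewrite(open("in.xml", "rb"), open("out.xml", "wb"), "item", "status", "new")`. Memory use stays roughly constant because only one event is held at a time, but note that the output is a new file: comments and the exact original formatting of the input are not guaranteed to survive the round trip.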

If you are doing any significant work with an XML file this large, then I would say you shouldn't be using an XML file at all; you should be using a database.
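To make the database suggestion a bit more concrete, here is a small sketch that streams records out of the XML once and stores them in SQLite, after which a change is an ordinary `UPDATE` instead of a full file rewrite. Everything about the data layout is an assumption for the example: the `item` tag, the `id` and `status` attributes, the `items` table, and the idea that the records are direct children of the root element.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_items(xml_source, conn, tag="item"):
    """Copy <item id=... status=...> records into a SQLite table.

    Hypothetical schema: `tag`, `id` and `status` are illustrative names.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, status TEXT)"
    )
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's start tag
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            conn.execute(
                "INSERT OR REPLACE INTO items (id, status) VALUES (?, ?)",
                (elem.get("id"), elem.get("status")),
            )
            # Drop finished records so the in-memory tree does not grow.
            root.clear()
    conn.commit()
```

Once the data is loaded, something like `conn.execute("UPDATE items SET status = ? WHERE id = ?", ("new", "1"))` changes a single record in place, which is exactly the operation that is expensive on a flat 25 GB text file.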

The problem you have run into here is the same one that COBOL programmers on mainframes had when they were working with file-based data.
