A simplified version of my XML parsing function is here:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

def analyze(xml):
    it = ET.iterparse(xml)  # iterparse accepts a filename or a file object
    count = 0

    for (ev, el) in it:
        count += 1

    print('count: {0}'.format(count))

This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this:

[Image: graph of memory and CPU usage over time]

See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?

  • stackoverflow.com/questions/1513592/… recommends calling .clear() on each element when you're done with it to save memory. Presumably this works because cElementTree keeps the previously-returned values in memory otherwise. Commented Oct 8, 2011 at 15:19
  • @Wooble You should post that as an answer. Nailed it. Commented Oct 8, 2011 at 15:27
  • @Oliver lxml beats ElementTree, but not cElementTree when it comes to parsing. Commented Oct 8, 2011 at 20:25
  • @Wooble: In all 3 ElementTree implementations, iterparse() builds the tree. It is up to the caller to delete unwanted elements. Commented Oct 8, 2011 at 20:33
  • Just a note: this issue seems to not affect the memory on my Mac at all, but causes my Ubuntu server to hemorrhage RAM like it's going out of style. Commented Jun 12, 2020 at 18:52
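
Following the comments above, here is a minimal sketch of the question's counting loop with each element cleared once it has been counted. The name count_elements and the in-memory sample document are illustrative assumptions, not from the original post, and xml.etree.ElementTree stands in for cElementTree, which was removed in Python 3.9.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source):
    """Count elements in an XML stream, clearing each one after counting it.

    `source` may be a filename or a binary file object (hypothetical helper).
    """
    count = 0
    # With the default events=('end',), each element is yielded once it has
    # been fully parsed, so it is safe to discard its contents here.
    for _event, elem in ET.iterparse(source):
        count += 1
        # Drop this element's children, text, and attributes.
        elem.clear()
    return count


# Illustrative sample: three <item> elements plus the <root> element.
sample = io.BytesIO(b"<root><item>a</item><item>b</item><item>c</item></root>")
print("count: {0}".format(count_elements(sample)))  # prints "count: 4"
```

Note that a cleared element stays attached to its parent until the parent itself is cleared, so a document with millions of direct children under the root may still grow slowly with this approach.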

1 Answer

Code example:

import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

def getelements(filename_or_file, tag):
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context) # get root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear() # preserve memory
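
A usage sketch for the generator above, assuming a small in-memory document in place of a large file. The sample XML and the tag name "entry" are made up for illustration, and the generator is repeated (with the non-deprecated ElementTree import) only so the snippet runs on its own.

```python
import io
import xml.etree.ElementTree as etree


def getelements(filename_or_file, tag):
    # Same generator as in the answer, repeated so this snippet is self-contained.
    context = iter(etree.iterparse(filename_or_file, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            root.clear()  # runs only when the consumer asks for the next element


# Hypothetical sample data standing in for a large XML file.
sample = io.BytesIO(b"<log><entry>one</entry><entry>two</entry><other/></log>")

texts = []
for elem in getelements(sample, "entry"):
    # Copy out what is needed before advancing the loop; the tree under the
    # root is discarded as soon as the generator resumes.
    texts.append(elem.text)

print(texts)
```

Because root.clear() runs when the generator resumes, anything needed from a yielded element should be extracted inside the loop body rather than by keeping a reference to the element for later.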

2 Comments

Shouldn't you invoke clear() on elem as well? Or are you certain that just clearing the root will cause the garbage collector to collect the element as well?
@hheimbuerger: root.clear() is enough. I haven't dug too deep, but memory usage stayed small when I used it to parse large XML files.
