
I'm very new to XML.

I have a fairly large XML file with this structure:

<a>
  <b>
    <id>1</id>
    ...
  </b>
  <b>
    <id>2</id>
    ...
  </b>
  <b>
    <id>3</id>
    ...
  </b>
  <b>
    <id>4</id>
    ...
  </b>
</a>

Each b element contains some information I want to retrieve, and I am trying to follow the Python documentation. I started with this:

#!/usr/bin/env python

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

print('root.tag =', root.tag)
print('root.attrib =', root.attrib)

but because my file is too big, it takes several minutes just to run this part.

What I want to do is something like this:

for node in (n for n in nodes if n.id in ['1', '3']):
  print(node.val1)
  print(node.val2)

(without having to process all the nodes that don't match the id I want).

Is there a way of doing this?

2 Answers

1

ElementTree’s parse() works like a DOM parser: it reads the whole XML document and keeps it in memory before you can navigate the resulting objects. That is why you have to wait for it to finish before you can do anything with the data.

If your document is very large, you should look into SAX parsers, which go through the document once without storing everything. That makes them memory efficient and usually faster on large files, but also more difficult to use.
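
To make the SAX idea concrete, here is a minimal sketch using the standard library's xml.sax. It assumes that val1 and val2 from the question are flat child elements of each b, and the BHandler class and the sample data are made up for illustration.

```python
import xml.sax

# Hypothetical sample in the question's shape; val1/val2 are assumed child elements.
SAMPLE = (
    b"<a>"
    b"<b><id>1</id><val1>x1</val1><val2>y1</val2></b>"
    b"<b><id>2</id><val1>x2</val1><val2>y2</val2></b>"
    b"<b><id>3</id><val1>x3</val1><val2>y3</val2></b>"
    b"</a>"
)


class BHandler(xml.sax.ContentHandler):
    """Keep only the <b> records whose <id> text is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = []
        self._record = None  # dict for the <b> currently being read, else None
        self._field = None   # name of the child element currently being read
        self._text = []

    def startElement(self, name, attrs):
        if name == "b":
            self._record = {}
        elif self._record is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "b":
            if self._record.get("id") in self.wanted:
                self.matches.append(self._record)
            self._record = None
        elif self._record is not None and name == self._field:
            self._record[name] = "".join(self._text).strip()
            self._field = None


handler = BHandler(["1", "3"])
xml.sax.parseString(SAMPLE, handler)
for record in handler.matches:
    print(record["val1"], record["val2"])
```

For a real file you would call xml.sax.parse('data.xml', handler) instead of parseString. The parser still has to read every b, but only the matching records are kept in memory.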

You can also use ElementTree’s iterparse, which reports elements as it comes across them, similar to a SAX parser, while it builds its internal structure. So you could read the information you want early and still have a complete ElementTree object at the end.
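
A rough sketch of the iterparse approach, again assuming val1 and val2 are child elements of b (the iter_matching helper and the sample data are hypothetical). Unlike what is described above, this variant clears each b once it has been handled, so it trades the complete tree at the end for lower memory use.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the question's shape; val1/val2 are assumed child elements.
SAMPLE = (
    b"<a>"
    b"<b><id>1</id><val1>x1</val1><val2>y1</val2></b>"
    b"<b><id>2</id><val1>x2</val1><val2>y2</val2></b>"
    b"<b><id>3</id><val1>x3</val1><val2>y3</val2></b>"
    b"</a>"
)


def iter_matching(source, wanted):
    """Yield a dict of child tag -> text for each <b> whose <id> is in `wanted`."""
    wanted = set(wanted)
    # "end" events fire once an element, including its children, is complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "b":
            if elem.findtext("id") in wanted:
                yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop the children and text of this <b>; the emptied element
            # itself stays attached to the root.
            elem.clear()


for record in iter_matching(io.BytesIO(SAMPLE), ["1", "3"]):
    print(record["val1"], record["val2"])
```

iterparse also accepts a filename, so iter_matching('data.xml', ['1', '3']) would work the same way. If you do need the full tree afterwards, leave out the elem.clear() call.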


0

What you have to do is use something like the approach described under "Using the target parser method", e.g. in "High-performance XML parsing in Python with lxml".
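
As an illustration of the target parser idea, the sketch below uses the standard library's xml.etree.ElementTree.XMLParser, which also accepts a target object, as a stand-in for lxml. The BTarget class, the parse_with_target helper, and the val1/val2 child elements are assumptions for the example and are not taken from the referenced article.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the question's shape; val1/val2 are assumed child elements.
SAMPLE = (
    b"<a>"
    b"<b><id>1</id><val1>x1</val1><val2>y1</val2></b>"
    b"<b><id>2</id><val1>x2</val1><val2>y2</val2></b>"
    b"<b><id>3</id><val1>x3</val1><val2>y3</val2></b>"
    b"</a>"
)


class BTarget:
    """Parser target that keeps only <b> records whose <id> is in `wanted`."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.matches = []
        self._record = None  # dict for the <b> currently being read, else None
        self._field = None   # name of the child element currently being read
        self._text = []

    def start(self, tag, attrib):
        if tag == "b":
            self._record = {}
        elif self._record is not None:
            self._field = tag
            self._text = []

    def data(self, data):
        if self._field is not None:
            self._text.append(data)

    def end(self, tag):
        if tag == "b":
            if self._record.get("id") in self.wanted:
                self.matches.append(self._record)
            self._record = None
        elif self._record is not None and tag == self._field:
            self._record[tag] = "".join(self._text).strip()
            self._field = None

    def close(self):
        # Whatever this returns is what parser.close() returns.
        return self.matches


def parse_with_target(fileobj, wanted, chunk_size=65536):
    """Feed the file to the parser in chunks so no tree is ever built."""
    parser = ET.XMLParser(target=BTarget(wanted))
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


for record in parse_with_target(io.BytesIO(SAMPLE), ["1", "3"]):
    print(record["val1"], record["val2"])
```

With a real file you would pass open('data.xml', 'rb') instead of the BytesIO object. Because the target never creates Element objects, memory use stays roughly proportional to the number of matching records.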
