Retrieve a subset of XML nodes with Python

Question

XML newbie here.

I have a fairly large XML file of this form:

<a>
  <b>
    <id>1</id>
    ...
  </b>
  <b>
    <id>2</id>
    ...
  </b>
  <b>
    <id>3</id>
    ...
  </b>
  <b>
    <id>4</id>
    ...
  </b>
</a>

Each b contains some information I want to retrieve, and I am trying to follow the Python documentation. I started with this:

#!/usr/bin/env python

import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()

print('root.tag =', root.tag)
print('root.attrib =', root.attrib)

but because my file is too big, it takes several minutes just to do this part.

What I want to do is something like this:

for node in (n for n in nodes if n.id in ['1', '3']):
  print(node.val1)
  print(node.val2)

(without having to process all the nodes that don't match the id I want).

Is there a way of doing this?

poke · Accepted Answer · 2013-06-07 16:16:55Z

1

ElementTree is a DOM-like parser, meaning it first processes the whole XML document and keeps it in memory before you can navigate through the objects. This also means that you have to wait until parsing is done before you can do anything with the tree.

If your document is very large, you should look into SAX parsers, which go through the document only once and don’t store everything, making them very fast and memory efficient (but also more difficult to use).
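
A minimal sketch of the SAX approach with the standard library's xml.sax, not a definitive implementation. The val1 and val2 child elements are assumed from the question's pseudocode, the sample data is made up, and the SelectedRecords class name is hypothetical.

```python
import xml.sax

# Made-up sample in the shape of the question's file; val1/val2 are assumed children.
SAMPLE = b"""<a>
  <b><id>1</id><val1>x1</val1><val2>y1</val2></b>
  <b><id>2</id><val1>x2</val1><val2>y2</val2></b>
  <b><id>3</id><val1>x3</val1><val2>y3</val2></b>
</a>"""


class SelectedRecords(xml.sax.ContentHandler):
    """Collect the child values of every <b> whose <id> is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.matches = []      # one dict per matching <b>
        self._record = None    # fields of the <b> currently being read
        self._text = []        # character chunks of the current element

    def startElement(self, name, attrs):
        if name == "b":
            self._record = {}
        self._text = []

    def characters(self, content):
        # SAX may deliver a single text node in several chunks.
        self._text.append(content)

    def endElement(self, name):
        if name == "b":
            if self._record.get("id") in self.wanted:
                self.matches.append(self._record)
            self._record = None
        elif self._record is not None:
            # A child of <b> just closed: store its text under its tag name.
            self._record[name] = "".join(self._text).strip()
        self._text = []


handler = SelectedRecords(["1", "3"])
xml.sax.parseString(SAMPLE, handler)
for record in handler.matches:
    print(record["val1"], record["val2"])
```

For a real file, xml.sax.parse('data.xml', handler) would take the place of parseString. Only the matching records are kept in memory, but the parser still reads every element of the document.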

You can also make use of ElementTree’s iterparse, which reports each element it comes across, similar to a SAX parser, while it builds its internal structure. So you can read the information you want early and still have a complete ElementTree object in the end.
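
A rough sketch of the iterparse approach, under the assumption that each b has val1 and val2 children as in the question's pseudocode. The sample data and the iter_selected helper are hypothetical. Unlike the description above, this variant clears each b after inspecting it, trading the complete tree for flat memory use.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample in the shape of the question's file; val1/val2 are assumed children.
SAMPLE = b"""<a>
  <b><id>1</id><val1>x1</val1><val2>y1</val2></b>
  <b><id>2</id><val1>x2</val1><val2>y2</val2></b>
  <b><id>3</id><val1>x3</val1><val2>y3</val2></b>
</a>"""


def iter_selected(source, wanted):
    """Yield (val1, val2) for each <b> whose <id> text is in `wanted`."""
    wanted = set(wanted)
    # "end" events fire once an element and all of its children are complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "b":
            continue
        if elem.findtext("id") in wanted:
            yield elem.findtext("val1"), elem.findtext("val2")
        # Discard this <b>'s children and text so the tree does not grow.
        elem.clear()


selected = list(iter_selected(io.BytesIO(SAMPLE), ["1", "3"]))
for val1, val2 in selected:
    print(val1, val2)
```

Passing a filename such as 'data.xml' instead of the BytesIO object should work the same way. The parser still has to read the whole file, since XML has no index to jump to a given id, but results become available as soon as each matching b has been read.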

hr_117 · Accepted Answer · 2013-06-07 17:22:51Z

0

What you have to do is use something like the approach described under "Using the target parser method", e.g. in "High-performance XML parsing in Python with lxml".
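
A sketch of what a target parser can look like, assuming the interface meant here is the one where the parser calls start, end, data and close on a user-supplied object. It uses the standard library's xml.etree.ElementTree.XMLParser, which accepts such a target; it has not been adapted to or checked against lxml. The SelectedTarget class, the val1 and val2 children and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of the question's file; val1/val2 are assumed children.
SAMPLE = b"""<a>
  <b><id>1</id><val1>x1</val1><val2>y1</val2></b>
  <b><id>2</id><val1>x2</val1><val2>y2</val2></b>
  <b><id>3</id><val1>x3</val1><val2>y3</val2></b>
</a>"""


class SelectedTarget:
    """Parser target that keeps only the <b> records whose <id> is wanted."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.matches = []
        self._record = None    # fields of the <b> currently being read
        self._text = []        # text chunks of the current element

    def start(self, tag, attrib):
        if tag == "b":
            self._record = {}
        self._text = []

    def data(self, data):
        # Text can arrive in several pieces, especially with chunked feeding.
        self._text.append(data)

    def end(self, tag):
        if tag == "b":
            if self._record.get("id") in self.wanted:
                self.matches.append(self._record)
            self._record = None
        elif self._record is not None:
            self._record[tag] = "".join(self._text).strip()
        self._text = []

    def close(self):
        # Whatever is returned here becomes the result of parser.close().
        return self.matches


parser = ET.XMLParser(target=SelectedTarget(["1", "3"]))
# Feed the document in small chunks, as one would when reading a large file.
for i in range(0, len(SAMPLE), 16):
    parser.feed(SAMPLE[i:i + 16])
matches = parser.close()
for record in matches:
    print(record["val1"], record["val2"])
```

No element tree is built at all in this variant; the target decides what to keep.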
