
I have an XML file containing many chemical groups and their properties. Here is a slice of the file:

 <groups>
  <group name='CH3'>
   <mw>15.03502</mw>
   <heatCapacity>
    <a>19.5</a>
   </heatCapacity>
  </group>
  <group name='CH2'>
   <mw>14.02708</mw>
   <heatCapacity>
    <a>-0.909</a>
   </heatCapacity>
  </group>
  <group name='COOH'>
   <mw>45.02</mw>
   <heatCapacity>
    <a>-24.1</a>
   </heatCapacity>
  </group>
  <group name='OH'>
   <mw>17.0073</mw>
   <heatCapacity>
    <a>25.7</a>
   </heatCapacity>
  </group>
 </groups>

In my Python code, which parses this file using ElementTree, I have a list blocks=['CH3','CH2'] and I want to use it to find those two groups. I tried the following:

import elementtree.ElementTree as ET
document = ET.parse( 'groups.xml' )
blocks=['CH3','CH2']
for item in blocks:
   group1 = document.find(item)
   print group1

All I get is None for each item. Can you please help me?

Many thanks

  • Perhaps it is worth learning XPath... Commented Jul 29, 2014 at 16:06
  • In lxml you can just do doc.xpath("//group[starts-with(@name,'CH')]"), but I don't think ElementTree has proper XPath support to handle that expression. Commented Jul 29, 2014 at 16:28
  • Is that your actual code? I'm used to seeing import xml.etree.ElementTree as ET as the import statement. Commented Jul 29, 2014 at 16:43
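ElementTree's limited XPath subset has no starts-with(), but the same selection can be made by filtering in Python. A minimal sketch, assuming Python 3; the inline XML string is a cut-down stand-in for groups.xml, used only for illustration:

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for groups.xml (assumption: same structure as the question)
XML = """<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
  <group name='COOH'><mw>45.02</mw></group>
  <group name='OH'><mw>17.0073</mw></group>
</groups>"""

root = ET.fromstring(XML)

# ElementPath has no starts-with(), so select every <group> child
# and apply the prefix test to the name attribute in Python instead.
ch_groups = [g for g in root.findall('group')
             if g.get('name', '').startswith('CH')]

print([g.get('name') for g in ch_groups])
```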

2 Answers


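document.find(item) returns None because find() looks for an element whose tag is CH3, and there is no such element: CH3 is the value of the name attribute of a <group> element. You can read an element's attributes with its .get() method. Here is one way to do that: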

import xml.etree.ElementTree as ET
document = ET.parse( 'groups.xml' )
blocks=['CH3','CH2']
for group in document.getroot():
    if group.get('name') in blocks:
        print group
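A small Python 3 sketch of the difference between matching a tag and matching an attribute. It assumes an inline XML string in place of groups.xml. Note that the .get() loop returns groups in document order, not in the order of blocks:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for groups.xml (illustrative data only)
XML = """<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
  <group name='OH'><mw>17.0073</mw></group>
</groups>"""

document = ET.ElementTree(ET.fromstring(XML))
blocks = ['CH3', 'CH2']

# find() matches element tags; there is no <CH3> element, so this is None
print(document.find('CH3'))

# The name is an attribute of <group>, so compare it via .get() instead
matches = [g for g in document.getroot() if g.get('name') in blocks]
print([g.get('name') for g in matches])
```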

If you need to select the data by arbitrary criteria, for example in the order of your own list, you can build your own dictionary:

import xml.etree.ElementTree as ET

# Parse
document = ET.parse( 'groups.xml' )

# Add a dictionary so that <group>s
# are easy to find by name
groups = {}
for group in document.getroot():
    groups[group.get('name')] = group

# Look up our compounds in the dictionary
blocks=['CH3', 'CH2']
for item in blocks:
    group = groups[item]
    mw = group.find('mw').text
    print item, mw
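The dictionary approach returns groups in the order of blocks rather than the file's order. A minimal Python 3 sketch, assuming the same structure as the sample (inline XML, a deliberately shuffled blocks list, and the heatCapacity/a path taken from the slice above), that also converts the values to floats:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for groups.xml (illustrative data from the question)
XML = """<groups>
  <group name='CH3'>
    <mw>15.03502</mw>
    <heatCapacity><a>19.5</a></heatCapacity>
  </group>
  <group name='CH2'>
    <mw>14.02708</mw>
    <heatCapacity><a>-0.909</a></heatCapacity>
  </group>
  <group name='OH'>
    <mw>17.0073</mw>
    <heatCapacity><a>25.7</a></heatCapacity>
  </group>
</groups>"""

root = ET.fromstring(XML)

# Index every <group> by its name attribute once...
groups = {g.get('name'): g for g in root.findall('group')}

# ...then walk the list in its own order, which need not match the file order
blocks = ['OH', 'CH3', 'CH2']
rows = []
for name in blocks:
    group = groups[name]  # raises KeyError if the name is not in the file
    mw = float(group.find('mw').text)
    a = float(group.find('heatCapacity/a').text)
    rows.append((name, mw, a))

for row in rows:
    print(row)
```

Building the dictionary costs one pass over the file; each lookup afterwards is a constant-time dictionary access instead of a new search.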

2 Comments

Hi Rob, thanks for your reply. I need to iterate over my list, because I want to get the groups in the correct order.
Use a dictionary to store the group data in a convenient fashion. See my recent edit.

Try this, which selects each group by its name attribute with an XPath predicate:

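import xml.etree.ElementTree as ET

document = ET.parse('groups.xml')
blocks = ['CH3', 'CH2']
for block in blocks:
    # Match a <group> child of the root whose name attribute equals block
    group = document.find('./group[@name="{}"]'.format(block))
    if group is not None:  # an Element with no children is falsy, so test for None
        ET.dump(group)
    else:
        print "Group {} not found.".format(block)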
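The same predicate lookup as a Python 3 sketch. Attribute predicates such as [@name="..."] need ElementTree 1.3 (Python 2.7 or 3.x). The find_group helper, the inline XML, and the missing name NH2 are hypothetical, chosen only to show the not-found branch:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for groups.xml (illustrative data only)
XML = """<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
</groups>"""

root = ET.fromstring(XML)


def find_group(root, name):
    """Return the <group> whose name attribute equals name, or None."""
    return root.find('./group[@name="{}"]'.format(name))


found = {}
for block in ['CH2', 'CH3', 'NH2']:
    group = find_group(root, block)
    if group is not None:  # avoid plain truthiness checks on Elements
        found[block] = group.find('mw').text
    else:
        print('Group {} not found.'.format(block))
```

A name containing a double quote would break this query string, so the dictionary approach is safer when names are not controlled.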

3 Comments

Hi Paulo, thanks for your reply. I am using Python 2.4, which does not support this. How can I achieve this in 2.4?
Replace './group[@name="{}"]'.format(block) with './group[@name="%s"]' % block
Sorry, I don't have 2.4 around in order to reproduce the problem. Update the question with the error message you got and I will be glad to help.
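On Python 2.4, the %s substitution alone may not be enough, because the [@name=...] predicate itself was only added in ElementTree 1.3. A sketch that avoids predicates entirely, using only findall() with a plain tag plus .get(). It is written to be valid Python 2.4 and Python 3 syntax but has not been run on 2.4 itself; the find_group helper and the inline XML are hypothetical:

```python
try:
    import xml.etree.ElementTree as ET  # Python 2.5 and later
except ImportError:
    import elementtree.ElementTree as ET  # standalone package on Python 2.4

# Inline stand-in for groups.xml (illustrative data only)
XML = """<groups>
  <group name='CH3'><mw>15.03502</mw></group>
  <group name='CH2'><mw>14.02708</mw></group>
</groups>"""


def find_group(root, name):
    # A plain tag path like 'group' is understood by old ElementTree
    # versions too; the attribute comparison happens in Python.
    for group in root.findall('group'):
        if group.get('name') == name:
            return group
    return None


root = ET.fromstring(XML)
names = []
for block in ['CH3', 'CH2']:
    group = find_group(root, block)
    if group is not None:
        names.append(group.get('name'))
print(names)
```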
