How to test if an XML node has a specific string using Element Tree

Question

I'm currently using ElementTree to parse some XML, and some of it has multiple repeated name/value pairs that look like this. What I'm trying to do is extract the elements of interest, i.e. gender = male and colour = red, but I can't seem to do this using findall on its own because of the structure. How do I extract these elements? I thought the correct logic would be to look for a child node where child.text == 'gender' and so on, then print the name/value pair from that node. What is the best way to do this?

<a:characteristic>
    <name>gender</name>
    <value>male</value>
</a:characteristic>
<a:characteristic>
    <name>age</name>
    <value>30</value>
</a:characteristic>
<a:characteristic>
    <name>colour</name>
    <value>red</value>
</a:characteristic>
<a:characteristic>
    <name>language</name>
    <value>python</value>
</a:characteristic>

alecxe · Accepted Answer · 2017-12-04 03:23:14Z

3

Rather than querying the XML document structure directly, I would first build a more convenient data structure for this kind of lookup: a dictionary with characteristic names as keys and characteristic values as values.

Something like:

import xml.etree.ElementTree as ET

data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic>
        <name>gender</name>
        <value>male</value>
    </a:characteristic>
    <a:characteristic>
        <name>age</name>
        <value>30</value>
    </a:characteristic>
    <a:characteristic>
        <name>colour</name>
        <value>red</value>
    </a:characteristic>
    <a:characteristic>
        <name>language</name>
        <value>python</value>
    </a:characteristic>        
</root>"""

namespaces = {'a': 'http://www.w3.org/2002/07/a#'} 
root = ET.fromstring(data)
characteristics = {
    item.findtext("name"): item.findtext("value")
    for item in root.findall('a:characteristic', namespaces)
}
print(characteristics)

Prints:

{'gender': 'male', 'age': '30', 'colour': 'red', 'language': 'python'}

Now getting, say, the gender value is as easy as characteristics['gender'].
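If only a few characteristics are of interest, the same idea can be narrowed down. The sketch below is one possible way to do it, not part of the original answer: the `wanted` set and the `selected` and `gender` names are made up for illustration, and the XML is a compacted copy of the sample above. It first filters while looping, which is close to the logic described in the question, and then shows ElementTree's `[tag='text']` path predicate as an alternative for a single lookup.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample document from the answer above.
data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic><name>gender</name><value>male</value></a:characteristic>
    <a:characteristic><name>age</name><value>30</value></a:characteristic>
    <a:characteristic><name>colour</name><value>red</value></a:characteristic>
    <a:characteristic><name>language</name><value>python</value></a:characteristic>
</root>"""

namespaces = {'a': 'http://www.w3.org/2002/07/a#'}
root = ET.fromstring(data)

# Hypothetical set of characteristic names we care about.
wanted = {'gender', 'colour'}

# Keep only the pairs whose <name> text is in the wanted set.
selected = {}
for item in root.findall('a:characteristic', namespaces):
    name = item.findtext('name')
    if name in wanted:
        selected[name] = item.findtext('value')

print(selected)  # {'gender': 'male', 'colour': 'red'}

# Alternative for a single lookup: select the characteristic whose
# <name> child has the text 'gender', then read its <value>.
match = root.find("a:characteristic[name='gender']", namespaces)
gender = match.findtext('value') if match is not None else None
print(gender)  # male
```

The loop is probably the better fit when several names are needed, since it walks the document once; the predicate form is handy for a one-off lookup.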


1 Comment

kjhughes Over a year ago

Besides alecxe's helpful transformation of the data into an impressively more convenient general form, note also the proper use of a namespace declaration argument to findall(), which may well have frustrated any individual findall() attempts in OP's original code.
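To illustrate the namespace point, here is a small sketch (the variable names `uri`, `with_map` and `with_clark` are made up for this example). It shows two equivalent ways to address the prefixed element: passing a prefix map to findall(), as in the answer, or spelling out the namespace URI in braces. Using the `a:` prefix without supplying a map does not match anything; as far as I know, ElementTree rejects the unknown prefix with an error instead.

```python
import xml.etree.ElementTree as ET

data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic><name>gender</name><value>male</value></a:characteristic>
</root>"""
root = ET.fromstring(data)

uri = 'http://www.w3.org/2002/07/a#'

# Option 1: prefix plus a prefix-to-URI map, as in the answer.
with_map = root.findall('a:characteristic', {'a': uri})

# Option 2: the fully qualified tag name in {uri}local form.
with_clark = root.findall('{%s}characteristic' % uri)

print(len(with_map), len(with_clark))  # 1 1
print(with_map[0].tag)  # {http://www.w3.org/2002/07/a#}characteristic
```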
