I'm currently using ElementTree to parse some XML, and some of it has multiple repeated name/value pairs that look like the snippet below. What I'm trying to do is extract the elements of interest, i.e. gender = male and colour = red, but I can't seem to do this using findall on its own because of the structure. How do I extract these elements? I thought the correct logic would be to look for a child node whose text is 'gender' etc., then print out the name/value from that node. What is the best way to do this?

<a:characteristic>
    <name>gender</name>
    <value>male</value>
</a:characteristic>
<a:characteristic>
    <name>age</name>
    <value>30</value>
</a:characteristic>
<a:characteristic>
    <name>colour</name>
    <value>red</value>
</a:characteristic>
<a:characteristic>
    <name>language</name>
    <value>python</value>
</a:characteristic>         

1 Answer

Instead of working against the XML document structure for this kind of query, I would build a more convenient data structure for looking up characteristics: a dictionary with characteristic names as keys and characteristic values as values.

Something like:

import xml.etree.ElementTree as ET

data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic>
        <name>gender</name>
        <value>male</value>
    </a:characteristic>
    <a:characteristic>
        <name>age</name>
        <value>30</value>
    </a:characteristic>
    <a:characteristic>
        <name>colour</name>
        <value>red</value>
    </a:characteristic>
    <a:characteristic>
        <name>language</name>
        <value>python</value>
    </a:characteristic>        
</root>"""

namespaces = {'a': 'http://www.w3.org/2002/07/a#'} 
root = ET.fromstring(data)
characteristics = {
    item.findtext("name"): item.findtext("value")
    for item in root.findall('a:characteristic', namespaces)
}
print(characteristics)

Prints:

{'gender': 'male', 'age': '30', 'colour': 'red', 'language': 'python'}

Now getting, say, the gender value is as easy as characteristics['gender'].
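If you only need one or two values and would rather not build the whole dictionary, ElementTree's limited XPath support also lets you filter on a child's text with a `[tag='text']` predicate, which is close to the "find the node whose name is gender" logic from the question. A minimal sketch, assuming the same namespace URI as above; `find_value` is a hypothetical helper name, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic>
        <name>gender</name>
        <value>male</value>
    </a:characteristic>
    <a:characteristic>
        <name>colour</name>
        <value>red</value>
    </a:characteristic>
</root>"""

namespaces = {'a': 'http://www.w3.org/2002/07/a#'}
root = ET.fromstring(data)


def find_value(root, wanted_name):
    """Hypothetical helper: return the <value> text of the characteristic
    whose <name> child has text equal to wanted_name, or None if absent."""
    # [name='...'] keeps only elements that have a <name> child with that exact text
    item = root.find(f"a:characteristic[name='{wanted_name}']", namespaces)
    return item.findtext("value") if item is not None else None


print(find_value(root, "gender"))
print(find_value(root, "colour"))
print(find_value(root, "height"))  # no such characteristic in this sample
```

Note that the name is interpolated straight into the path, so this sketch assumes names that contain no quote characters. The dictionary approach avoids that concern and is likely the better fit when you need several values.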

1 Comment

Besides alecxe's helpful transformation of the data into an impressively more convenient general form, note also the proper use of a namespace declaration argument to findall(), which may well have frustrated any individual findall() attempts in OP's original code.
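To illustrate the namespace point, here is a small sketch (the trimmed sample and the variable names are only for illustration) comparing a lookup by bare tag name with lookups that account for the namespace. ElementTree stores the tag with its namespace URI in braces, so a search for the bare local name finds nothing:

```python
import xml.etree.ElementTree as ET

data = """<root xmlns:a="http://www.w3.org/2002/07/a#">
    <a:characteristic>
        <name>gender</name>
        <value>male</value>
    </a:characteristic>
</root>"""
root = ET.fromstring(data)
uri = 'http://www.w3.org/2002/07/a#'

# The parsed tag carries the namespace URI, not the "a:" prefix
print(root[0].tag)

# Bare local name: does not match the namespaced tag
no_ns = root.findall('characteristic')

# Prefix resolved through a namespace mapping passed to findall()
with_map = root.findall('a:characteristic', {'a': uri})

# Equivalent lookup with the URI spelled out in braces
with_uri = root.findall(f'{{{uri}}}characteristic')

print(len(no_ns), len(with_map), len(with_uri))
```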
