I have this XML file and I want to get the country nodes whose name attribute contains a given substring, such as 'in'.

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

I have tried this:

    import xml.etree.ElementTree as ET
    tree = ET.parse('test.xml')
    root = tree.getroot()
    list=root.find(".//country[contains(@name, 'Pana')]")

But I get an error: SyntaxError: invalid predicate

Could anyone please help me solve this?

3 Answers

xml.etree.ElementTree provides only limited support for XPath expressions for locating elements in a tree, and that support doesn't include the XPath contains() function. See the documentation for the list of supported XPath syntax.
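To see the limitation concretely, here is a minimal sketch using only the standard library. The inlined XML string is a trimmed copy of the question's data (an assumption made so the example is self-contained): attribute-presence and exact-value predicates work, while contains() is rejected with a SyntaxError when the path is compiled.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch runs on its own.
XML = """<data>
    <country name="Liechtenstein"/>
    <country name="Singapore"/>
    <country name="Panama"/>
</data>"""

root = ET.fromstring(XML)

# Supported: [@attr] (attribute present) and [@attr='value'] (exact match).
with_name = root.findall(".//country[@name]")
exact = root.findall(".//country[@name='Panama']")
print(len(with_name), [c.get("name") for c in exact])  # 3 ['Panama']

# Not supported: XPath functions such as contains() raise SyntaxError.
try:
    root.findall(".//country[contains(@name, 'Pana')]")
    error_message = None
except SyntaxError as err:
    error_message = str(err)
print("contains() rejected:", error_message)
```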

You need to resort to a library that provides better XPath support, such as lxml, or use a simpler XPath expression and do the remaining filtering manually, for example:

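    import xml.etree.ElementTree as ET

    tree = ET.parse('test.xml')
    root = tree.getroot()
    # ElementTree can select countries that have a name attribute;
    # the substring test is then done in Python.
    # filter() returns a lazy iterator in Python 3, hence list().
    matches = list(filter(lambda x: 'Pana' in x.get('name'), root.findall(".//country[@name]")))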

2 Comments

Thanks, the solution you provided works. Is there also a way to do a case-insensitive search?
@AbhishekMoondra You can change the lambda to convert the attribute value to lowercase before comparing: lambda x: 'pana' in x.get('name').lower(). That results in a case-insensitive search.
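The same filtering approach can be wrapped in a small reusable function with an optional case-insensitive mode. This is a sketch: countries_containing is a hypothetical helper name, the XML is a trimmed inline copy of the question's data, and str.casefold() is used in place of lower() because it is a somewhat more robust case-insensitive comparison for non-ASCII text.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, inlined so the sketch runs on its own.
XML = """<data>
    <country name="Liechtenstein"/>
    <country name="Singapore"/>
    <country name="Panama"/>
</data>"""


def countries_containing(root, needle, ignore_case=False):
    """Return <country> elements whose name attribute contains `needle`.

    ElementTree cannot evaluate contains(), so the XPath only selects
    countries that have a name attribute; the substring test runs in Python.
    """
    if ignore_case:
        needle = needle.casefold()
    matches = []
    for country in root.findall(".//country[@name]"):
        name = country.get("name")
        if ignore_case:
            name = name.casefold()
        if needle in name:
            matches.append(country)
    return matches


root = ET.fromstring(XML)
print([c.get("name") for c in countries_containing(root, "in")])
# ['Liechtenstein', 'Singapore']
print([c.get("name") for c in countries_containing(root, "PANA", ignore_case=True)])
# ['Panama']
```

A plain loop is used instead of filter() so the case handling stays readable; results come back in document order, as findall() returns them.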

I cannot comment on why your original code does not work, but it has nothing to do with the XPath expression. The expression itself is fine, except for the leading ".", which you can safely omit.

Is there any reason you are not using the lxml xpath() method?

    from lxml import etree

    tree = etree.parse('test.xml')
    root = tree.getroot()
    # lxml implements XPath 1.0, so contains() is available
    matches = root.xpath("//country[contains(@name,'Pana')]")

    print(matches[0].tag)

gives back a country element:

    $ python test.py
    country

The XML parser you are using does not support contains(). You will need to use a different library for full XPath support:

https://docs.python.org/2/library/xml.etree.elementtree.html#elementtree-xpath