This is the XML document that I have:

<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>

I'm trying to use XPath to select a Product by the value of its child Attributes/Attribute[@Name="Identifier"] element (e.g. "NumberOne"). In that case, my expected result would be:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
</Product>

Based on this explanation, I tried to implement the query in Python by using the lxml lib:

found_products = xml_tree_from_string.xpath('//products//Product[c:Attributes[Attribute[@Name="Identifier" and text()="NumberOne"]]]', namespaces={"c": "http://some/path/to/entity/def"})

Unfortunately, this never returns a result due to the Attributes namespace definition.

What am I missing?

2 Answers

What am I missing?

You're missing that Attribute is in the same namespace as Attributes: a default namespace declaration (xmlns="...") applies to the element it appears on and is inherited by all of its descendant elements.

So add the c: prefix to Attribute in your XPath as well, and it should work, as you observed in your comment on Jack's answer.
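
As a minimal sketch of that fix (assuming lxml is installed; the variable names XML, root, query and found_ids are only for illustration), here is the question's XPath with c: added to the inner Attribute step, run against the question's document:

```python
from lxml import etree

# The XML document from the question, as bytes.
XML = b"""<products xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"></Attribute>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>
  <Product Id="2">
    <Attributes xmlns="http://some/path/to/entity/def">
      <Attribute Name="Identifier">NumberTwo</Attribute>
    </Attributes>
  </Product>
</products>"""

root = etree.fromstring(XML)
ns = {"c": "http://some/path/to/entity/def"}

# Both Attributes and its child Attribute are in the default namespace
# declared on Attributes, so both steps need the c: prefix.
query = ('//products//Product[c:Attributes[c:Attribute['
         '@Name="Identifier" and text()="NumberOne"]]]')
found_products = root.xpath(query, namespaces=ns)

found_ids = [p.get("Id") for p in found_products]
print(found_ids)
```

Product itself stays unprefixed because it is not in any namespace; only the elements at or below Attributes are.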

You need to first define a namespace map, declare a prefix for those namespaces that don't have one (as is the case here) and then apply xpath:

from lxml import etree
prods ="""[your xml above]"""
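doc = etree.XML(prods)  # parse first: the namespace map below is built from doc
ns = {(k if k else "xx"): v for k, v in doc.xpath('//namespace::*')}  # create ns map; "xx" stands in for the unprefixed default namespace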
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>

To suppress the unused namespace declarations, call etree.cleanup_namespaces before serializing:

etree.cleanup_namespaces(doc)  # note: the parameter is "doc", not "product"
for product in doc.xpath('//products//Product[.//xx:Attribute[@Name="Identifier"][text()="NumberOne"]]', namespaces=ns):
    print(etree.tostring(product).decode())

Output:

<Product Id="1">
      <Product Id="1_1">
        <Attribute Name="Whatever"/>
      </Product>
      <Attributes xmlns="http://some/path/to/entity/def">
        <Attribute Name="Identifier">NumberOne</Attribute>
      </Attributes>
  </Product>

Comments

Thanks a lot. Is it possible to keep the top level Product definition as it was in the original XML file? I mean without the xmlns definitions. Fun fact - alternatively my initial approach works if I add the c: prefix to the Attribute as well.
@user2549803 Yes, it's possible. See edit.
@JackFleeting: Dynamically creating the namespace prefix map like this would be overkill in most situations, including this one, and requires a more sophisticated approach to account for the possibility of different default namespaces at different points in the XML hierarchy.
@kjhughes Absolutely right - except that when I go to the other extreme and suggest a purely local-name() based solution, I get whacked upside the head for taking a cavalier attitude to namespaces...
Oh no, I wasn't suggesting that you defeat namespaces -- just that you use the namespace prefix mechanism directly, without the partially general generation code you have. At least state its limitations so that future readers won't be surprised that it's not as general as it appears to be. Really, though, I'd just back off the generality and fix OP's XPath to include c: on the descendant elements of Attributes and call it done. Feel free to pull from my answer below and elaborate as needed.
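
To illustrate the limitation raised in these comments, here is a small sketch (assuming lxml; the two-namespace document and the example.com URIs are made up for the demonstration and are not from the question). When a document declares different default namespaces at different points, the generated map can hold only one of them under the single "xx" prefix:

```python
from lxml import etree

# Hypothetical document with two different default namespaces.
xml = b"""<root>
  <a xmlns="http://example.com/first"><item>1</item></a>
  <b xmlns="http://example.com/second"><item>2</item></b>
</root>"""
doc = etree.XML(xml)

# lxml returns namespace nodes as (prefix, uri) tuples; the prefix is
# None for a default namespace.
pairs = doc.xpath('//namespace::*')
default_uris = {uri for prefix, uri in pairs if prefix is None}

# Same map-building approach as in the answer: every default namespace
# is stored under the one key "xx", so later entries overwrite earlier ones.
ns = {(k if k else "xx"): v for k, v in pairs}

matches = doc.xpath('//xx:item', namespaces=ns)
print(sorted(default_uris))  # two distinct default namespaces
print(ns["xx"])              # only one of them survives in the map
print(len(matches))          # so only one of the two item elements matches
```

With an explicit map such as {"a": "http://example.com/first", "b": "http://example.com/second"}, each namespace gets its own prefix and both item elements can be addressed.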