
I have an XSD file of the following format:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type2">
        <xsd:example>
          <xsd:description>This is the description of said type2 tag</xsd:description>
        </xsd:example>
    </xsd:type>
    <xsd:type name="type3">
        <xsd:example>
          <xsd:description>This is the description of said type3 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>

and the following XML file:

<theRoot>
    <type1>hi from type1</type1>
    <theChild>
        <type2>hi from type2</type2>
        <type3>hi from type3</type3>
    </theChild>
</theRoot>

I'd like to retrieve the text inside the xsd:description element that sits under (as a grandchild of) the xsd:type element with the name="type1" attribute. In other words, I'd like to retrieve "This is the description of said type1 tag".

I have tried to do this with lxml in the following way using Python:

from lxml import etree
XSDDoc = etree.parse(xsdFile)
root = XSDDoc.getroot()
result = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", root.nsmap)

I've used the same example and solution mentioned here. However, my code just returns an empty result, and I'm not able to retrieve the correct value.

For reference, my Python version is: Python 2.7.10

EDIT: When I use the example provided in the answer, which parses the XML structure from a string, the result is as expected. However, when I parse from a file, I get empty lists (or None) returned.

I am doing the following:

  • Retrieving the XML from a file
  • Including a variable to denote the name attribute (as it is dynamic)

The code loops over each node in a separate XML file, then looks up the matching xsd:type in the XSD file to get that node's description:

import os
from lxml import etree

XMLDoc = etree.parse(open(xmlFile))

for Node in XMLDoc.xpath('//*'):
    nameVariable = os.path.basename(XMLDoc.getpath(Node))
    root = XSDDoc.getroot()
    description = XSDDoc.find(".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(nameVariable), root.nsmap)

If I try to print description.text, I get:

AttributeError: 'NoneType' object has no attribute 'text'
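One likely cause, at least for the sample XML above: `//*` matches every element, including `theRoot` and `theChild`, and the XSD has no `xsd:type` with either name, so `find()` returns `None` for those nodes and `.text` raises the `AttributeError`. Separately, `os.path.basename(XMLDoc.getpath(Node))` can yield names like `type2[2]` when siblings share a tag. Below is a minimal sketch that uses `node.tag` directly and skips nodes without a description. The inline strings stand in for the real files, the names `XSD_TEXT`, `XML_TEXT`, `NS` and `descriptions` are hypothetical, and it assumes the XML elements are not namespaced.

```python
from lxml import etree

# Assumption: the standard W3C XML Schema namespace, as in the question's XSD.
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Inline stand-ins for the real XSD and XML files.
XSD_TEXT = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:type name="type1"><xsd:example>
    <xsd:description>This is the description of said type1 tag</xsd:description>
  </xsd:example></xsd:type>
  <xsd:type name="type2"><xsd:example>
    <xsd:description>This is the description of said type2 tag</xsd:description>
  </xsd:example></xsd:type>
  <xsd:type name="type3"><xsd:example>
    <xsd:description>This is the description of said type3 tag</xsd:description>
  </xsd:example></xsd:type>
</xsd:schema>"""

XML_TEXT = b"""<theRoot>
  <type1>hi from type1</type1>
  <theChild>
    <type2>hi from type2</type2>
    <type3>hi from type3</type3>
  </theChild>
</theRoot>"""

xsd_root = etree.fromstring(XSD_TEXT)
xml_root = etree.fromstring(XML_TEXT)

descriptions = {}
for node in xml_root.xpath("//*"):
    # node.tag is the plain element name here (e.g. "type1") because the
    # sample XML uses no namespaces.
    description = xsd_root.find(
        ".//xsd:type[@name='{0}']/xsd:example/xsd:description".format(node.tag),
        NS,
    )
    if description is None:
        # theRoot and theChild have no xsd:type entry, so skip them
        # instead of calling .text on None.
        continue
    descriptions[node.tag] = description.text

for tag in sorted(descriptions):
    print("{0}: {1}".format(tag, descriptions[tag]))
```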

  • What exactly have you tried? In the code in the question, you don't attempt to get the xsd:description element (which is the grandchild of xsd:type). Commented Nov 19, 2019 at 12:19
  • @mzjn sorry, as I've had to remove some sensitive information, I've left out the remaining path following xsd:type. I have edited the question to reflect my exact code. Commented Nov 19, 2019 at 12:25
  • That is not really the "exact" code (what is nameVariable?) Please provide a minimal reproducible example. Commented Nov 19, 2019 at 14:52
  • I have edited my question. nameVariable is simply a string. Commented Nov 19, 2019 at 14:55
  • Sorry to nag about this, but when I ask for a minimal reproducible example, I mean complete but minimal code (and XML) that I can copy, paste and run without changing anything. Commented Nov 19, 2019 at 14:59

1 Answer


The predicate ([@name='type1']) must be applied to the right step of the path. The name attribute is on the xsd:type element, not on xsd:description, so the predicate belongs on xsd:type. This should work:

result = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

# result is a list
for r in result:
    print(r.text)
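To see why the original path comes back empty: `[@name='type1']` attached to `xsd:description` asks for a description element that itself carries a `name` attribute, and none does. A small comparison on a trimmed copy of the question's XSD (inline string, with the namespace dict `NS` passed explicitly as an illustrative choice):

```python
from lxml import etree

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}
root = etree.fromstring(b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:type name="type1">
    <xsd:example>
      <xsd:description>This is the description of said type1 tag</xsd:description>
    </xsd:example>
  </xsd:type>
</xsd:schema>""")

# Predicate on the wrong step: looks for a name attribute on xsd:description.
wrong = root.findall(".//xsd:type/xsd:example/xsd:description[@name='type1']", NS)

# Predicate on xsd:type, which is where the name attribute actually lives.
right = root.findall(".//xsd:type[@name='type1']/xsd:example/xsd:description", NS)

print(len(wrong))  # 0
print([r.text for r in right])
```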

If you only want a single node, you can use find (which returns the first match, or None) instead of findall. Complete example:

from lxml import etree

xsdFile = """
<root xmlns:xsd='http://whatever.com'>
 <xsd:type name="type1">
     <xsd:example>
       <xsd:description>This is the description of said type1 tag</xsd:description>
     </xsd:example>
 </xsd:type>
</root>"""

root = etree.fromstring(xsdFile)
result = root.find(".//xsd:type[@name='type1']/xsd:example/xsd:description", root.nsmap)

print(result.text)
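Since the follow-up comments ask about list results and `None`, here is a sketch contrasting the three lookup methods on the same inline document: `findall()` always returns a list (possibly empty), `find()` returns the first matching element or `None`, and `findtext()` returns the text of the first match or a default. The name `type9` is a hypothetical non-existent type used only to show the no-match case.

```python
from lxml import etree

root = etree.fromstring("""
<root xmlns:xsd='http://whatever.com'>
 <xsd:type name="type1">
     <xsd:example>
       <xsd:description>This is the description of said type1 tag</xsd:description>
     </xsd:example>
 </xsd:type>
</root>""")

path = ".//xsd:type[@name='{0}']/xsd:example/xsd:description"

# findall: always a list, empty when nothing matches.
all_hits = root.findall(path.format("type1"), root.nsmap)

# find: the first matching element, or None when nothing matches.
first = root.find(path.format("type9"), root.nsmap)

# findtext: the text of the first match, or the given default.
text = root.findtext(path.format("type1"), namespaces=root.nsmap)
missing = root.findtext(path.format("type9"), default="(no description)",
                        namespaces=root.nsmap)

print([e.text for e in all_hits])
print(first)
print(text)
print(missing)
```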

3 Comments

Thank you for your answer. However, that piece of code returns an empty list, rather than anything containing the value within the tag.
Also, I believe the code would return a list object. How can I extract the value attribute from that list object?
Thank you for your help again. I have edited my question based off your answer. Please have a look when you can.
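For the setup described in the edit (XSD read from a file, type name held in a variable), here is a sketch under a few assumptions. The namespace map is spelled out explicitly rather than taken from `root.nsmap`, so it does not depend on where the prefix is declared. The name is passed as an XPath variable (`$name`, bound through a keyword argument to lxml's `xpath()`) instead of being formatted into the expression, which keeps a quote in a name from breaking the query. `XSD_TEXT`, `NS` and `lookup_description` are hypothetical names, and the temporary file only simulates `xsdFile`.

```python
import os
import tempfile

from lxml import etree

# Assumption: the XSD uses the standard W3C XML Schema namespace.
NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

XSD_TEXT = b"""<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:type name="type1">
        <xsd:example>
          <xsd:description>This is the description of said type1 tag</xsd:description>
        </xsd:example>
    </xsd:type>
</xsd:schema>"""


def lookup_description(xsd_doc, type_name):
    """Return the description text for type_name, or None if there is none."""
    # $name is an XPath variable bound from the keyword argument below,
    # so the dynamic value is never spliced into the expression string.
    matches = xsd_doc.xpath(
        "//xsd:type[@name=$name]/xsd:example/xsd:description/text()",
        namespaces=NS,
        name=type_name,
    )
    return matches[0] if matches else None


# Write the sample to a temporary file so it is parsed from disk,
# the same way etree.parse(xsdFile) reads it.
fd, path = tempfile.mkstemp(suffix=".xsd")
with os.fdopen(fd, "wb") as fh:
    fh.write(XSD_TEXT)

xsd_doc = etree.parse(path)
os.remove(path)

print(lookup_description(xsd_doc, "type1"))
print(lookup_description(xsd_doc, "theRoot"))
```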
