I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the code:
from lxml import etree
xml = """
<records>
<row id="1" height="160" weight="80" />
<row id="2" weight="70" />
<row id="3" height="140" />
</records>
"""
parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("@id|@height|@weight")
When I run this script the output is:
['1', '160', '80']
['2', '70']
['3', '140']
As the result shows, when an attribute is missing, the positions of the remaining attributes shift, so for rows 2 and 3 I cannot tell whether the second value is the height or the weight.
Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I would like a result in this format:
[('@id', '1'), ('@height', '160'), ('@weight', '80')]
I recognise that I can solve this specific case using ElementTree and Python. However, I would like to solve it using XPath expressions (relatively simple ones), rather than by processing the data in Python.
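For reference, here is a minimal sketch of the kind of Python-side workaround I mean, using the standard-library `xml.etree.ElementTree`. The helper name `named_attributes` and its `names` parameter are my own, purely for illustration. This is the approach I would like to avoid, not the XPath solution I am asking for.

```python
import xml.etree.ElementTree as ET

xml = """
<records>
<row id="1" height="160" weight="80" />
<row id="2" weight="70" />
<row id="3" height="140" />
</records>
"""

parsed = ET.fromstring(xml)


def named_attributes(node, names=("id", "height", "weight")):
    # Hypothetical helper: pair each wanted attribute that is present
    # on the node with its value, skipping attributes that are missing.
    return [("@" + name, node.get(name))
            for name in names
            if node.get(name) is not None]


# One list of (name, value) pairs per <row> element.
results = [named_attributes(node) for node in parsed.findall("row")]
for pairs in results:
    print(pairs)
```

This keeps the name next to each value, but the pairing is done in Python rather than in the XPath expression itself.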