9

I am using XPath with Python lxml (Python 2). I run through two passes on the data, one to select the records of interest, and one to extract values from the data. Here is a sample of the type of code.

from lxml import etree

xml = """
  <records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
  </records>
"""

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("@id|@height|@weight")

When I run this script the output is:

['1', '160', '80']
['2', '70']
['3', '140']

As you can see from the result, where an attribute is missing, the position of the other attributes changes, so I cannot tell in row 2 and 3 whether this is the height or the weight.

Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I should be looking at a result in the format:

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

I recognise that I can solve this specific case using elementtree and Python. However, I wish to resolve this using XPaths (and relatively simple XPaths), rather than process the data using python.

2 Answers 2

12

You should try following:

for node in nodes:
    print node.attrib

This will return dict of all attributes of node as {'id': '1', 'weight': '80', 'height': '160'}

If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

list_of_attributes = []
for node in nodes:
    attrs = []
    for att in node.attrib:
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)

Output:

[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
Sign up to request clarification or add additional context in comments.

3 Comments

Yes, that works, but it is Python. I want to use XPath to extract the data. Using XPath allows me to let users define the access paths. To implement in Python I will have to write some form of XPath DSL, which is pointless given that XPath is the DSL in this space.
Does this do the trick /records/row/@*/concat(name(), ", ", .)?
Unfortunately not. This gives an error. print parsed.xpath('/records/row/@*/concat(name(), ", " .)') lxml.etree.XPathEvalError: Invalid expression
2

I was wrong in my assertion that I was not going to use Python. I found that the lxml/etree implementation is easily extended to that I can use the XPath DSL with modifications.

I registered the function "dictify". I changed the XPath expression to :

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" ><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

def dictify(context, names):
    node = context.context_node
    rv = []
    rv.append('__dictify_start_marker__')
    names = names.split('|')
    for n in names:
        if n.startswith('@'):
            val =  node.attrib.get(n[1:])
            if val != None:
                rv.append(n)
                rv.append(val)
        else:
            children = node.findall(n)
            for child_node in children:
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv

etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify


parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("dictify('@id|@height|@weight|weight|height')")

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.