Retrieve attribute names and values with Python / lxml and XPath

Question

I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest, and one to extract values from them. Here is a sample of the code.

from lxml import etree

xml = """
  <records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" />
    <row id="3" height="140" />
  </records>
"""

parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("@id|@height|@weight")

When I run this script the output is:

['1', '160', '80']
['2', '70']
['3', '140']

As the result shows, when an attribute is missing the positions of the other attributes shift, so for rows 2 and 3 I cannot tell whether the second value is the height or the weight.

Is there a way to get the names of the attributes returned from etree/lxml? Ideally, I should be looking at a result in the format:

[('@id', '1'), ('@height', '160'), ('@weight', '80')]

I recognise that I can solve this specific case using ElementTree and Python. However, I wish to resolve this using XPath (and relatively simple XPath expressions), rather than processing the data in Python.

Andersson · Accepted Answer · 2017-02-23 11:20:35Z

12

Try the following:

for node in nodes:
    print node.attrib

This prints a dict-like mapping of all attributes of each node, for example {'id': '1', 'weight': '80', 'height': '160'}

If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

list_of_attributes = []
for node in nodes:
    # Build one list of ('@name', value) pairs per row.
    attrs = []
    for att in node.attrib:
        attrs.append(("@" + att, node.attrib[att]))
    list_of_attributes.append(attrs)
print list_of_attributes

Output:

[[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
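For readers on Python 3, the same idea can be sketched more compactly. This is only an illustration, assuming lxml is installed and reusing the sample XML from the question; it relies on node.attrib.items(), which yields (name, value) pairs.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)

# One list of ('@name', value) pairs per row, built from the attribute mapping.
list_of_attributes = [
    [("@" + name, value) for name, value in node.attrib.items()]
    for node in parsed.xpath("/records/row")
]

print(list_of_attributes)
```

Because the pairs come from each row's own attributes, a missing attribute simply produces a shorter list and never shifts the meaning of the remaining values.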

edited Feb 23, 2017 at 11:20

answered Feb 23, 2017 at 10:57

Andersson


3 Comments

Kevin Gill Over a year ago

Yes, that works, but it is Python. I want to use XPath to extract the data. Using XPath allows me to let users define the access paths. To implement in Python I will have to write some form of XPath DSL, which is pointless given that XPath is the DSL in this space.

Andersson Over a year ago

Does this do the trick /records/row/@*/concat(name(), ", ", .)?

Kevin Gill Over a year ago

Unfortunately not. This gives an error:

print parsed.xpath('/records/row/@*/concat(name(), ", " .)')
lxml.etree.XPathEvalError: Invalid expression
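A note on the error above: the suggested expression uses a function call as a path step (@*/concat(...)), which is XPath 2.0 syntax, whereas lxml evaluates XPath 1.0 through libxml2 and so rejects it. One way to stay with the original XPath expression, sketched here for Python 3, is to use the "smart string" results that lxml returns for attribute nodes by default: each result carries the name of the attribute it came from in its attrname property. The helper name named_attributes is made up for this illustration.

```python
from lxml import etree

xml = """
<records>
  <row id="1" height="160" weight="80" />
  <row id="2" weight="70" />
  <row id="3" height="140" />
</records>
"""

parsed = etree.fromstring(xml)


def named_attributes(node, path):
    """Evaluate an attribute-selecting XPath and pair each value with its name.

    Assumes every result of `path` is an attribute node, so that each
    returned smart string has .attrname set.
    """
    return [("@" + value.attrname, str(value)) for value in node.xpath(path)]


rows = [
    named_attributes(row, "@id|@height|@weight")
    for row in parsed.xpath("/records/row")
]

for row in rows:
    print(row)
```

With the sample data this should give pairs in the format the question asks for, such as [('@id', '2'), ('@weight', '70')] for the second row, while the user-supplied XPath stays unchanged.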

Kevin Gill · Accepted Answer · 2017-02-23 18:17:14Z

I was wrong in my assertion that I was not going to use Python. I found that the lxml/etree implementation is easily extended, so that I can use the XPath DSL with modifications.

I registered the function "dictify". I changed the XPath expression to:

dictify('@id|@height|@weight|weight|height')

The new code is:

from lxml import etree

xml = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70" ><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

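def dictify(context, names):
    # lxml passes the evaluation context as the first argument;
    # context_node is the element the XPath is being evaluated against.
    node = context.context_node
    rv = []
    rv.append('__dictify_start_marker__')
    names = names.split('|')
    for n in names:
        if n.startswith('@'):
            # Attribute name: emit it only if the attribute is present.
            val = node.attrib.get(n[1:])
            if val is not None:
                rv.append(n)
                rv.append(val)
        else:
            # Element name: emit one name/text pair per matching child.
            children = node.findall(n)
            for child_node in children:
                rv.append(n)
                rv.append(child_node.text)
    rv.append('__dictify_end_marker__')
    return rv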

etree_functions = etree.FunctionNamespace(None)
etree_functions['dictify'] = dictify


parsed = etree.fromstring(xml)
nodes = parsed.xpath('/records/row')
for node in nodes:
    print node.xpath("dictify('@id|@height|@weight|weight|height')")

This produces the following output:

['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
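The flat lists above still need to be turned back into name/value pairs by the caller. A minimal Python 3 sketch of that step follows; the helper name pairs_from_dictify is hypothetical, and it assumes the list is exactly what dictify returns (start marker, alternating names and values, end marker).

```python
START = '__dictify_start_marker__'
END = '__dictify_end_marker__'


def pairs_from_dictify(flat):
    """Turn one dictify() result into a list of (name, value) pairs."""
    if len(flat) < 2 or flat[0] != START or flat[-1] != END:
        raise ValueError("not a dictify() result: markers missing")
    body = flat[1:-1]
    if len(body) % 2 != 0:
        raise ValueError("not a dictify() result: odd number of items")
    # Even positions hold names, odd positions hold the matching values.
    return list(zip(body[0::2], body[1::2]))


sample = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
          'height', '150', '__dictify_end_marker__']
print(pairs_from_dictify(sample))
# prints [('@id', '2'), ('@weight', '70'), ('height', '150')]
```

A list of pairs is used instead of a dict because a row may contain several child elements with the same name, and a dict would keep only the last one.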
