I need to get data from an XML document and I'm using XPath. I'm quite new to it, though I'm liking it.

I'm retrieving some nodes based on their attributes like this:

/cesAlign/linkGrp[@targType='s']

Now I'd like to get the value of another attribute in the node:

/cesAlign/linkGrp[@targType='s']/@fromDoc

However, this returns only the first hit. I'd like to get the attribute from all nodes that have targType='s'.

I was thinking of looping over the nodelist and then reading the attribute... something like this:

expr = xpath.compile("/cesAlign/linkGrp[@targType='s']/@fromDoc");
    NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

    // Each item in the node set is already a fromDoc attribute node,
    // so its value can be read directly without a second XPath expression.
    for (int i = 0; i < nl.getLength(); i++) {
        System.out.println(nl.item(i).getNodeValue());
    }

But I'm not sure if there's a better and more elegant way to do this.

Here's a sample XML:

<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>

Thanks!

2 Answers

I think you will have to iterate over the matches and fetch the attribute value from each element. Use "//cesAlign/linkGrp[@targType='s' and @fromDoc]" to select the elements. Here is an elegant Python solution:

#sample XML
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029703.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029704.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "1"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "2"/>
</cesAlign>
"""
from lxml import etree

root = etree.fromstring(xml)
# Select only the linkGrp elements that have targType='s' and a fromDoc attribute
expr = root.xpath("//cesAlign/linkGrp[@targType='s' and @fromDoc]")
print("Matches:", len(expr))
for e in expr:
    print(e.attrib["fromDoc"])

The output will be:

Matches: 4
en/C2004310.01029701.xml.gz
en/C2004310.01029702.xml.gz
en/C2004310.01029703.xml.gz
en/C2004310.01029704.xml.gz
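
If lxml is not available, the same iterate-and-read idea can be sketched with the standard library's xml.etree.ElementTree instead. This is a swapped-in module, not the one used above, and it supports only a subset of XPath: a path cannot end in /@fromDoc, so the attribute is read in Python. The helper name from_doc_values and the trimmed sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; the last element has no fromDoc attribute.
SAMPLE = """<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc="1"/>
</cesAlign>"""


def from_doc_values(xml_text):
    """Hypothetical helper: fromDoc of every linkGrp with targType='s'."""
    root = ET.fromstring(xml_text)  # root is the cesAlign element
    matches = root.findall("linkGrp[@targType='s']")
    # Skip elements without fromDoc, mirroring the "and @fromDoc" test above.
    return [e.get("fromDoc") for e in matches if "fromDoc" in e.attrib]


print(from_doc_values(SAMPLE))
```

The list comprehension does the work of the explicit loop, so the values come back as one list that can be passed on directly.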

Alternatively, you can get each wanted attribute with a separate XPath expression:

/cesAlign/linkGrp[@targType='s'][$x]/@fromDoc 

where $x must be substituted with an integer in the interval:

[1, count(/cesAlign/linkGrp[@targType='s'])]

In case you have an XPath 2.0 engine available, the values of all wanted attributes can be obtained with a single XPath 2.0 expression:

/cesAlign/linkGrp[@targType='s']/@fromDoc/string(.)

When this XPath 2.0 expression is evaluated, the result is a sequence containing the string value of every wanted fromDoc attribute.
