LXML using XPath and ETXPath in Python

Question

I apologise if this question could easily be answered by reading the lxml documentation, but I have tried to no avail.

I've been using lxml's findall quite frequently to query an XML file. Recently, I've needed to use wildcards to extract the data I need, which has led me to XPath.

I've managed to get this working with ETXPath but not with XPath, and I'm confused as to why. An excerpt of the XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Header>
    <FileName>DBL_MPA_Gap_PRD_2017-06-01T07-50-52.xml</FileName>
    <ValidityDate>2017-05-31</ValidityDate>
    <Version>0.42</Version>
    <NoOfRecords>17228</NoOfRecords>
</Header>
<Overviews>
<OverviewLevelTimeStamp>
        <Identifier>Z 1 Index, TRADE</Identifier>
        <Level>2.2120000000000002</Level>
        <Timestamp>09:00:00.000</Timestamp>
</OverviewLevelTimeStamp>
</Overviews>
</DC>

And the Python code I use to extract the OverviewLevelTimeStamp node whose Identifier is 'Z 1 Index, TRADE':

findshiz = ETXPath("//" + namespace + "DC/" + namespace + "Overviews/" + namespace + "OverviewLevelTimeStamp[" + namespace + "Identifier= 'Z 1 Index, TRADE']")
required_nodes = findshiz(gap_xml)

where gap_xml is the parsed file.
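For reference, a self-contained sketch of the ETXPath call above. It assumes that namespace holds the document's default namespace in Clark notation ({uri}, the form ETXPath expects) and that gap_xml is the result of parsing the file; the XML is a trimmed copy of the excerpt, inlined as bytes because lxml rejects a str that carries an encoding declaration.

```python
import io

from lxml import etree

# Trimmed copy of the question's XML, as bytes so the encoding declaration is accepted.
XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Overviews>
<OverviewLevelTimeStamp>
        <Identifier>Z 1 Index, TRADE</Identifier>
        <Level>2.2120000000000002</Level>
</OverviewLevelTimeStamp>
</Overviews>
</DC>"""

# Assumption: `namespace` is the default namespace in Clark notation ({uri}).
namespace = "{http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd}"

# Stand-in for parsing the real file, e.g. etree.parse("path/to/file.xml").
gap_xml = etree.parse(io.BytesIO(XML))

findshiz = etree.ETXPath(
    "//" + namespace + "DC/" + namespace + "Overviews/"
    + namespace + "OverviewLevelTimeStamp["
    + namespace + "Identifier = 'Z 1 Index, TRADE']"
)
required_nodes = findshiz(gap_xml)
print(len(required_nodes))  # 1
```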

This code works, but when I try to use xpath instead (just replacing ETXPath with xpath), it doesn't. I need xpath because I need wildcards: instead of "Z 1 Index, TRADE", the value to match would be Z 1 Index*.

Thanks, and let me know any ways to improve the question.

What is namespace? Please show the assignment line: namespace = ... – Parfait, Commented Jun 16, 2017 at 17:09

The difference between ETXPath and the "normal" xpath (using XPath internally) is that the former expects namespaces written as {http://...}tagname, while the latter expects a prefix, prefix:tagname, plus an additional namespace map: {'prefix': 'http://..'}. Otherwise both should do the same. (See also lxml.de/1.3/xpathxslt.html#etxpath) Can you provide your complete code for both versions? – Alfe, Commented Sep 17, 2018 at 12:00
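To illustrate the difference described in that comment, a minimal sketch: the same query written once for ETXPath and once for XPath with a namespace map, plus what happens when the ETXPath-style string is passed to XPath unchanged. The prefix name ns is an arbitrary choice, and the XML is trimmed from the question.

```python
from lxml import etree

URI = "http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd"
root = etree.fromstring(b"""<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Overviews>
<OverviewLevelTimeStamp>
        <Identifier>Z 1 Index, TRADE</Identifier>
</OverviewLevelTimeStamp>
</Overviews>
</DC>""")

# ETXPath: the namespace is written inline in Clark notation, {uri}tag.
clark = "{" + URI + "}"
et_find = etree.ETXPath(
    "//" + clark + "OverviewLevelTimeStamp[" + clark + "Identifier = 'Z 1 Index, TRADE']"
)

# XPath: a prefix in the expression plus a prefix -> URI map.
# "ns" is arbitrary; it only has to match the key in `namespaces`.
x_find = etree.XPath(
    "//ns:OverviewLevelTimeStamp[ns:Identifier = 'Z 1 Index, TRADE']",
    namespaces={"ns": URI},
)

et_matches = et_find(root)
x_matches = x_find(root)
print(et_matches[0] is x_matches[0])  # True: both select the same element

# The Clark-notation string does not compile as plain XPath,
# because "{uri}" is not part of XPath 1.0 syntax.
rejected = False
try:
    etree.XPath("//" + clark + "OverviewLevelTimeStamp")
except etree.XPathSyntaxError:
    rejected = True
print(rejected)  # True
```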

salparadise · Accepted Answer · 2017-06-17 04:02:14Z


contains(., "Z 1 Index,") is like the wildcard pattern *Z 1 Index,*: a substring search that matches the text anywhere in the Identifier.
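To make the difference between a substring match and a prefix match concrete, a small sketch. The second and third Identifier values are hypothetical records made up for illustration. For a pattern like Z 1 Index*, XPath 1.0's starts-with() is the closer fit, since contains() also matches the text in the middle of a value.

```python
from lxml import etree

NS = {"ns": "http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd"}
# Hypothetical records: only the first Identifier comes from the question.
root = etree.fromstring(b"""<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Overviews>
<OverviewLevelTimeStamp><Identifier>Z 1 Index, TRADE</Identifier></OverviewLevelTimeStamp>
<OverviewLevelTimeStamp><Identifier>Z 1 Index, BID</Identifier></OverviewLevelTimeStamp>
<OverviewLevelTimeStamp><Identifier>ABC Z 1 Index, TRADE</Identifier></OverviewLevelTimeStamp>
</Overviews>
</DC>""")

# "*Z 1 Index*": the text may appear anywhere in the Identifier.
CONTAINS = '//ns:OverviewLevelTimeStamp[ns:Identifier[contains(., "Z 1 Index")]]'
# "Z 1 Index*": the Identifier must begin with the text.
STARTS_WITH = '//ns:OverviewLevelTimeStamp[ns:Identifier[starts-with(., "Z 1 Index")]]'


def identifiers(expr):
    """Return the Identifier text of every OverviewLevelTimeStamp matched by expr."""
    return [n.findtext("ns:Identifier", namespaces=NS)
            for n in root.xpath(expr, namespaces=NS)]


print(identifiers(CONTAINS))     # all three records
print(identifiers(STARTS_WITH))  # only the two that begin with "Z 1 Index"
```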

Here is an example that uses contains() as a wildcard-like substring match and maps the document's namespace to a prefix:

import lxml.etree as etree

xstring = """
<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Header>
    <FileName>DBL_MPA_Gap_PRD_2017-06-01T07-50-52.xml</FileName>
    <ValidityDate>2017-05-31</ValidityDate>
    <Version>0.42</Version>
    <NoOfRecords>17228</NoOfRecords>
</Header>
<Overviews>
<OverviewLevelTimeStamp>
        <Identifier>Z 1 Index, TRADE</Identifier>
        <Level>2.2120000000000002</Level>
        <Timestamp>09:00:00.000</Timestamp>
</OverviewLevelTimeStamp>
</Overviews>
</DC>"""

# Parse the string; root is the <DC> element
root = etree.fromstring(xstring)

# Map a prefix of our choosing ("ns") to the document's default namespace
nsmap = {'ns': 'http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd'}

# Select every OverviewLevelTimeStamp whose Identifier contains "Z 1 Index,"
print(root.xpath('//ns:OverviewLevelTimeStamp[ns:Identifier[contains(., "Z 1 Index,")]]', namespaces=nsmap))

results in

[<Element {http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd}OverviewLevelTimeStamp at 0x10647aa70>]

Be aware that lxml's xpath() returns a list for node-selecting expressions like this one, so you have to extract the matching node from the list.
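A sketch of pulling values out of that list, assuming the single-record excerpt from the question. It also shows that an expression wrapped in string() returns a string rather than a list.

```python
from lxml import etree

NS = {"ns": "http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd"}
root = etree.fromstring(b"""<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Overviews>
<OverviewLevelTimeStamp>
        <Identifier>Z 1 Index, TRADE</Identifier>
        <Level>2.2120000000000002</Level>
        <Timestamp>09:00:00.000</Timestamp>
</OverviewLevelTimeStamp>
</Overviews>
</DC>""")

matches = root.xpath(
    '//ns:OverviewLevelTimeStamp[ns:Identifier[contains(., "Z 1 Index,")]]',
    namespaces=NS,
)

if matches:  # xpath() returns an empty list when nothing matches
    node = matches[0]  # take the first matching element
    level = node.findtext("ns:Level", namespaces=NS)
    timestamp = node.findtext("ns:Timestamp", namespaces=NS)
    print(level, timestamp)  # 2.2120000000000002 09:00:00.000

# string(...) makes xpath() return a single string instead of a list.
level_text = root.xpath("string(//ns:OverviewLevelTimeStamp/ns:Level)", namespaces=NS)
print(level_text)  # 2.2120000000000002
```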

answered Jun 16, 2017 at 16:42 by salparadise, edited Jun 17, 2017 at 4:02


Comments

Hi Sal, thanks for your answer. I can't really use 'contains', though, as I need a wildcard for in-between string searches. Also, I can't use 'tostring' here, given the file is a very large XML. – naiminp

@naiminp tostring was just for my example; I am not telling you to use it. Also, contains is a wildcard; prepping an edit. – salparadise
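For a wildcard in the middle of the text, XPath 1.0 has no pattern syntax, but lxml supports the EXSLT regular-expressions extension (re:test) inside xpath() once the re prefix is mapped. A sketch, with hypothetical Identifier values and an assumed pattern "Z * Index*" translated to the regular expression ^Z .* Index:

```python
from lxml import etree

NS = {
    "ns": "http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd",
    # EXSLT regular-expressions namespace; lxml implements re:test for it.
    "re": "http://exslt.org/regular-expressions",
}
# Hypothetical records for illustration.
root = etree.fromstring(b"""<DC xmlns="http://tradefinder.db.com/Schemas/MEL/MelHorizon_0_4_2.xsd">
<Overviews>
<OverviewLevelTimeStamp><Identifier>Z 1 Index, TRADE</Identifier></OverviewLevelTimeStamp>
<OverviewLevelTimeStamp><Identifier>Z 25 Index, TRADE</Identifier></OverviewLevelTimeStamp>
<OverviewLevelTimeStamp><Identifier>ZX 1 Index, TRADE</Identifier></OverviewLevelTimeStamp>
</Overviews>
</DC>""")

# "Z * Index*": "Z ", then anything, then " Index", then anything.
expr = '//ns:OverviewLevelTimeStamp[ns:Identifier[re:test(., "^Z .* Index")]]'
hits = [n.findtext("ns:Identifier", namespaces=NS)
        for n in root.xpath(expr, namespaces=NS)]
print(hits)  # ['Z 1 Index, TRADE', 'Z 25 Index, TRADE']
```

The regex runs per candidate node inside the XPath evaluation, so it still works on a parsed tree without serializing anything with tostring.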
