Getting empty list when accessing element and tag in xml file using ElementTree

Question

The idea is to get the value of tag endTime for the following xml:

<epochs xmlns="http://www.egi.com/epochs_mff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <epoch>
    <beginTime>0</beginTime>
    <endTime>3586221000</endTime>
    <firstBlock>1</firstBlock>
    <lastBlock>897</lastBlock>
  </epoch>
  <epoch>
    <beginTime>3750143000</beginTime>
    <endTime>5549485000</endTime>
    <firstBlock>898</firstBlock>
    <lastBlock>1347</lastBlock>
  </epoch>
</epochs>

Yet, accessing the tag directly returns an empty list:

import xml.etree.ElementTree as ET
tree = ET.parse(r'epochs.xml')
epoch_list=tree.findall("epoch")

However, looping through the tree does return the endTime value.

import xml.etree.ElementTree as ET
tree = ET.parse(r'epochs.xml')

for elem in tree.getroot():  # an ElementTree object is not iterable; iterate over its root element
    for subelem in elem:
        print(subelem.text)

How can I retrieve endTime directly, with the value of 300937000?

Check your second code block. The third line doesn't seem to be complete.
– user1558604, Commented Jul 19, 2020 at 13:28
A dirty workaround is to parse the XML file with Python's BeautifulSoup, using the line result = soup_page.find_all("endtime").
– rpb, Commented Jul 19, 2020 at 14:10

Valdi_Bo · Accepted Answer · 2020-07-19 16:54:04Z


The reason your code failed is that your XML uses a default namespace (xmlns="http://...").

But your call to findall contains epoch without any namespace, so it matches nothing.
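To see why the unprefixed name matches nothing, it can help to print the tags as ElementTree stores them. The sketch below assumes a shortened inline copy of the question's XML, parsed with ET.fromstring instead of being read from epochs.xml; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the XML from the question (assumption: the same
# default namespace is declared on the root element).
XML = """<epochs xmlns="http://www.egi.com/epochs_mff">
  <epoch>
    <beginTime>0</beginTime>
    <endTime>3586221000</endTime>
  </epoch>
</epochs>"""

root = ET.fromstring(XML)

# ElementTree stores every tag in the form {namespace-uri}localname.
print(root.tag)     # {http://www.egi.com/epochs_mff}epochs
print(root[0].tag)  # {http://www.egi.com/epochs_mff}epoch

# The bare name therefore does not match, while the qualified one does.
unqualified = root.findall("epoch")
qualified = root.findall("{http://www.egi.com/epochs_mff}epoch")
print(len(unqualified), len(qualified))
```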

To process namespaced XML, you have to:

create a dictionary of used namespaces ({prefix: namespace}),
include the prefix of the relevant namespace in the XPath expression,
pass the above dictionary as the second argument of findall.

Something like:

ns = {'ep': 'http://www.egi.com/epochs_mff'}
epoch_list = tree.findall('ep:epoch', ns)

Then the result is:

[<Element '{http://www.egi.com/epochs_mff}epoch' at 0x...>, <Element '{http://www.egi.com/epochs_mff}epoch' at 0x...>]
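For a check that does not depend on epochs.xml, the same call can be tried on an inline copy of the question's XML. This is only a sketch: it uses ET.fromstring, so findall is called on the root element rather than on tree, which searches from the same place.

```python
import xml.etree.ElementTree as ET

# Inline copy of the XML from the question, so the example runs without a file.
XML = """<epochs xmlns="http://www.egi.com/epochs_mff" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <epoch>
    <beginTime>0</beginTime>
    <endTime>3586221000</endTime>
    <firstBlock>1</firstBlock>
    <lastBlock>897</lastBlock>
  </epoch>
  <epoch>
    <beginTime>3750143000</beginTime>
    <endTime>5549485000</endTime>
    <firstBlock>898</firstBlock>
    <lastBlock>1347</lastBlock>
  </epoch>
</epochs>"""

root = ET.fromstring(XML)

# The prefix 'ep' is arbitrary; only the namespace URI has to match the document.
ns = {"ep": "http://www.egi.com/epochs_mff"}
epoch_list = root.findall("ep:epoch", ns)

print(len(epoch_list))  # one entry per <epoch> element
for epoch in epoch_list:
    print(epoch.tag)
```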

To get the content of your endTime element, if you don't care about any intermediate elements in the XML tree, run:

tree.findtext('.//ep:endTime', namespaces=ns)

Another option is to pass the full path, starting from the children of the root element, remembering to include the namespace prefix at each step:

tree.findtext('ep:epoch/ep:endTime', namespaces=ns)
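As a side note, not part of the approach above: the same lookups can also be written without a prefix dictionary, either by spelling out the namespace URI in braces or, on Python 3.8 and later, with the {*} wildcard that matches any namespace. The sketch below assumes an inline copy of the question's XML; NS and clark are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the XML from the question.
XML = """<epochs xmlns="http://www.egi.com/epochs_mff">
  <epoch>
    <beginTime>0</beginTime>
    <endTime>3586221000</endTime>
  </epoch>
  <epoch>
    <beginTime>3750143000</beginTime>
    <endTime>5549485000</endTime>
  </epoch>
</epochs>"""

root = ET.fromstring(XML)

# Variant 1: write the namespace URI explicitly in braces at each step.
NS = "http://www.egi.com/epochs_mff"
clark = "{%s}" % NS
first_explicit = root.findtext(clark + "epoch/" + clark + "endTime")

# Variant 2 (Python 3.8+): {*} matches the tag name in any namespace.
first_wildcard = root.findtext("{*}epoch/{*}endTime")
all_wildcard = [el.text for el in root.findall(".//{*}endTime")]

print(first_explicit, first_wildcard, all_wildcard)
```

The wildcard form is convenient for quick scripts, but it also matches elements with the same local name from other namespaces, so the prefix dictionary is the more precise choice.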

If you have multiple endTime elements, one possible solution is to process them in a loop.

This time findtext does not help, as it returns the text of only the first matching element. Use a loop based on findall instead and, within the loop, retrieve the text of the current element and use it as intended, e.g.:

for it in tree.findall('ep:epoch/ep:endTime', namespaces=ns):
    print(it.text)

Of course, replace print with whatever you need to consume the text found.
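If the values are needed as numbers rather than printed, the loop can be folded into a list comprehension. A minimal sketch, assuming an inline copy of the question's XML and that every endTime holds an integer:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the XML from the question.
XML = """<epochs xmlns="http://www.egi.com/epochs_mff">
  <epoch>
    <beginTime>0</beginTime>
    <endTime>3586221000</endTime>
  </epoch>
  <epoch>
    <beginTime>3750143000</beginTime>
    <endTime>5549485000</endTime>
  </epoch>
</epochs>"""

root = ET.fromstring(XML)
ns = {"ep": "http://www.egi.com/epochs_mff"}

# Collect the text of every endTime element and convert it to int
# (assumption: the text is always a plain integer).
end_times = [int(el.text) for el in root.findall("ep:epoch/ep:endTime", ns)]
print(end_times)  # [3586221000, 5549485000]
```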

edited Jul 19, 2020 at 16:54

answered Jul 19, 2020 at 14:53


1 Comment

rpb Over a year ago

Thanks for the detailed explanation, @Valdi_Bo. To extend the discussion a bit further: how can tree.findtext('ep:epoch/ep:endTime', namespaces=ns) be looped if there are multiple endTime instances?
