
This is the content of a file-like object, toc:

<?xml version='1.0' encoding='utf-8'?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="eng">
    <head>
    ...
    </head>
    <docTitle>
        <text>THE_TEXT_I_WANT</text>
    </docTitle>

    ...
</ncx>

My current Python 3 code:

import xml.etree.ElementTree as ET

# I get toc using open method in zipfile module
# toc : <zipfile.ZipExtFile name='toc.ncx' mode='r' compress_type=deflate>
toc_tree = ET.parse(toc)
for node in toc_tree.iter():
    print(node)
print(toc_tree.find('docTitle'))

The for loop prints all the nodes, but the find method returns None, and findall returns an empty list. Can anybody tell me why? Is there a better solution?

1 Answer

Because your XML declares a (default) namespace, searching for elements called docTitle finds nothing: the search looks for un-namespaced elements called docTitle, and every element in this document belongs to the namespace. Instead, you need to use Clark notation with the full namespace URI:

toc_tree.find('{http://www.daisy.org/z3986/2005/ncx/}docTitle')
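To see the difference side by side, here is a minimal sketch. It assumes a cut-down stand-in for toc.ncx (the elided parts of the original file are left out, and NCX_BYTES and NCX_NS are names made up for this example), wrapped in io.BytesIO to mimic the file-like object returned by zipfile:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real toc.ncx; elided sections are omitted.
NCX_BYTES = b"""<?xml version='1.0' encoding='utf-8'?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="eng">
    <head/>
    <docTitle>
        <text>THE_TEXT_I_WANT</text>
    </docTitle>
</ncx>
"""

# Namespace URI copied from the xmlns attribute on the root element.
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# ET.parse accepts any binary file-like object, as in the question.
toc_tree = ET.parse(io.BytesIO(NCX_BYTES))

# Un-namespaced search: no element has the bare tag 'docTitle'.
plain = toc_tree.find("docTitle")

# Clark notation: '{namespace-uri}localname' matches the parsed tag.
qualified = toc_tree.find("{%s}docTitle" % NCX_NS)

print(plain)
print(qualified.tag)
```

Printing node.tag inside the question's for loop shows the same thing: every tag is stored with the namespace URI in braces in front of the local name.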

5 Comments

It works! Thanks. Can I use namespace= in the find method?
How come when I replace 'docTitle' with 'text', the find method returns nothing again?
Because find and findall only search direct children by default. Try using find('.//{http://www.daisy.org/z3986/2005/ncx/}text') as per docs.python.org/3/library/…
Thank you for your answers!
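On the two follow-up questions in the comments: find, findall and findtext accept a namespaces argument (a dict mapping a prefix to a URI), and a leading .// makes the search cover all descendants rather than direct children only. A minimal sketch, assuming a shortened copy of the document and a prefix "ncx" chosen arbitrarily for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened copy of toc.ncx, enough to show the lookups.
NCX_XML = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
    <docTitle>
        <text>THE_TEXT_I_WANT</text>
    </docTitle>
</ncx>"""

# The prefix is local to this mapping; it need not appear in the XML itself.
ns = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

root = ET.fromstring(NCX_XML)

# Direct children only: <text> sits one level deeper, so this is None.
direct = root.find("ncx:text", ns)

# './/' searches all descendants of the starting element.
nested = root.find(".//ncx:text", ns)

# findtext follows an explicit path and returns the element's text.
title = root.findtext("ncx:docTitle/ncx:text", namespaces=ns)

print(direct)
print(nested.text)
print(title)
```

The prefix form and Clark notation are interchangeable; the mapping mainly saves repeating the full URI in every path.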
