The following is a snippet of code from the extraction of an FODT file:

<office:document xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns:math="http://www.w3.org/1998/Math/MathML" xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0" xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0" xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0" xmlns:ooo="http://openoffice.org/2004/office" xmlns:ooow="http://openoffice.org/2004/writer" xmlns:oooc="http://openoffice.org/2004/calc" xmlns:dom="http://www.w3.org/2001/xml-events" xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:rpt="http://openoffice.org/2005/report" xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:grddl="http://www.w3.org/2003/g/data-view#" xmlns:officeooo="http://openoffice.org/2009/office" xmlns:tableooo="http://openoffice.org/2009/table" xmlns:drawooo="http://openoffice.org/2010/draw" xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0" xmlns:loext="urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0" xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0" xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0" xmlns:css3t="http://www.w3.org/TR/css3-text/" office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text">

I want to separate the content of each namespace. For example, I want to extract xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0", xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0", etc. including the namespace names themselves.

How do I do it using lxml?

  • What do you mean by "separate the content of each namespace"? Do you just want to list all declared namespaces? Commented Jun 1, 2016 at 18:15
  • @mzjn I'm sorry if I am not using the appropriate terminology. From the above code, I would like to have a list of this sort: [xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0", xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0", ...] Commented Jun 1, 2016 at 18:18

1 Answer

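The nsmap property on the root element holds a dictionary that maps each namespace prefix declared there to its namespace URI (a default namespace, if one is declared, is stored under the key None). Example: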

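from lxml import etree

# Placeholder: replace with the contents of the FODT file
XML = "your XML document here..."

root = etree.fromstring(XML)
# nsmap maps prefix -> namespace URI; sort by prefix for stable output
for ns in sorted(root.nsmap.items()):
    print(ns)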

Output:

('calcext', 'urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0')
('chart', 'urn:oasis:names:tc:opendocument:xmlns:chart:1.0')
('config', 'urn:oasis:names:tc:opendocument:xmlns:config:1.0')
('css3t', 'http://www.w3.org/TR/css3-text/')
('dc', 'http://purl.org/dc/elements/1.1/')
('dom', 'http://www.w3.org/2001/xml-events')
('dr3d', 'urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0')
('draw', 'urn:oasis:names:tc:opendocument:xmlns:drawing:1.0')
('drawooo', 'http://openoffice.org/2010/draw')
('field', 'urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0')
('fo', 'urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0')
('form', 'urn:oasis:names:tc:opendocument:xmlns:form:1.0')
('formx', 'urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0')
('grddl', 'http://www.w3.org/2003/g/data-view#')
('loext', 'urn:org:documentfoundation:names:experimental:office:xmlns:loext:1.0')
('math', 'http://www.w3.org/1998/Math/MathML')
('meta', 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0')
('number', 'urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0')
('of', 'urn:oasis:names:tc:opendocument:xmlns:of:1.2')
('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0')
('officeooo', 'http://openoffice.org/2009/office')
('ooo', 'http://openoffice.org/2004/office')
('oooc', 'http://openoffice.org/2004/calc')
('ooow', 'http://openoffice.org/2004/writer')
('rpt', 'http://openoffice.org/2005/report')
('script', 'urn:oasis:names:tc:opendocument:xmlns:script:1.0')
('style', 'urn:oasis:names:tc:opendocument:xmlns:style:1.0')
('svg', 'urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0')
('table', 'urn:oasis:names:tc:opendocument:xmlns:table:1.0')
('tableooo', 'http://openoffice.org/2009/table')
('text', 'urn:oasis:names:tc:opendocument:xmlns:text:1.0')
('xforms', 'http://www.w3.org/2002/xforms')
('xhtml', 'http://www.w3.org/1999/xhtml')
('xlink', 'http://www.w3.org/1999/xlink')
('xsd', 'http://www.w3.org/2001/XMLSchema')
('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
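
To get strings in the xmlns:prefix="uri" form asked for in the question, the (prefix, URI) pairs can be formatted directly. This is a minimal sketch, assuming a shortened stand-in for the root element (only three of the declarations are reproduced, and the element is self-closed so that it parses). format_declaration is a hypothetical helper name, not part of lxml:

```python
from lxml import etree

# Assumption: shortened, self-closed stand-in for the FODT root element.
XML = (
    '<office:document'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' office:version="1.2"/>'
)
root = etree.fromstring(XML)


def format_declaration(prefix, uri):
    # Hypothetical helper: a default namespace is keyed by None in nsmap
    # and is written as plain xmlns="..." without a prefix.
    name = "xmlns" if prefix is None else "xmlns:" + prefix
    return '{}="{}"'.format(name, uri)


# Sort by prefix; "or ''" keeps a None key (default namespace) sortable.
pairs = sorted(root.nsmap.items(), key=lambda item: item[0] or "")
declarations = [format_declaration(prefix, uri) for prefix, uri in pairs]

for declaration in declarations:
    print(declaration)
```

With the full root element from the question, the same loop would yield one string per declared prefix.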

3 Comments

Thank you so much! This was a noob-ish question, but it goes a long way in simplifying my future work :)
I have another question. If you have a look at the code snippet in my question, this is what you will see at the end - office:version="1.2" office:mimetype="application/vnd.oasis.opendocument.text" How do I extract this? Since they are not namespaces I cannot extract them the same way.
office:version="1.2" and office:mimetype="application/vnd.oasis.opendocument.text" are regular attributes (not namespace declarations). They are stored in the attrib property on the root element. Also available via the items() method. See lxml.de/api/lxml.etree._Element-class.html.
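
A short sketch of reading those attributes, again assuming a cut-down, self-closed copy of the root element. lxml stores namespaced attribute names in Clark notation ({uri}localname). prefixed_name is a hypothetical helper that maps them back to the prefix:localname spelling:

```python
from lxml import etree

# Assumption: cut-down, self-closed copy of the root element from the question.
XML = (
    '<office:document'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' office:version="1.2"'
    ' office:mimetype="application/vnd.oasis.opendocument.text"/>'
)
root = etree.fromstring(XML)

# Look up single attributes by their namespace-qualified names.
office_ns = root.nsmap["office"]
version = root.get("{%s}version" % office_ns)
mimetype = root.get("{%s}mimetype" % office_ns)

# Reverse mapping (URI -> prefix) used to rebuild the prefixed spelling.
prefix_for = {uri: prefix for prefix, uri in root.nsmap.items()}


def prefixed_name(clark_name):
    # Hypothetical helper: '{uri}local' -> 'prefix:local'.
    # Attributes without a namespace keep their bare local name.
    qname = etree.QName(clark_name)
    prefix = prefix_for.get(qname.namespace)
    return qname.localname if prefix is None else prefix + ":" + qname.localname


attributes = [
    '{}="{}"'.format(prefixed_name(name), value)
    for name, value in root.attrib.items()
]
for attribute in attributes:
    print(attribute)
```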