Saving XML files using ElementTree

Question

I'm trying to write simple Python (3.2) code that reads XML files, makes some corrections, and stores them back. However, when the file is written, ElementTree adds a namespace prefix to every tag. For example:

<ns0:trk>
  <ns0:name>ACTIVE LOG</ns0:name>
<ns0:trkseg>
<ns0:trkpt lat="38.5" lon="-120.2">
  <ns0:ele>6.385864</ns0:ele>
  <ns0:time>2011-12-10T17:46:30Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="40.7" lon="-120.95">
  <ns0:ele>5.905273</ns0:ele>
  <ns0:time>2011-12-10T17:46:51Z</ns0:time>
</ns0:trkpt>
<ns0:trkpt lat="43.252" lon="-126.453">
  <ns0:ele>7.347168</ns0:ele>
  <ns0:time>2011-12-10T17:52:28Z</ns0:time>
</ns0:trkpt>
</ns0:trkseg>
</ns0:trk>

The code snippet is below:

def parse_gpx_data(gpxdata, tzname=None, npoints=None, filter_window=None,
                   output_file_name=None):
    ET = load_xml_library()

    def find_trksegs_or_route(etree, ns):
        trksegs=etree.findall('.//'+ns+'trkseg')
        if trksegs:
            return trksegs, "trkpt"
        else: # try to display route if track is missing
            rte=etree.findall('.//'+ns+'rte')
            return rte, "rtept"

    # try GPX10 namespace first
    try:
        element = ET.XML(gpxdata)
    except ET.ParseError as v:
        row, column = v.position
        # the format arguments must be a single tuple; v is an exception, so use %s
        print("error on row %d, column %d: %s" % (row, column, v))
        return

    print ("%s" % ET.tostring(element))
    trksegs,pttag=find_trksegs_or_route(element, GPX10)
    NS=GPX10
    if not trksegs: # try GPX11 namespace otherwise
        trksegs,pttag=find_trksegs_or_route(element, GPX11)
        NS=GPX11
    if not trksegs: # try without any namespace
        trksegs,pttag=find_trksegs_or_route(element, "")
        NS=""

    # Store the results if requested
    if output_file_name:
        ET.register_namespace('', GPX11)
        ET.register_namespace('', GPX10)
        ET.ElementTree(element).write(output_file_name, xml_declaration=True)

    return;

I have tried using register_namespace, but without success. Does this version of ElementTree (1.3) need anything done differently?

Tell me if I understood your question: you'd like to have <trk> instead of <ns0:trk>, and so on?
– Rik Poggi, Commented Jan 24, 2012 at 8:40
Correct. I'd like to have <trk> instead of <ns0:trk> and so on.
– ilya1725, Commented Jan 24, 2012 at 16:27
This is not a real solution, but since it seems that you load a string, have you tried removing the namespace with a regexp? After that, loading and saving without it should work.
– Rik Poggi, Commented Jan 24, 2012 at 18:01
Hi Rik. I'll do that if everything else fails. I'd like to configure ElementTree not to print it in the first place.
– ilya1725, Commented Jan 24, 2012 at 20:29

ilya1725 · Accepted Answer · 2012-01-25 06:35:03Z

103

To avoid the ns0 prefix, the default namespace should be set before reading the XML data.

ET.register_namespace('', "http://www.topografix.com/GPX/1/1")
ET.register_namespace('', "http://www.topografix.com/GPX/1/0")
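
A minimal sketch of the behaviour, assuming the standard library's xml.etree.ElementTree and a made-up one-track GPX 1.1 document (the gpxdata string and the variable names are illustrative, not taken from the question). The documentation for register_namespace says that any existing mapping for the given prefix or URI is removed, so two calls with the same empty prefix appear to leave only the last one in effect. It may therefore be safer to register only the URI the file actually uses:

```python
import xml.etree.ElementTree as ET

GPX10 = "http://www.topografix.com/GPX/1/0"
GPX11 = "http://www.topografix.com/GPX/1/1"

# Hypothetical minimal GPX 1.1 document, used only for illustration.
gpxdata = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">'
    '<trk><name>ACTIVE LOG</name></trk>'
    '</gpx>'
)
element = ET.XML(gpxdata)

# Two registrations for the same empty prefix: only the last one (GPX 1.0)
# is kept, so a GPX 1.1 document still gets a generated prefix.
ET.register_namespace('', GPX11)
ET.register_namespace('', GPX10)
with_generated_prefix = ET.tostring(element, encoding="unicode")

# Registering the URI the document actually uses yields unprefixed tags.
ET.register_namespace('', GPX11)
without_prefix = ET.tostring(element, encoding="unicode")

print(with_generated_prefix)
print(without_prefix)
```

In this sketch the element is parsed before either registration, which suggests that the registry is consulted when the tree is serialized rather than when it is parsed.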

4 Comments

likern Over a year ago

It looks like it does not have to be before. I am able to read the XML file, get the namespace, and only after that call register_namespace:

tree = ET.parse(str(udx_path))
root = tree.getroot()
ns = {
    # extract namespace of root element
    'udx': root.tag[1:root.tag.index('}')]
}
ET.register_namespace('', root.tag[1:root.tag.index('}')])

Emil Over a year ago

This is not a complete way to keep the output string identical to the parsed input (if using ElementTree.tostring(root)). singingsingh's answer is complete.

Instein Over a year ago

Just registering before printing can be good enough.

Aananth C N Over a year ago

Singingsingh's explanation is more appropriate to the question.

FerozShahapur · Accepted Answer · 2020-08-26 18:55:38Z

55

You need to register all your namespaces before you parse the XML file.

For example, suppose your input XML looks like this and Capabilities is the root of your element tree:

<Capabilities xmlns="http://www.opengis.net/wmts/1.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://www.opengis.net/wmts/1.0 http://schemas.opengis.net/wmts/1.0/wmtsGetCapabilities_response.xsd"
    version="1.0.0">

Then you have to register all the namespaces, i.e. the attributes that start with xmlns, like this:

ET.register_namespace('', "http://www.opengis.net/wmts/1.0")
ET.register_namespace('ows', "http://www.opengis.net/ows/1.1")
ET.register_namespace('xlink', "http://www.w3.org/1999/xlink")
ET.register_namespace('xsi', "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace('gml', "http://www.opengis.net/gml")
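
As a rough illustration, assuming the standard library's xml.etree.ElementTree and a heavily shortened, made-up Capabilities document (the ows:Title child and the NAMESPACES dictionary are hypothetical), the registrations can be driven from a prefix-to-URI mapping and checked by serializing the tree again:

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI pairs taken from the Capabilities example above
# (only a subset is listed to keep the sketch short).
NAMESPACES = {
    '': "http://www.opengis.net/wmts/1.0",
    'ows': "http://www.opengis.net/ows/1.1",
    'xlink': "http://www.w3.org/1999/xlink",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Hypothetical, heavily shortened document for illustration only.
xml_in = (
    '<Capabilities xmlns="http://www.opengis.net/wmts/1.0" '
    'xmlns:ows="http://www.opengis.net/ows/1.1" version="1.0.0">'
    '<ows:Title>Demo</ows:Title>'
    '</Capabilities>'
)
root = ET.fromstring(xml_in)
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```

ElementTree writes xmlns declarations only for namespaces that are actually used by some tag or attribute in the tree, so a registered prefix that never occurs (xlink in this sketch) should not show up in the output.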

answered Jul 15, 2016 at 17:54 by singingsingh; edited Aug 26, 2020 at 18:55 by FerozShahapur

1 Comment

gofvonx Over a year ago

This answer is the complete one.

Naiim Khaskhoussi · Accepted Answer · 2019-11-22 10:00:26Z

2

If you try to print the root, you will see something like this: <Element '{http://www.host.domain/path/to/your/xml/namespace}RootTag' at 0x0000000000558DB8>

So, to avoid the ns0 prefix, you have to change the default namespace before parsing the XML data as below:

ET.register_namespace('', "http://www.host.domain/path/to/your/xml/namespace")
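
The URI inside the braces of the root tag can also be extracted programmatically instead of being copied by hand. The following is a sketch under stated assumptions: namespace_of is a hypothetical helper, the input document is invented for illustration, and the registration is done after parsing, which seems to be sufficient because the prefix is only needed when the tree is written out.

```python
import xml.etree.ElementTree as ET

def namespace_of(element):
    """Return the namespace URI embedded in element.tag, or '' if there is none."""
    tag = element.tag
    if tag.startswith('{'):
        return tag[1:tag.index('}')]
    return ''

# Hypothetical input document for illustration only.
xml_in = (
    '<RootTag xmlns="http://www.host.domain/path/to/your/xml/namespace">'
    '<Child/>'
    '</RootTag>'
)
root = ET.fromstring(xml_in)

uri = namespace_of(root)
if uri:
    # Map the root's namespace to the empty prefix before serializing.
    ET.register_namespace('', uri)

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```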

answered Nov 22, 2019 at 9:13; edited Nov 22, 2019 at 10:00

Rik Poggi · Accepted Answer · 2012-01-24 21:40:26Z

1

It seems that you have to declare your namespace, meaning that you need to change the first line of your xml from:

<ns0:trk>

to something like:

<ns0:trk xmlns:ns0="uri:">

Once you have done that, you will no longer get a ParseError for an unbound prefix, and:

elem.tag = elem.tag[len('{uri:}'):]

will remove the namespace.
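
Applied to a whole tree, the same idea might look like the sketch below. It assumes the standard library's xml.etree.ElementTree; strip_namespaces is a hypothetical helper and the GPX fragment is invented for illustration. Note that this changes the document itself: the written file no longer declares the GPX namespace, and namespaced attributes are left untouched.

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag under root, in place."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith('{'):
            elem.tag = elem.tag.split('}', 1)[1]
    return root

# Hypothetical GPX fragment, for illustration only.
xml_in = (
    '<gpx xmlns="http://www.topografix.com/GPX/1/1">'
    '<trk><trkseg>'
    '<trkpt lat="38.5" lon="-120.2"><ele>6.385864</ele></trkpt>'
    '</trkseg></trk>'
    '</gpx>'
)
root = strip_namespaces(ET.fromstring(xml_in))
xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```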

3 Comments

ilya1725 Over a year ago

Hi Rik. The example XML I showed is the output. The input XML, which parses fine, doesn't have the 'ns0:' prefix. It is just standard GPX code.

Rik Poggi Over a year ago

If the line element = ET.XML(gpxdata) gives you an element with ns0, then the "problem" is in gpxdata, in which case you have two options: "fix" the gpxdata, or find out why the standard parser does that and build a new one for ET.XML.

ilya1725 Over a year ago

The original gpxdata doesn't have any ns0 entries. However, your hint, Rik, kind of led me to the solution. Basically, ET.register_namespace('', GPX11) and ET.register_namespace('', GPX10) should be called before reading, i.e. before ET.XML.

Anon1284712 · Accepted Answer · 2022-11-30 13:56:49Z

1

Or you could regex it away:

import re

def remove_xml_namespace(xml_str: str) -> str:
    # unwrap a prefixed element whose start tag contains xmlns, keeping only its content
    xml_str = re.sub(r"<([^:]+):(\w+).+(?=xmlns)[^>]+>([\s\S]*)</(\1):(\2)>", r"\3", xml_str)
    # replace namespace elements from end tag
    xml_str = re.sub(r"</[^:]*:", r"</", xml_str)
    # replace namespace from start tags
    xml_str = re.sub(r"<[^/][^:]*:([^/>]*)(/?)>", r"<\1\2>", xml_str)
    return xml_str
