Remove "encoding" attribute from XML in Python

Question

I am using Python to make some conditional changes to an XML document. The incoming document has <?xml version="1.0" ?> at the top.

I'm using xml.etree.ElementTree.

How I'm serializing the changed XML:

filter_update_body = ET.tostring(root, encoding="utf8", method="xml")

The output has this at the top:

<?xml version='1.0' encoding='utf8'?>

The client wants the "encoding" attribute removed, but if I remove it, the output either doesn't include the line at all or contains encoding='us-ascii'.

Can this be done so the output matches: <?xml version="1.0" ?>?

(I don't know why it matters honestly but that's what I was told needed to happen)

You could just write the XML to a string, modify the string, and then write it out: outxml = outxml.replace("encoding='utf8'", "", 1)
– James, Commented Jan 26, 2023 at 14:47
I'm very new to Python (we typically use JS and this is an edge case for us), so I did this: filter_update_body = ET.tostring(root, encoding="utf8", method="xml") filter_update_body = filter_update_body.replace("encoding='utf8'", "", 1) and I got an error: "TypeError: a bytes-like object is required, not 'str'"
– Bryant Richards, Commented Jan 26, 2023 at 15:00
That error isn't being caused by the string replacement. replace is a perfectly valid method of strings. Are you perhaps trying to write the file in binary mode?
– kindall, Commented Jan 26, 2023 at 15:09
Prefix both strings in replace with b, like this: replace(b"encoding='utf8'", b"", 1)
– Friedrich, Commented Jan 26, 2023 at 15:12
@Friedrich's solution worked! Thank you! For my own knowledge, what does putting the b in there do?
– Bryant Richards, Commented Jan 26, 2023 at 15:15

Friedrich · Accepted Answer · 2023-01-26 15:47:54Z


As pointed out in this answer, there is no way to make ElementTree omit the encoding attribute. However, as @James suggested in a comment, it can be stripped from the serialized output like this:

filter_update_body = ET.tostring(root, encoding="utf8", method="xml")
filter_update_body = filter_update_body.replace(b"encoding='utf8'", b"", 1)

The b prefixes are required because ET.tostring() returns a bytes object whenever encoding != "unicode". That means the call is to bytes.replace(), which only accepts bytes-like arguments.
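
For illustration, here is a small sketch of that behaviour. The one-element document is a made-up stand-in for the real XML, and the variable names are only placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real document.
root = ET.fromstring("<filter><name>demo</name></filter>")

body = ET.tostring(root, encoding="utf8", method="xml")
print(type(body))  # <class 'bytes'>

try:
    # Passing str arguments to bytes.replace() raises the TypeError from the question.
    body.replace("encoding='utf8'", "", 1)
except TypeError as exc:
    print("TypeError:", exc)

# With bytes arguments the replacement goes through.
stripped = body.replace(b"encoding='utf8'", b"", 1)
print(stripped.splitlines()[0])
```

Note that the declaration keeps ElementTree's single quotes, so the first line comes out as version='1.0' rather than version="1.0".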

With encoding="unicode" (note that this is the literal string "unicode"), it will return a regular str. In this case, the b prefixes can be omitted and we use good old str.replace().
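
A rough sketch of the str variant, under a couple of assumptions: with encoding="unicode" the declaration is left out by default, so this asks for it with xml_declaration=True (available from Python 3.8). The encoding name written into the declaration may differ between Python versions, so the sketch strips it with a regular expression instead of a fixed string. The sample document is made up.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real document.
root = ET.fromstring("<filter><name>demo</name></filter>")

# encoding="unicode" yields a str; xml_declaration=True requests the
# declaration, which is otherwise omitted for "unicode".
text = ET.tostring(root, encoding="unicode", xml_declaration=True)

# Remove the encoding pseudo-attribute whatever its value and quote style,
# keeping the space that precedes "?>".
text = re.sub(r"""encoding=(["']).*?\1""", "", text, count=1)
print(text.splitlines()[0])
```

The regular expression is a design choice for robustness; if the declared value is known, a plain str.replace() works just as well.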

It's worth noting that the choice between bytes and str also affects how the XML will eventually be written to a file. A bytes object should be written in binary mode, a str in text mode.
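
As a minimal sketch of the two write modes (file names and the sample document are invented, and a temporary directory is used so nothing is left behind):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real document.
root = ET.fromstring("<filter><name>demo</name></filter>")

# bytes result, with the encoding attribute stripped as described above.
as_bytes = ET.tostring(root, encoding="utf8", method="xml")
as_bytes = as_bytes.replace(b"encoding='utf8'", b"", 1)

# str result (no declaration is emitted by default for "unicode").
as_text = ET.tostring(root, encoding="unicode", method="xml")

with tempfile.TemporaryDirectory() as tmp:
    bytes_path = os.path.join(tmp, "from_bytes.xml")
    text_path = os.path.join(tmp, "from_text.xml")

    # bytes go to a file opened in binary mode ...
    with open(bytes_path, "wb") as f:
        f.write(as_bytes)

    # ... while str goes to a file opened in text mode.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(as_text)

    with open(bytes_path, "rb") as f:
        written = f.read()
```

Swapping the modes raises a TypeError in both directions, which is usually the first sign that the wrong one was chosen.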
