
I am using Python to make some conditional changes to an XML document. The incoming document has <?xml version="1.0" ?> at the top.

I'm using xml.etree.ElementTree.

How I'm serializing the changed XML:

filter_update_body = ET.tostring(root, encoding="utf8", method="xml")

The output has this at the top:

<?xml version='1.0' encoding='utf8'?>

The client wants the "encoding" attribute removed, but if I remove the encoding argument, the output either doesn't include the declaration line at all or it contains encoding='us-ascii'.

Can this be done so the output matches: <?xml version="1.0" ?>?

(I don't know why it matters honestly but that's what I was told needed to happen)

  • You could just write the XML to string, modify the string, and then write it out. outxml = outxml.replace("encoding='utf8'", "", 1) Commented Jan 26, 2023 at 14:47
  • I'm very new to Python (we typically use JS and this is an edge case for us). So I did this: filter_update_body = ET.tostring(root, encoding="utf8", method="xml") filter_update_body = filter_update_body.replace("encoding='utf8'", "", 1) and I got an error: "TypeError: a bytes-like object is required, not 'str'" Commented Jan 26, 2023 at 15:00
  • That error isn't being caused by the string replacement. replace is a perfectly valid method of strings. Are you perhaps trying to write the file in binary mode? Commented Jan 26, 2023 at 15:09
  • Prefix both strings in replace with b, like this: replace(b"encoding='utf8'", b"", 1) Commented Jan 26, 2023 at 15:12
  • @Friedrich solution worked! Thank you! For my own knowledge, what does putting the b in there do? Commented Jan 26, 2023 at 15:15

1 Answer

As pointed out in this answer, there is no way to make ElementTree write the XML declaration without the encoding attribute. However, as @James suggested in a comment, the attribute can be stripped from the resulting output like this:

# tostring() returns bytes here, so the search pattern and replacement must be bytes too
filter_update_body = ET.tostring(root, encoding="utf8", method="xml")
filter_update_body = filter_update_body.replace(b"encoding='utf8'", b"", 1)

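The b prefixes are required because ET.tostring() returns a bytes object whenever encoding != "unicode". That means we are calling bytes.replace(), which only accepts bytes arguments. Passing str arguments is what raised the "a bytes-like object is required, not 'str'" TypeError.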
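To see the difference in isolation, here is a minimal sketch. The small `<filters>` document is a made-up stand-in for the real data, and the variable names are only for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real one comes from elsewhere
root = ET.fromstring("<filters><filter name='a'/></filters>")

# With any encoding other than "unicode", tostring() returns bytes
body = ET.tostring(root, encoding="utf8", method="xml")
print(type(body))

# Passing str arguments to bytes.replace() raises the TypeError from the comments
error_message = None
try:
    body.replace("encoding='utf8'", "", 1)
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Passing bytes arguments works; only the first occurrence is replaced
stripped = body.replace(b"encoding='utf8'", b"", 1)
print(stripped.splitlines()[0])
```

Note that the declaration written by ElementTree uses single quotes, so after the replacement the first line reads `<?xml version='1.0' ?>` rather than the double-quoted form from the question.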

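With encoding="unicode" (note that this is the literal string "unicode"), ET.tostring() returns a regular str instead, so the b prefixes can be dropped and good old str.replace() is used. Be aware that in this mode tostring() writes no XML declaration by default, so there is normally nothing to strip.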
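Since the "unicode" mode produces no declaration on its own, one possible variation (not part of the suggestion from the comments, just a sketch) is to prepend the exact declaration by hand. This also gives the double-quoted form from the question. The sample document and variable names below are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real one comes from elsewhere
root = ET.fromstring("<filters><filter name='a'/></filters>")

# encoding="unicode" returns a str and, by default, no XML declaration
text = ET.tostring(root, encoding="unicode", method="xml")

# Prepend exactly the declaration the client asked for
document = '<?xml version="1.0" ?>\n' + text
print(document)
```

Without an encoding attribute, XML parsers assume UTF-8 (or UTF-16 when a byte order mark is present), so a document built this way should be encoded as UTF-8 when it is written out or sent.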

It's worth noting that the choice between bytes and str also affects how the XML will eventually be written to a file. A bytes object should be written in binary mode, a str in text mode.
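As a rough illustration of the two write modes, the sketch below writes both variants to a temporary directory. The file names and the sample document are made up for the example:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document and output location
root = ET.fromstring("<filters><filter name='a'/></filters>")
out_dir = tempfile.mkdtemp()

# bytes result: open the file in binary mode ("wb")
as_bytes = ET.tostring(root, encoding="utf8", method="xml")
as_bytes = as_bytes.replace(b"encoding='utf8'", b"", 1)
bytes_path = os.path.join(out_dir, "from_bytes.xml")
with open(bytes_path, "wb") as f:
    f.write(as_bytes)

# str result: open the file in text mode ("w") and state the encoding explicitly
as_text = ET.tostring(root, encoding="unicode", method="xml")
text_path = os.path.join(out_dir, "from_text.xml")
with open(text_path, "w", encoding="utf-8") as f:
    f.write(as_text)
```

Mixing the two up fails immediately: writing bytes to a text-mode file, or a str to a binary-mode file, raises a TypeError.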
