Parsing and modifying an XML file with CDATA sections

I would like to programmatically modify some XML files, but my script inadvertently introduces other changes as well. For example, consider the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment
-->
<abc:Tag xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:abc="http://www.mycompany.com" xmlns:def="http://www.anothercompany.com">
  <abc:sometext oneattribute="Hello" anotherattribute="World">
Some random boring text.
</abc:sometext>
  <def:somecode>
    <![CDATA[
if a>=b:
print(a)
]]>
  </def:somecode>
</abc:Tag>

I am trying to add a simple comment to the code inside the CDATA section. To do so, I am using the following Python script, which handles the namespaces correctly and appends the string. However, the CDATA section is lost in the output:

import sys
from lxml import etree as ET

xml_file = sys.argv[1]
tree = ET.parse(xml_file)
root = tree.getroot()
ns = {}
element_tree = ET.iterparse(xml_file, events=["start-ns"])
try:
    for event, (prefix, qualified_name) in element_tree:
        ET.register_namespace(prefix, qualified_name)
        ns[prefix] = qualified_name
except ET.ParseError as err:
    sys.exit(1)


for somecode in tree.findall('def:somecode', namespaces=ns):
    somecode.text = somecode.text + "# updated with a comment"

tree.write('output.xml',
    xml_declaration=True,
    encoding="UTF-8")

The resulting output differs from the input in two ways that I didn't expect and don't know how to correct:

Double quotes are replaced by single quotes (in the XML declaration)
The code in the CDATA section is written out as normal text instead of as a CDATA section

edited Feb 4, 2021 at 9:14

mzjn

asked Feb 4, 2021 at 8:41

Diamantis Sellis

Try the strip_cdata=False parser option. lxml.de/api/lxml.etree.iterparse-class.html

– mzjn, commented Feb 4, 2021 at 9:16

See also stackoverflow.com/a/44561547/407651, stackoverflow.com/a/53455951/407651

– mzjn, commented Feb 4, 2021 at 9:18
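
A minimal sketch of what the strip_cdata=False suggestion could look like in practice, assuming lxml is installed. The helper names add_comment_keep_cdata and serialize_with_double_quoted_declaration and the inline SAMPLE document are made up for illustration; strip_cdata=False and ET.CDATA are the lxml features it relies on. Parsing with strip_cdata=False keeps the CDATA section, but assigning a plain string to .text would still be serialized as escaped text, so the new text is wrapped in ET.CDATA explicitly. The declaration is written by hand here because lxml's own declaration uses single quotes.

```python
from io import BytesIO

from lxml import etree as ET

# Trimmed-down copy of the XML from the question (illustration only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc:Tag xmlns:abc="http://www.mycompany.com" xmlns:def="http://www.anothercompany.com">
  <def:somecode>
    <![CDATA[
if a>=b:
print(a)
]]>
  </def:somecode>
</abc:Tag>
"""


def add_comment_keep_cdata(source, comment="# updated with a comment"):
    """Parse `source`, append `comment` to each def:somecode, keep CDATA."""
    # strip_cdata=False tells the parser not to turn CDATA sections into plain text.
    parser = ET.XMLParser(strip_cdata=False)
    tree = ET.parse(source, parser)
    root = tree.getroot()

    # root.nsmap already holds the prefix -> URI mapping declared on the root;
    # drop a default namespace (None key) because findall() cannot use it as a prefix.
    ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix}

    for somecode in root.findall("def:somecode", namespaces=ns):
        # .text also contains the whitespace around the original CDATA section;
        # trailing whitespace is trimmed, the rest ends up inside the new CDATA.
        old_text = (somecode.text or "").rstrip()
        somecode.text = ET.CDATA(old_text + "\n" + comment + "\n")
    return tree


def serialize_with_double_quoted_declaration(tree):
    """Serialize without lxml's declaration and prepend a double-quoted one."""
    body = ET.tostring(tree, encoding="UTF-8", xml_declaration=False)
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    updated = add_comment_keep_cdata(BytesIO(SAMPLE))
    print(serialize_with_double_quoted_declaration(updated).decode("utf-8"))
```

With a real file, BytesIO(SAMPLE) would be replaced by the path from sys.argv[1], and the returned bytes written to output.xml in binary mode.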
