
I would like to modify some XML files programmatically, but my script ends up introducing changes I did not intend. For example, consider the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment
-->
<abc:Tag xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:abc="http://www.mycompany.com" xmlns:def="http://www.anothercompany.com">
  <abc:sometext oneattribute="Hello" anotherattribute="World">
Some random boring text.
</abc:sometext>
  <def:somecode>
    <![CDATA[
if a>=b:
print(a)
]]>
  </def:somecode>
</abc:Tag>

I am trying to add a simple comment to the code inside the CDATA section. To do so, I use the following Python script, which handles the namespaces correctly and appends the string. However, the CDATA section is lost in the output:

import sys
from lxml import etree as ET

xml_file = sys.argv[1]
tree = ET.parse(xml_file)
root = tree.getroot()

# Collect the prefix -> URI mapping declared in the file so that
# the prefixes can be used in findall() below.
ns = {}
element_tree = ET.iterparse(xml_file, events=["start-ns"])
try:
    for event, (prefix, qualified_name) in element_tree:
        ET.register_namespace(prefix, qualified_name)
        ns[prefix] = qualified_name
except ET.ParseError as err:
    sys.exit(1)

# Append a comment to the code inside each <def:somecode> element.
for somecode in tree.findall('def:somecode', namespaces=ns):
    somecode.text = somecode.text + "# updated with a comment"

tree.write('output.xml',
    xml_declaration=True,
    encoding="UTF-8")

The resulting output differs from the input in two ways that I did not expect and do not know how to correct:

  • Single quotes are replaced by double quotes
  • The code in the CDATA section is written out as ordinary text

[Screenshot: the input and output compared in Meld, highlighting the differences]
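A minimal sketch of one way to keep both the CDATA section and a double-quoted XML declaration with lxml. It assumes a cut-down version of the input in which the CDATA section is the only content of `<def:somecode>`, and the helper name `add_comment` is hypothetical. It relies on `XMLParser(strip_cdata=False)`, which stops the parser from turning CDATA sections into plain text (the default is `strip_cdata=True`), and on `etree.CDATA` to mark the modified text as CDATA again. The declaration is written by hand because `tree.write(..., xml_declaration=True)` emits it with single quotes.

```python
from lxml import etree

# Cut-down version of the question's input (assumption: the CDATA section
# is the only content of <def:somecode>, with no whitespace around it).
SOURCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment
-->
<abc:Tag xmlns:abc="http://www.mycompany.com" xmlns:def="http://www.anothercompany.com">
  <def:somecode><![CDATA[if a>=b:
    print(a)
]]></def:somecode>
</abc:Tag>"""

NS = {"def": "http://www.anothercompany.com"}


def add_comment(xml_bytes: bytes, comment: str) -> bytes:
    """Hypothetical helper: append `comment` to the code in each CDATA block."""
    # strip_cdata=False keeps CDATA sections in the parsed tree instead of
    # merging them into ordinary text content.
    parser = etree.XMLParser(strip_cdata=False)
    root = etree.fromstring(xml_bytes, parser)

    for somecode in root.findall("def:somecode", namespaces=NS):
        # .text returns the CDATA content as a plain str; a plain str assigned
        # back would be serialized as escaped text, so wrap it in CDATA again.
        somecode.text = etree.CDATA(somecode.text + comment)

    # Serialize the whole document (including the top-level comment) without
    # lxml's own declaration, then prepend one written with double quotes.
    body = etree.tostring(root.getroottree(), encoding="UTF-8",
                          xml_declaration=False)
    return b'<?xml version="1.0" encoding="UTF-8"?>\n' + body


result = add_comment(SOURCE, "# updated with a comment\n")
print(result.decode("utf-8"))
```

With the original layout, where whitespace surrounds the CDATA section, lxml reports that whitespace and the CDATA content together as one `.text` string, so wrapping it in `etree.CDATA` would move the whitespace inside the CDATA section. As far as I know, lxml also offers no option to choose the quote character for attribute values (it writes them in double quotes), so if the quote difference concerns attributes rather than the declaration, this sketch does not address it.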
