0

I am trying to find and replace text inside a CDATA section of an XML file, similar to the objective in a related question (Find and Replace CDATA Attribute Values in XML - Python). Specifically, I want to replace the string "Building in Éclépens, Switzerland" with the new string "New Building", but I cannot reference the original string correctly. Ideally, I want to locate the string by indexing into the parsed content rather than hard-coding it as a variable. The CDATA section itself is valid and supports annotations, but I cannot reference the string even with a simple print statement. Below are the XML, the script I am using, and the desired output XML:

The XML ("foo_bar_CDATA.xml"):

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        Building in Éclépens, Switzerland
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

The script:

import lxml.etree as ET
xml = ET.parse("C:\\Users\\mdl518\\Desktop\\bar_foo_CDATA.xml")
tree=xml.getroot()

cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text) # get CDATA out of the XML
print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element
val_1 = 'New Building'  # new string to be included in the XML  

# Find and replace the CDATA string with "val_1"
for elem in tree.getiterator():
    if elem.text:
        elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1)
    
    output = ET.tostring(tree, 
                 encoding="UTF-8",
                 method="xml", 
                 xml_declaration=True, 
                 pretty_print=True)

    print(output.decode("utf-8"))

The Desired Output XML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>
    <![CDATA[
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        New Building
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    ]]>
    </description>   
</Overlay></kml>

When I run the script above, the string of interest is not replaced, and the angle brackets of the tags inside the CDATA section are not preserved (they appear as &lt; and &gt;) in the printed XML. I suspect the correct solution requires only a couple of minor tweaks. Any assistance is most appreciated!

1
  • I was in a rush. I'll get the CDATA into it and will update in an hour. Commented Feb 18, 2021 at 17:35

2 Answers

0

Your replace call has a trailing space at the end of the search string, so it never matches: elem.text=elem.text.replace('Building in Éclépens, Switzerland ',val_1)

Use this instead, with the trailing space removed: elem.text=elem.text.replace('Building in Éclépens, Switzerland',val_1)
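To see why the trailing space matters, here is a minimal sketch using only str.replace. The cdata_text value is a shortened, hypothetical stand-in for the CDATA content in the question, where the building name is followed directly by a line break rather than a space:

```python
# Shortened, hypothetical stand-in for the CDATA text from the question.
cdata_text = """<p><span>Dataset:</span>
    Building in Éclépens, Switzerland
</p>"""

val_1 = "New Building"

# With the trailing space, the search string does not occur in the text
# (the name is followed by a newline), so replace() returns it unchanged.
with_space = cdata_text.replace("Building in Éclépens, Switzerland ", val_1)

# Without the trailing space, the substring matches and is replaced.
without_space = cdata_text.replace("Building in Éclépens, Switzerland", val_1)

print(with_space == cdata_text)
print(val_1 in without_space)
```

Note that this only addresses the missing replacement. The &lt; and &gt; escaping in the serialized output is a separate issue.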


0
import lxml.etree as ET

xml = ET.parse("/home/cam/out.xml")
tree = xml.getroot()

# Parse the CDATA content as its own tree (not needed for the replacement below)
cd = ET.fromstring(tree.xpath('//*[local-name()="description"]')[0].text)
#print(cd[0][0][0][0][0][0].text) # prints "Dataset:" text contained within the 'span' element

val_1 = 'New Building'  # new string to be included in the XML

# Find and replace the CDATA string with "val_1"
# ("{*}description" matches the element in any namespace)
for elem in tree.iter("{*}description"):
    elem.text = elem.text.replace('Building in Éclépens, Switzerland', val_1)
    # Note: this adds the literal text ![CDATA[ ... ]], not a real <![CDATA[ ... ]]> section
    elem.text = '![CDATA[' + elem.text + ']]'

# Serialize, then undo the escaping of the angle brackets
root_str = ET.tostring(tree).decode('utf-8')
root_str = root_str.replace('&lt;', '<').replace('&gt;', '>')
print(root_str)

Output:

<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
<Overlay>
    <description>![CDATA[
    
    <html>
    <head>
        <body>
            <div id="view">
                <div class="item">
                    <p><span style="font-weight:italic">Dataset:</span>
                        New Building
                    </p>
                </div>
            </div>
        </body>
    </head>
    </html>
    
    ]]</description>   
</Overlay></kml>
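The output above contains the literal text ![CDATA[ ... ]] rather than a real CDATA section. As an alternative, here is a rough sketch that wraps the content in lxml's ET.CDATA so the serializer writes <![CDATA[ ... ]]> itself and does not escape the inner tags. It also locates the text by position (the tail of the first span) instead of searching for the old string, which seems to be what the question was after. The KML string is a trimmed-down stand-in for the real file, and the whitespace written into span.tail is my own assumption:

```python
import lxml.etree as ET

# Trimmed-down stand-in for the question's KML file (assumed for illustration).
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Overlay>
    <description><![CDATA[
    <html><body><p><span>Dataset:</span>
        Building in Éclépens, Switzerland
    </p></body></html>
    ]]></description>
</Overlay></kml>""".encode("utf-8")

root = ET.fromstring(KML)
desc = root.xpath('//*[local-name()="description"]')[0]

# By default the parser returns the CDATA content as plain text,
# so parse that text as its own small tree.
inner = ET.fromstring(desc.text.strip())

# Locate the text by position: it is the tail of the first <span>.
span = inner.find(".//span")
span.tail = "\n        New Building\n    "

# Serialize the inner markup and wrap it in a real CDATA section.
desc.text = ET.CDATA(ET.tostring(inner, encoding="unicode"))

output = ET.tostring(root, encoding="UTF-8", xml_declaration=True).decode("utf-8")
print(output)
```

If you prefer to keep the str.replace approach, the same ET.CDATA wrapper can be applied to the replaced string before assigning it back to the description element's text.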
