My XML contains a CDATA section. I want to keep the CDATA part when parsing, and then strip it afterwards. Can someone help with the following?

The default parser does not work:

>>> from io import StringIO
>>> from lxml import etree
>>> xml = '<Subject> My Subject: 美海軍研究船勘查台海水文? 船<![CDATA[&#xE9;]]>€ </Subject>'
>>> tree = etree.parse(StringIO(xml))
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船&#xE9;€ '

This post seems to suggest that the parser option strip_cdata=False may keep the CDATA, but it has no effect:

>>> parser = etree.XMLParser(strip_cdata=False)
>>> tree = etree.parse(StringIO(xml), parser=parser)
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船&#xE9;€ '

Using strip_cdata=True, which should be the default, yields the same:

>>> parser = etree.XMLParser(strip_cdata=True)
>>> tree = etree.parse(StringIO(xml), parser=parser)
>>> tree.getroot().text
' My Subject: 美海軍研究船勘查台海水文? 船&#xE9;€ '
  • If you add enough of the relevant XML, we might be able to test. Commented Nov 23, 2018 at 23:19
  • Is that example not enough? I can add more. Commented Nov 23, 2018 at 23:27
  • Ah, sorry. It's hard to read, with those numbers before your actual code and data. If they are not an important part of your question, remove them. Commented Nov 23, 2018 at 23:28

1 Answer

CDATA sections are not preserved in the text property of an element, even if strip_cdata=False is used when the XML content is parsed, as you have noticed. See https://lxml.de/api.html#cdata.
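
To make that concrete, here is a minimal sketch that compares the text property under both parser settings. The shortened sample string and the variable names are my own, not taken from the question, and the comment describes what I would expect to see rather than a guaranteed result for every lxml version.

```python
from lxml import etree

# Shortened, hypothetical sample; not the asker's original data.
xml = "<Subject>abc <![CDATA[&#xE9;]]> def</Subject>"

# Default parser (strip_cdata=True): CDATA is merged into plain text.
default_root = etree.fromstring(xml)

# Parser that keeps CDATA nodes in the tree.
keep_root = etree.fromstring(xml, parser=etree.XMLParser(strip_cdata=False))

# Both should show the same plain string, with no CDATA markers in it.
print(repr(default_root.text))
print(repr(keep_root.text))
```

In other words, the text property gives you the character data either way; the parser option only changes how the node is stored and later serialized.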

CDATA sections are preserved in these cases:

  1. When serializing with tostring():

    print(etree.tostring(tree.getroot(), encoding="UTF-8").decode())
    
  2. When writing to a file:

    tree.write("subject.xml", encoding="UTF-8")
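
Both cases can be sketched end to end as follows, assuming the tree was parsed with strip_cdata=False. The sample string is a shortened stand-in of my own, and I write to an in-memory io.BytesIO buffer instead of subject.xml so the example leaves no file behind.

```python
import io
from lxml import etree

# Shortened, hypothetical sample; not the asker's original data.
xml = "<Subject>abc <![CDATA[&#xE9;]]> def</Subject>"

# CDATA only survives serialization if the parser was told to keep it.
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse(io.StringIO(xml), parser=parser)

# Case 1: serialize the root element to a string.
serialized = etree.tostring(tree.getroot(), encoding="UTF-8").decode()
print(serialized)

# Case 2: write the whole tree to a file-like object.
buffer = io.BytesIO()
tree.write(buffer, encoding="UTF-8")
written = buffer.getvalue().decode("UTF-8")
print(written)
```

With the default parser (strip_cdata=True) the same calls should emit the content as escaped text instead of a CDATA section.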
    
1 Comment

Thanks for that. I read that part, but did not realise etree.tostring serialises.
