I'm trying to use HTML data inside the text node of an element, but it gets escaped as if it were plain text rather than HTML.

Here is an MWE:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.text = data
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

The output is...

<p>&lt;a href="https://example.com"&gt;Example data gained from elsewhere.&lt;/a&gt;</p>

What I intended is...

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>

2 Answers

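The problem is the assignment p.text = data. Setting .text tells ElementTree that the string is character data, so the serializer escapes the markup characters (< and >) when it writes the element out. To get a real <a> element, parse the string into an element and append it as a child of p, like below: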

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

d = ET.fromstring(data)
p = ET.Element('p')

p.append(d)
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)

Giving output

<p><a href="https://example.com">Example data gained from elsewhere.</a></p>
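
The approach above relies on the string being a single, well-formed element. If the data mixes plain text with markup, ET.fromstring rejects it. A minimal sketch of one way around that, assuming the fragment is still well-formed XML: wrap it in a temporary root, then move the text and children across. The helper name append_fragment and the wrapper tag are hypothetical, not part of ElementTree.

```python
from xml.etree import ElementTree as ET


def append_fragment(parent, fragment):
    """Graft a well-formed XML/XHTML fragment onto parent (hypothetical helper)."""
    # A temporary root lets the parser accept leading text and sibling elements.
    wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")

    # Text before the first child belongs either to parent.text or,
    # if parent already has children, to the tail of its last child.
    if wrapper.text:
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + wrapper.text
        else:
            parent.text = (parent.text or "") + wrapper.text

    # Children carry their own trailing text in .tail, so it moves with them.
    parent.extend(list(wrapper))
    return parent


p = ET.Element("p")
append_fragment(p, 'See <a href="https://example.com">this link</a> for details.')
html = ET.tostring(p, encoding="unicode", method="html")
print(html)
```

Passing encoding="unicode" makes ET.tostring return a str directly, so no .decode() call is needed.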

You can parse the HTML string into an Element with ET.fromstring and append it to the parent element. This works here because the string is also well-formed XML:

from xml.etree import ElementTree as ET

data = '<a href="https://example.com">Example data gained from elsewhere.</a>'

p = ET.Element('p')
p.append(ET.fromstring(data))
p = ET.tostring(p, encoding='utf-8', method='html').decode('utf8')
print(p)
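
Since the data comes from elsewhere, it may not always be well-formed XML (an unclosed <br>, for example), in which case ET.fromstring raises ET.ParseError. A hedged sketch of one possible fallback, keeping the escaped-text behaviour from the question when parsing fails. The helper name set_content is hypothetical, and whether falling back to escaped text is the right policy depends on your use case.

```python
from xml.etree import ElementTree as ET


def set_content(parent, data):
    """Append data as markup if it parses as XML, else as plain text (hypothetical helper)."""
    try:
        parent.append(ET.fromstring(data))
    except ET.ParseError:
        # Not well-formed XML: store it as character data, which is escaped on output.
        parent.text = data
    return parent


good = set_content(ET.Element("p"), '<a href="https://example.com">link</a>')
bad = set_content(ET.Element("p"), "<span>one<br>two</span>")

good_html = ET.tostring(good, encoding="unicode", method="html")
bad_html = ET.tostring(bad, encoding="unicode", method="html")
print(good_html)
print(bad_html)
```

For real-world HTML that is not valid XML, a dedicated HTML parser is usually a better fit than xml.etree.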
