Adding html tags to text of XML.ElementTree Elements in Python

Question

I am trying to use a Python script to generate an HTML document with text from a data table using the xml.etree.ElementTree module. I would like to format some of the cells to include HTML tags, typically either <br /> or <sup> tags. When I generate a string and write it to a file, I believe the XML parser is converting the angle brackets in these tags into character entities. The output then shows the tags as text rather than processing them as tags. Here is a trivial example:

import xml.etree.ElementTree as ET

root = ET.Element('html')
# extraneous code removed
td = ET.SubElement(tr, 'td')
td.text = 'This is the first line <br /> and the second'

# encoding='unicode' returns a str, which can be written to a text-mode file
tree = ET.tostring(root, encoding='unicode')
with open('test.html', 'w') as out:
    out.write(tree)

When I open the resulting test.html file, it displays the text string exactly as typed: 'This is the first line <br /> and the second'.

The HTML source itself shows the problem: the "less than" and "greater than" symbols in the tag have been replaced with their HTML entity representations:

<!--Extraneous code removed-->
<td>This is the first line &lt;br /&gt; and the second</td>
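A minimal check, assuming Python 3's standard library, suggests the substitution happens when ElementTree serializes the element rather than in any parser: whatever is stored in .text is treated as plain character data, so every < and > in it is written out as &lt; and &gt;.

```python
import xml.etree.ElementTree as ET

td = ET.Element('td')
# .text is character data, not markup
td.text = 'This is the first line <br /> and the second'

# serializing escapes the angle brackets so the text round-trips as text
html = ET.tostring(td, encoding='unicode')
print(html)  # <td>This is the first line &lt;br /&gt; and the second</td>
```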

Clearly, my intent is to have the document process the tag itself, not display it as text. I'm not sure if there are different parser options I can pass to get this to work, or if there is a different method I should be using. I am open to using other modules (e.g. lxml) if that will solve the problem. I am mainly using the built-in XML module for convenience.
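One different method (a swapped-in technique, not the approach in the accepted answer) is to parse the markup string into real elements and move them into the cell. This is a sketch under two assumptions: the fragment is well-formed XML (so <br /> works but a bare <br>, or an HTML entity such as &nbsp;, does not), and the target element starts out empty. set_markup is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def set_markup(elem, markup):
    """Hypothetical helper: parse an XML fragment and place it inside elem.

    Assumes elem is empty and markup is well-formed XML.
    """
    # wrap the fragment so it has a single root element for the parser
    wrapper = ET.fromstring('<wrapper>' + markup + '</wrapper>')
    # text before the first child becomes elem.text
    elem.text = wrapper.text
    # each child carries its own .tail (the text that follows it)
    for child in list(wrapper):
        elem.append(child)

td = ET.Element('td')
set_markup(td, 'This is the first line <br /> and the second')
print(ET.tostring(td, encoding='unicode'))
# <td>This is the first line <br /> and the second</td>
```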

The only thing I've figured out that works is to modify the final string with re substitutions before I write the file:

import re

tree = ET.tostring(root, encoding='unicode')
tree = re.sub(r'&lt;', '<', tree)
tree = re.sub(r'&gt;', '>', tree)

This works, but it seems like it should be avoidable with a different ElementTree setting. Any suggestions?
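One reason the substitution approach is fragile: it also un-escapes any < that was meant as literal text. A small sketch, assuming Python 3 and a made-up cell value, shows the result is no longer well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

td = ET.Element('td')
# a made-up cell that mixes intended markup with a literal comparison
td.text = 'if a < b then <br /> b > a'
raw = ET.tostring(td, encoding='unicode')

# the workaround: un-escape every &lt; and &gt;
fixed = re.sub(r'&lt;', '<', raw)
fixed = re.sub(r'&gt;', '>', fixed)
print(fixed)  # <td>if a < b then <br /> b > a</td>

# the literal "<" now looks like the start of a tag, so parsing fails
try:
    ET.fromstring(fixed)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False
```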

Anzel · Accepted Answer · 2014-11-02 00:27:56Z


You can use the tail attribute of the td and br elements to construct exactly the text you want:

import xml.etree.ElementTree as ET


root = ET.Element('html')
table = ET.SubElement(root, 'table')
tr = ET.SubElement(table, 'tr')
td = ET.SubElement(tr, 'td')
# td.text is the text inside <td>, before its first child element
td.text = "This is the first line "
# td.tail is the text after </td>; None (the default) means nothing follows
td.tail = None
br = ET.SubElement(td, 'br')
# br.tail continues the text after <br />, still inside <td>
br.tail = " and the second"

tree = ET.tostring(root, encoding='unicode')
# see the string
print(tree)
# <html><table><tr><td>This is the first line <br /> and the second</td></tr></table></html>

with open('test.html', 'w') as f:
    f.write(tree)

# and the output HTML file:
# $ cat test.html
# <html><table><tr><td>This is the first line <br /> and the second</td></tr></table></html>
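The same tail technique generalizes to a cell with any number of lines. A minimal sketch, where set_lines is a hypothetical helper name: the first line goes into the cell's .text and each later line becomes the .tail of the <br /> element placed before it.

```python
import xml.etree.ElementTree as ET

def set_lines(elem, lines):
    """Hypothetical helper: fill elem with lines separated by <br /> elements."""
    # first line: text before any child element
    elem.text = lines[0] if lines else None
    for line in lines[1:]:
        br = ET.SubElement(elem, 'br')
        # each later line follows its <br /> inside elem
        br.tail = line

root = ET.Element('html')
table = ET.SubElement(root, 'table')
tr = ET.SubElement(table, 'tr')
td = ET.SubElement(tr, 'td')
set_lines(td, ['This is the first line ', ' and the second'])

result = ET.tostring(root, encoding='unicode')
print(result)
# <html><table><tr><td>This is the first line <br /> and the second</td></tr></table></html>
```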

As a side note, to include a <sup> element and then append more text that is still within the <td>, using tail has the desired effect too:

...
td.text = "this is first line "
sup = ET.SubElement(td, 'sup')
sup.text = "this is second"
# use tail to continue your text after </sup>, still inside <td>
sup.tail = "well and the last"

print(ET.tostring(root, encoding='unicode'))
# <html><table><tr><td>this is first line <sup>this is second</sup>well and the last</td></tr></table></html>
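Which element's tail you set decides where the text lands. A small sketch, assuming Python 3: sup.tail stays inside the cell, while td.tail (set here only for illustration) ends up after </td>, inside the row.

```python
import xml.etree.ElementTree as ET

tr = ET.Element('tr')
td = ET.SubElement(tr, 'td')
td.text = "this is first line "
sup = ET.SubElement(td, 'sup')
sup.text = "this is second"
sup.tail = "well and the last"   # follows </sup>, inside <td>
td.tail = "outside the cell"     # follows </td>, inside <tr>

out = ET.tostring(tr, encoding='unicode')
print(out)
# <tr><td>this is first line <sup>this is second</sup>well and the last</td>outside the cell</tr>
```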

edited Nov 2, 2014 at 0:27

answered Nov 2, 2014 at 0:10

Anzel


Eric Dauenhauer Over a year ago

This worked perfectly! It definitely added a bit of code to my product, but made the end result much more predictable.
