I use an API to get some XML files but some of them contain HTML tags without escaping them. For example, <br> or <b></b>
I use this code to read them, but the files with the HTML raise an error. I don't have access to change manually all the files. Is there any way to parse the file without losing the HTML tags?
from xml.dom.minidom import parse, parseString
xml = ...#here is the api to receive the xml file
dom = parse(xml)
strings = dom.getElementsByTagName("string")
<br>with<br />before parsing the xml? And I don't see what's wrong with<b></b>? Also, consider using ElementTree instead of minidom; minidom can cause memory leaks.