
I am trying to analyze XML data and ran into an issue with entities such as &lt; when I use the following code:

import xml.etree.ElementTree as ET

my_xml_file = "rules.xml"  # hypothetical path to the XML file shown below
tree = ET.parse(my_xml_file)
root = tree.getroot()
for regex_rule in root.findall('.//regex_rule'):
    # .get() turns &lt; into <, but I want to get &lt; as written
    print(regex_rule.get('input'))
    # Prints False, apparently because .get() turns &lt; into <. Is that right?
    # (raw string, so that \S is not treated as a Python escape sequence)
    print(regex_rule.get('input') == r"(?&lt;!\S)hello(?!\S)")

And here are the contents of the XML file:

<rules>
<regex_rule input="(?&lt;!\S)hello(?!\S)" output="world"/>
</rules>

I would appreciate it if anybody could point me to a way of getting the input attribute's string exactly as it is written in the XML, without &lt; being converted into <.

1 Answer

xml.etree.ElementTree is doing exactly the standards-compliant thing: it decodes XML entity references such as &lt; into the characters they stand for, because in XML &lt; in an attribute value means the character <. The value of your input attribute really is (?<!\S)hello(?!\S).
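To make the decoding visible, here is a minimal sketch that parses the rule from the question out of a string (the `xml_text` variable is just for illustration; the question reads from a file) and compares the attribute against the decoded form:

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined as a raw string for illustration.
xml_text = r'<rules><regex_rule input="(?&lt;!\S)hello(?!\S)" output="world"/></rules>'

root = ET.fromstring(xml_text)
value = root.find('.//regex_rule').get('input')

print(value)                           # (?<!\S)hello(?!\S)
print(value == r"(?<!\S)hello(?!\S)")  # True
```

If the goal is only to check the rule against a known pattern, comparing with the decoded string like this may be enough.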

The preferred course of action, if you really need the attribute value to contain the literal text &lt;, is to change your input file to use &amp;lt; instead (i.e. XML-encode the & itself).
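A small sketch of what that looks like, again with the document inlined as a string for illustration: once the & is written as &amp;, the parser decodes only that one entity, and the characters lt; that follow are left as plain text.

```python
import xml.etree.ElementTree as ET

# Same rule as in the question, but with the ampersand itself escaped.
xml_text = r'<rules><regex_rule input="(?&amp;lt;!\S)hello(?!\S)" output="world"/></rules>'

root = ET.fromstring(xml_text)
value = root.find('.//regex_rule').get('input')

print(value)                              # (?&lt;!\S)hello(?!\S)
print(value == r"(?&lt;!\S)hello(?!\S)")  # True
```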

If you can't change your input file format then you'll probably need to use a different module, or write your own parser: xml.etree.ElementTree translates entities well before you can do anything meaningful with the output.
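If installing another module is not an option, one possible workaround is to do that escaping in memory before handing the text to ElementTree. The sketch below is only an illustration under some assumptions: the helper name `parse_keeping_entities` is made up, it handles only the five predefined XML entities (numeric references such as &#60; are still decoded), and it rewrites every occurrence in the document, including text nodes and CDATA sections, so it only fits files where that is acceptable.

```python
import re
import xml.etree.ElementTree as ET

# Matches the "&" of a predefined entity reference (&lt; &gt; &amp; &quot; &apos;).
_PREDEFINED_ENTITY_AMP = re.compile(r'&(?=(?:lt|gt|amp|quot|apos);)')

def parse_keeping_entities(xml_text):
    """Parse xml_text so that predefined entities come back as written.

    Hypothetical helper: it escapes the leading "&" of each predefined
    entity (so &lt; becomes &amp;lt;) and then parses the result normally.
    """
    protected = _PREDEFINED_ENTITY_AMP.sub('&amp;', xml_text)
    return ET.fromstring(protected)

xml_text = r'<rules><regex_rule input="(?&lt;!\S)hello(?!\S)" output="world"/></rules>'
root = parse_keeping_entities(xml_text)
for regex_rule in root.findall('.//regex_rule'):
    print(regex_rule.get('input') == r"(?&lt;!\S)hello(?!\S)")  # True
```

For a file on disk, the same helper could be fed the result of reading the file as text, assuming the encoding is known.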

1 Comment

Thanks for your input. It seems that I am out of luck using xml.etree.ElementTree, so I will have to come up with some other creative solution. (I am in an environment where I can't easily install other modules such as lxml.) I am basically checking rules that exist in both XML and JSON files. In the JSON files there are no HTML entities, and there shouldn't be. I will accept your response as the answer. Thank you.
