How to escape a string so that it can be used in an XML attribute?

I am not setting the attribute programmatically from Python code; I just need to create a valid string that can be used as an XML attribute value.

I have tried:

from xml.sax.saxutils import escape, quoteattr

print(escape('<&% "eggs and spam" &>'))
# >>> &lt;&amp;% "eggs and spam" &amp;&gt;

print(quoteattr('<&% "eggs and spam" &>'))
# >>> '&lt;&amp;% "eggs and spam" &amp;&gt;'

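The problem is that neither escape() nor quoteattr() escapes the double quote character ("). quoteattr() avoids the issue by wrapping the value in single quotes instead, which does not help when the surrounding markup already uses double quotes.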

Of course, I can do a .replace('"', '&quot;') on the escaped string, but I am assuming there should be a way to do it with the existing API (from the standard library or with third-party modules such as lxml).
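One standard-library option worth noting here: xml.sax.saxutils.escape() accepts an optional second argument, a dictionary of extra replacements applied after the built-in ones for &, < and >. The sketch below uses it to add the double quote. The helper name escape_attr is made up for illustration and is not part of any library.

```python
from xml.sax.saxutils import escape


def escape_attr(value):
    """Escape a string for use inside a double-quoted XML attribute.

    escape_attr is a hypothetical helper name. escape() handles &, < and >
    first, then applies the extra mapping, so the '&' in '&quot;' is not
    escaped a second time.
    """
    return escape(value, {'"': '&quot;'})


print(escape_attr('<&% "eggs and spam" &>'))
# >>> &lt;&amp;% &quot;eggs and spam&quot; &amp;&gt;
```

The result still has to be wrapped in double quotes by whatever builds the final XML. If the value may be wrapped in single quotes instead, the apostrophe would need its own entry in the dictionary.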

Update: I've found that Python 3's html.escape produces the expected result, but I am reluctant to use it in an XML context. I assume that HTML escaping might follow a different spec than the one required by the XML standard (https://www.w3.org/TR/xml/#AVNormalize).
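One way to test that concern for a given input is to round-trip the escaped value through an XML parser and compare the result with the original string. The following is a minimal sketch under that assumption; the element name root and attribute name attr are arbitrary, and a passing check for one string does not prove equivalence for all inputs (control characters and whitespace normalization are not covered).

```python
import html
import xml.etree.ElementTree as ET

raw = '<&% "eggs and spam" &>'

# html.escape with quote=True replaces &, <, >, " and '.
escaped = html.escape(raw, quote=True)

# Embed the escaped value in a tiny document and parse it back as XML.
doc = '<root attr="{}"/>'.format(escaped)
parsed = ET.fromstring(doc).get('attr')

print(escaped)
print(parsed == raw)
```

For this input the parser accepts the document and returns the original string, because the replacements html.escape emits are either entities predefined by XML or numeric character references.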

  • I don't understand the question. Are you building XML by string concatenation? Commented Jan 30, 2019 at 12:12
  • @Tomalak: Yes, something like that; I need to process the 'raw' strings into a form that can subsequently be inserted into the XML in a non-programmatic way. Commented Jan 30, 2019 at 14:26
  • Building XML this way is not a great idea, but if you absolutely have to do it, I'd still recommend tapping into an XML API. Using ElementTree or lxml, I'd create the simplest XML document with a single element (e.g. <root>), add an attribute through the API, serialize the document to a string, and finally remove the '<root attr="' and '" />' from the result. It would probably be 4 lines of code overall. Commented Jan 30, 2019 at 14:37
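The approach described in the last comment can be sketched roughly as follows, using the standard library's xml.etree.ElementTree. The function name attr_via_elementtree is hypothetical, and the code assumes the serializer writes an empty element as '<root attr="..." />'; it raises an error rather than guessing if that assumption does not hold.

```python
import xml.etree.ElementTree as ET


def attr_via_elementtree(value):
    """Let ElementTree escape an attribute value, then strip the wrapper.

    Hypothetical helper: relies on the serialized form of a single empty
    element starting with '<root attr="' and ending with '" />'.
    """
    root = ET.Element('root')
    root.set('attr', value)
    serialized = ET.tostring(root, encoding='unicode')

    prefix, suffix = '<root attr="', '" />'
    if not (serialized.startswith(prefix) and serialized.endswith(suffix)):
        raise ValueError('unexpected serialization: ' + serialized)
    return serialized[len(prefix):-len(suffix)]


print(attr_via_elementtree('<&% "eggs and spam" &>'))
```

The advantage of this route is that the escaping rules come from the XML serializer itself rather than from a hand-maintained character table; the cost is the dependence on the exact wrapper text being stripped.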

1 Answer

Shamelessly stolen from tornado (with a few modifications):

import re
_XHTML_ESCAPE_RE = re.compile('[&<>"\']')
_XHTML_ESCAPE_DICT = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;',
                      '\'': '&#39;'}

def xhtml_escape(value):
    """Escapes a string so it is valid within HTML or XML.

    Escapes the characters ``<``, ``>``, ``"``, ``'``, and ``&``.
    When used in attribute values the escaped strings must be enclosed
    in quotes.

    .. versionchanged:: 3.2

       Added the single quote to the list of escaped characters.
    """
    return _XHTML_ESCAPE_RE.sub(lambda match: _XHTML_ESCAPE_DICT[match.group(0)],
                                value)