I need to do some very quick-n-dirty input sanitizing and I would like to basically convert all <, > to &lt;, &gt;.

I'd like to achieve the same result as '<script></script>'.replace('<', '&lt;').replace('>', '&gt;') without iterating over the string multiple times. I know about maketrans in conjunction with str.translate (e.g. http://www.tutorialspoint.com/python/string_translate.htm), but this only converts one character to another character. In other words, one cannot do something like:

inList = '<>'
outList = ['&lt;', '&gt;']
transform = maketrans(inList, outList)

Is there a builtin function that can do this conversion in a single iteration?

I'd like to use builtin capabilities as opposed to external modules. I already know about Bleach.

  • Why not just iterate by hand? Commented Aug 17, 2015 at 16:09
  • In that case it seems you actually want to particularly encode characters in HTML, please check stackoverflow.com/questions/701704/… Commented Aug 17, 2015 at 16:11
  • See stackoverflow.com/questions/6116978/… for multiple string replacement in general. Commented Aug 17, 2015 at 16:16

3 Answers

Use html.escape(). cgi.escape() has been deprecated since Python 3.2 and was removed in Python 3.8.

import html
text = '<>&'  # not named "input", which would shadow the builtin
output = html.escape(text)
print(output)

&lt;&gt;&amp;
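
Note that html.escape() also escapes quote characters by default. A minimal sketch of the difference, using a made-up sample string; pass quote=False if you only want <, > and & converted:

```python
import html

# Sample input chosen for illustration only.
text = '<a href="x">Tom & Jerry\'s</a>'

# Default (quote=True): <, > and & are escaped, plus " and '.
escaped_all = html.escape(text)

# quote=False: only <, > and & are escaped; quotes are left as-is.
escaped_basic = html.escape(text, quote=False)

print(escaped_all)
print(escaped_basic)
```

With quote=False the behaviour is close to what cgi.escape() did by default.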

In Python 2, you can use cgi.escape():

import cgi
inlist = '<>'
transform = cgi.escape(inlist)
print(transform)

Output:

&lt;&gt;

https://docs.python.org/2/library/cgi.html#cgi.escape

cgi.escape(s[, quote])

Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character (") is also translated; this helps for inclusion in an HTML attribute value delimited by double quotes, as in <a href="...">. Note that single quotes are never translated.

1 Comment

As mentioned in other comments, this method has been deprecated since Python 3.2 (docs.python.org/3.7/library/cgi.html#cgi.escape). The documentation suggests using html.escape instead.

You can define your own function that loops over the string once and replaces any characters you define.

def sanitize(input_string):
    output_string = ''
    for i in input_string:
        if i == '>':
            outchar = '&gt;'
        elif i == '<':
            outchar = '&lt;'
        else:
            outchar = i
        output_string += outchar
    return output_string

Then calling

sanitize('<3 because I am > all of you')

yields

'&lt;3 because I am &gt; all of you'
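
The same single-pass idea can be written more compactly. The sketch below is one possible variant, not the code from this answer: the names REPLACEMENTS, TABLE, sanitize_join and sanitize_translate are made up for illustration. It also relies on the fact that in Python 3, str.maketrans() accepts a dict whose values may be strings of any length, so str.translate() can do the one-to-many replacement directly:

```python
# Hypothetical mapping; extend it with any other characters you need.
REPLACEMENTS = {'<': '&lt;', '>': '&gt;'}

def sanitize_join(text):
    # One pass: look each character up, fall back to the character itself,
    # and build the result once with str.join instead of repeated +=.
    return ''.join(REPLACEMENTS.get(ch, ch) for ch in text)

# Python 3 only: keys are single characters, values may be whole strings.
TABLE = str.maketrans(REPLACEMENTS)

def sanitize_translate(text):
    # str.translate walks the string once and applies the table.
    return text.translate(TABLE)

print(sanitize_join('<3 because I am > all of you'))
print(sanitize_translate('<script></script>'))
```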

3 Comments

Do have a look at str.join and list comprehensions!
Using + with strings is quadratic because it constructs a new string every time. I think CPython can optimize this into a linear operation, but other implementations like PyPy may not be able to.
IMPORTANT: When rolling your own sanitizer, always use an explicit list of allowed characters. If a character is NOT in the set of things you allow, either a) raise an error, b) remove it, or c) replace it with a neutral character of some kind, e.g.: elif i in set(string.ascii_letters + string.digits): ...
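
The allow-list advice in the last comment could look roughly like the sketch below. The ALLOWED set, the on_unknown parameter and the function name sanitize_strict are assumptions made for illustration; choose the allowed characters and the fallback policy to suit your own input:

```python
import string

# Assumed allow-list for illustration: ASCII letters, digits and space.
ALLOWED = set(string.ascii_letters + string.digits + ' ')
REPLACEMENTS = {'<': '&lt;', '>': '&gt;'}

def sanitize_strict(text, on_unknown='raise'):
    """Escape < and >, keep allowed characters, handle everything else.

    on_unknown: 'raise' raises ValueError, 'replace' substitutes '_',
    any other value silently drops the character.
    """
    parts = []
    for ch in text:
        if ch in REPLACEMENTS:
            parts.append(REPLACEMENTS[ch])
        elif ch in ALLOWED:
            parts.append(ch)
        elif on_unknown == 'raise':
            raise ValueError('character not allowed: %r' % ch)
        elif on_unknown == 'replace':
            parts.append('_')
        # otherwise: drop the character
    return ''.join(parts)

print(sanitize_strict('<b>hi</b>', on_unknown='drop'))
print(sanitize_strict('<b>hi</b>', on_unknown='replace'))
```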
