Python Input Sanitization

Question

I need to do some very quick-n-dirty input sanitizing and I would like to basically convert all <, > to &lt;, &gt;.

I'd like to achieve the same results as '<script></script>'.replace('<', '&lt;').replace('>', '&gt;') without having to iterate the string multiple times. I know about maketrans in conjunction with str.translate (i.e. http://www.tutorialspoint.com/python/string_translate.htm) but this only converts from 1 char to another char. In other words, one cannot do something like:

inList = '<>'
outList = ['&lt;', '&gt;']
transform = maketrans(inList, outList)

Is there a builtin function that can do this conversion in a single iteration?

I'd like to use builtin capabilities as opposed to external modules. I already know about Bleach.

In that case it seems you actually want to encode characters in HTML specifically; please check stackoverflow.com/questions/701704/…
– Nicolas78, Commented Aug 17, 2015 at 16:11
See stackoverflow.com/questions/6116978/… for multiple string replacement in general.
– augurar, Commented Aug 17, 2015 at 16:16

tuomastik · Accepted Answer · 2024-09-18 10:54:01Z

15

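Use html.escape(). cgi.escape() has been deprecated since Python 3.2 and was removed in Python 3.8. Note that html.escape() also escapes & and, by default, both quote characters.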

import html

# Avoid naming the variable `input`, which would shadow the built-in.
text = '<>&'
escaped = html.escape(text)
print(escaped)

&lt;&gt;&amp;
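
A small sketch of how the quote parameter changes the result. By default html.escape() escapes quote characters as well; passing quote=False limits it to &, < and >. The sample string below is made up for illustration.

```python
import html

# Hypothetical sample input containing markup, an ampersand and double quotes.
text = '<a href="x">Tom & Jerry</a>'

# Default (quote=True): &, <, > and quote characters are all escaped.
escaped_all = html.escape(text)

# quote=False: only &, < and > are escaped; quotes are left alone.
escaped_basic = html.escape(text, quote=False)

print(escaped_all)    # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
print(escaped_basic)  # &lt;a href="x"&gt;Tom &amp; Jerry&lt;/a&gt;
```

Keeping the default is usually the safer choice if the text may end up inside an HTML attribute value.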

edited Sep 18, 2024 at 10:54

tuomastik


answered Oct 30, 2019 at 15:59

Michael Dubin


Joe Young · Accepted Answer · 2015-08-17 16:14:32Z

12

You can use cgi.escape():

# Works on Python 2 and on Python 3 up to 3.7; cgi.escape() was removed in 3.8.
import cgi
inlist = '<>'
transform = cgi.escape(inlist)
print(transform)

Output:

&lt;&gt;

https://docs.python.org/2/library/cgi.html#cgi.escape

cgi.escape(s[, quote]): Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character (") is also translated; this helps for inclusion in an HTML attribute value delimited by double quotes, as in <a href="...">. Note that single quotes are never translated.

answered Aug 17, 2015 at 16:14

Joe Young


1 Comment

Cheche Over a year ago

As mentioned in other comments, this method is deprecated since Python 3.2 (docs.python.org/3.7/library/cgi.html#cgi.escape). It suggests using html.escape.

FTA · Accepted Answer · 2015-08-17 16:14:56Z

3

You can define your own function that loops over the string once and replaces any characters you define.

def sanitize(input_string):
    output_string = ''
    for i in input_string:
        if i == '>':
            outchar = '&gt;'
        elif i == '<':
            outchar = '&lt;'
        else:
            outchar = i
        output_string += outchar
    return output_string

Then calling

sanitize('<3 because I am > all of you')

yields

'&lt;3 because I am &gt; all of you'
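
The same single-pass replacement can also be sketched with built-ins only. In Python 3, str.maketrans() accepts a dict whose keys are single characters and whose values may be strings of any length, so str.translate() can do the one-to-many mapping the question asks about. The names ESCAPES and sanitize_translate are made up for this sketch.

```python
# Python 3 only: the dict form of str.maketrans() maps single characters
# to replacement strings of arbitrary length.
ESCAPES = str.maketrans({'<': '&lt;', '>': '&gt;'})

def sanitize_translate(input_string):
    """Replace < and > in a single pass using str.translate()."""
    return input_string.translate(ESCAPES)

print(sanitize_translate('<3 because I am > all of you'))
# &lt;3 because I am &gt; all of you
```

Like the loop above, this leaves & untouched, so it is not a substitute for html.escape() when the output is meant to be valid HTML.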

answered Aug 17, 2015 at 16:14

FTA


3 Comments

Nicolas78 Over a year ago

Do have a look at str.join() and list comprehensions!

Kevin Over a year ago

Using + with strings is quadratic because it constructs a new string every time. I think CPython can optimize this into a linear operation, but other implementations like PyPy may not be able to.
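
A minimal sketch of the join-based variant these two comments point toward: the pieces are produced in one pass and concatenated once at the end, so it does not rely on any interpreter-specific optimization of +=. REPLACEMENTS and sanitize_join are hypothetical names chosen for the example.

```python
# Hypothetical replacement table; extend it with whatever needs escaping.
REPLACEMENTS = {'<': '&lt;', '>': '&gt;'}

def sanitize_join(input_string):
    """Walk the input once and join the pieces in a single operation."""
    return ''.join(REPLACEMENTS.get(ch, ch) for ch in input_string)

print(sanitize_join('<3 because I am > all of you'))
# &lt;3 because I am &gt; all of you
```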

Erik Aronesty Over a year ago

IMPORTANT: When rolling your own sanitizer, always use an explicit allowlist. If a character is NOT in the set of things you allow, either a) raise an error, b) remove it, or c) replace it with a neutral character of some kind, e.g.: elif i in set(string.ascii_letters + string.digits): ...
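
One way the allowlist idea might look in practice, as a sketch only. The allowed set (ASCII letters, digits and spaces), the placeholder character, and the function name sanitize_allowlist are all assumptions made for illustration; this version takes option c) and replaces anything not explicitly allowed.

```python
import string

# Hypothetical allowlist: ASCII letters, digits and spaces. Adjust as needed.
ALLOWED = set(string.ascii_letters + string.digits + ' ')

def sanitize_allowlist(input_string, placeholder='_'):
    """Keep allowlisted characters; replace everything else with a placeholder."""
    return ''.join(ch if ch in ALLOWED else placeholder for ch in input_string)

print(sanitize_allowlist('<3 because I am > all of you'))
# _3 because I am _ all of you
```

Raising an error instead of substituting a placeholder would be the stricter choice when unexpected characters indicate bad input rather than something to clean up.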
