Is it possible in Python to trace and filter the functions that are called on strings while a program runs? I want to add sys.setdefaultencoding("utf-8") to my application, and I want to set up some guards to catch potential problems caused by misusing standard functions (len, for example) on such strings.
1 Answer
You can replace the builtin:
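import __builtin__  # Python 2 name of the builtins module

# Keep a reference to the original so the wrapper can delegate to it.
real_len = __builtin__.len

def checked_len(s):
    # ... do extra checks ...
    return real_len(s)

# Every later lookup of the global name len now finds the wrapper.
__builtin__.len = checked_len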
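As a rough illustration of what the "extra checks" could look like, here is a minimal sketch that warns whenever len is taken of a byte string containing non-ASCII bytes, since that is the case where byte length and character length diverge. The check itself, the warning text, and the names install_len_guard and remove_len_guard are assumptions made for this example, not something from the question; the import fallback is only there so the sketch also runs on Python 3.

```python
import warnings

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

real_len = builtins.len


def checked_len(s):
    # Assumed check: a byte string with any byte above 127 holds encoded
    # non-ASCII text, so its byte length differs from its character count.
    if isinstance(s, bytes) and any(b > 127 for b in bytearray(s)):
        warnings.warn("len() called on a non-ASCII byte string",
                      stacklevel=2)
    return real_len(s)


def install_len_guard():
    # Hypothetical helper: swap the wrapper in for the builtin.
    builtins.len = checked_len


def remove_len_guard():
    # Hypothetical helper: restore the original builtin.
    builtins.len = real_len
```

Using warnings rather than raising means the test suite can decide whether a hit is fatal, for example by turning warnings into errors with the standard warnings filters.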
1 Comment
anatoly techtonik
This works for len, but not so well for other functions. How can I skip the check for strings where it is legal to use len, because I need the length in bytes (such as when calculating HTTP headers)? There are variables and operations (not string content) for which this check should not fire. It is OK to set those filters manually.
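One possible way to set such filters manually is to mark the places where a byte length is intended and have the wrapper stay quiet there. The sketch below is only an assumption about how that might look: byte_length_allowed and the thread-local flag are hypothetical names, and checked_len would still have to be installed in place of the builtin as in the answer.

```python
import contextlib
import threading
import warnings

# Per-thread flag so one thread's opt-out does not silence another's checks.
_state = threading.local()


@contextlib.contextmanager
def byte_length_allowed():
    """Mark a block where taking the length in bytes is intentional."""
    previous = getattr(_state, "allowed", False)
    _state.allowed = True
    try:
        yield
    finally:
        _state.allowed = previous


def checked_len(s):
    # Assumed check: warn only for byte strings holding non-ASCII data,
    # and only outside a byte_length_allowed() block.
    suspicious = isinstance(s, bytes) and any(b > 127 for b in bytearray(s))
    if suspicious and not getattr(_state, "allowed", False):
        warnings.warn("len() called on a non-ASCII byte string",
                      stacklevel=2)
    return len(s)


# Usage: an HTTP Content-Length really is a byte count, so opt out here.
body = u"caf\u00e9".encode("utf-8")
with byte_length_allowed():
    content_length = checked_len(body)
```

This filters by call site rather than by variable, which keeps the opt-outs visible in the code that needs byte lengths.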
... utf-8 instead of ascii?) that you are still determined to use it...
... sys.setdefaultencoding()? Use unicode internally, pass unicode to Jinja templates, and only convert to/from utf-8 at your system boundaries.
... unicode-nazi in your tests, and run your test suite (assuming you've got decent test coverage). Every place where unicode-nazi complains about implicit conversions, you'll introduce a subtle, hard to find bug when you change the default encoding.
unicode-nazi is a good suggestion. The problem is that implicit conversions are made by Jinja2, and another templating layer (TAL) already works with utf-8 byte strings. Need to investigate further whether TAL works with Unicode. There is also an email backend that may break, so it is not that simple to just go Unicode. There are also user extensions that we cannot check.