
Is it possible in Python to trace and filter the functions that are called on strings while a program runs? I want to add sys.setdefaultencoding("utf-8") to my application, and I want to set up some guards that catch potential problems caused by misusing standard functions (len, for example) to process such strings.

  • I guess that, despite the advice you've recently received about the Dangers of sys.setdefaultencoding('utf-8') (also, Hack Jinja2 to encode from utf-8 instead of ascii?), you are still determined to use it... Commented Apr 12, 2015 at 6:57
  • Isn't the fact that you already need to start implementing hacks like these indication enough that it's a bad idea to mess with sys.setdefaultencoding()? Use unicode internally, pass unicode to Jinja templates, and only convert to/from utf-8 at your system boundaries. Commented Apr 12, 2015 at 7:22
  • @PM2Ring I need to fix issues.roundup-tracker.org/issue2550811 ASAP, because it blocks the next Roundup release and its development, and switching to unicode internally looks like an epic task that will take a lot of time and break the existing template engine (TAL) and extensions. Commented Apr 12, 2015 at 12:38
  • @techtonik then fix it, and don't pretend to fix it by applying hacks on top of bandaids. How about this: include unicode-nazi in your tests, and run your test suite (assuming you've got decent test coverage). Every place where unicode-nazi complains about an implicit conversion is a place where you'll introduce a subtle, hard-to-find bug when you change the default encoding. Commented Apr 12, 2015 at 19:11
  • @LukasGraf, unicode-nazi is a good suggestion. The problem is that the implicit conversions are made by Jinja2, and another templating layer (TAL) already works with utf-8 byte strings. I need to investigate further whether TAL works with Unicode. There is also an email backend that may break, so it is not as simple as just going Unicode. There are also user extensions that we cannot check. Commented Apr 13, 2015 at 14:22

1 Answer


You can replace the builtin:

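import __builtin__  # Python 2 name; the module is called builtins in Python 3

# Keep a reference to the original so the wrapper can delegate to it.
real_len = __builtin__.len

def checked_len(s):
    # ... do extra checks on s here ...
    return real_len(s)

# From here on, calls that resolve to the builtin len() go through checked_len.
__builtin__.len = checked_len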
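
As a rough illustration of what the extra checks might look like, here is a minimal sketch that warns whenever len() is called on a byte string containing non-ASCII bytes, since that is where the byte count and the character count diverge. The choice of check and the warning text are my assumptions, not something from the question, and the try/except import is only there so the same sketch runs on both Python 2 and Python 3.

```python
import warnings

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

real_len = builtins.len


def checked_len(s):
    # Assumed check: flag byte strings that hold non-ASCII data, where
    # the number of bytes differs from the number of characters.
    if isinstance(s, bytes):
        try:
            s.decode("ascii")
        except UnicodeDecodeError:
            warnings.warn("len() called on a non-ASCII byte string",
                          RuntimeWarning, stacklevel=2)
    return real_len(s)


builtins.len = checked_len
```

With stacklevel=2 the warning is attributed to the caller of len(), which makes the offending line easier to find. Running the test suite with warnings turned into errors (python -W error) would make each hit fail loudly.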

1 Comment

This works for len, but it does not help much with other functions. How can I avoid the check for strings where it's legal to use len because I need the length in bytes (such as when calculating HTTP headers)? There are variables and operations (not string content) for which this check should not fire. It is OK to set those filters manually.
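
One possible way to set such filters manually, sketched under assumptions: mark the values whose byte length is really wanted with a marker subclass, or bypass the wrapper at a specific call site by calling the saved original. RawBytes is a hypothetical name introduced here for illustration, and the non-ASCII check is likewise only an assumed example of an extra check.

```python
import warnings

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

real_len = builtins.len


class RawBytes(bytes):
    """Hypothetical marker type for data whose length in bytes is wanted."""


def checked_len(s):
    # Skip the check for values explicitly marked as raw bytes.
    if isinstance(s, bytes) and not isinstance(s, RawBytes):
        try:
            s.decode("ascii")
        except UnicodeDecodeError:
            warnings.warn("len() called on a non-ASCII byte string",
                          RuntimeWarning, stacklevel=2)
    return real_len(s)


builtins.len = checked_len

# Per-variable filter: a marked value does not trigger the warning.
body = RawBytes(u"h\u00e9llo".encode("utf-8"))
content_length = len(body)

# Per-operation filter: call the saved original at this call site.
content_length = real_len(body)
```

Note that slicing or concatenating a RawBytes value returns a plain byte string, so the marker has to be reapplied after such operations.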
