Trace functions that are called on Python strings

Question

Is it possible in Python to trace and filter functions that are called on strings during a program run? I want to add sys.setdefaultencoding("utf-8") to my application, and I want to set some guards to catch potential problems caused by misusing standard functions (like len, for example) to process such strings.

I guess that despite the advice you've recently received about the Dangers of sys.setdefaultencoding('utf-8') (also, Hack Jinja2 to encode from utf-8 instead of ascii?), you are still determined to use it... – PM 2Ring, commented Apr 12, 2015 at 6:57
Isn't the fact that you already need to start implementing hacks like these enough of an indication that it's a bad idea to mess with sys.setdefaultencoding()? Use unicode internally, pass unicode to Jinja templates, and only convert to/from utf-8 at your system boundaries. – Lukas Graf, commented Apr 12, 2015 at 7:22
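The "convert only at the boundaries" advice can be sketched as follows. This is an illustration under assumptions, not code from Roundup or Jinja2: `render_response` is a hypothetical handler, and it runs on both Python 2 and Python 3. It shows where `len()` means characters and where it means bytes.

```python
def render_response(raw_body):
    """Hypothetical handler showing the decode/process/encode boundaries."""
    # Boundary in: bytes from the network or disk become text exactly once.
    text = raw_body.decode('utf-8')

    # Inside the program everything is text, so len() counts characters.
    body = u'Received %d characters: %s' % (len(text), text)

    # Boundary out: text becomes bytes exactly once, right before sending.
    payload = body.encode('utf-8')

    # Byte-level quantities such as Content-Length are computed on the bytes.
    headers = {'Content-Length': str(len(payload))}
    return headers, payload
```

With `raw_body = b'caf\xc3\xa9'`, the body reports 4 characters, while `Content-Length` is the byte count of the encoded payload. Because each `len()` is applied to an object of a single, known type, neither of them needs a guard.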
@PM2Ring I need to fix issues.roundup-tracker.org/issue2550811 ASAP, because it blocks the next Roundup release and its development, and switching to unicode internally looks like an epic task that will take a lot of time and break the existing template engine (TAL) and extensions. – anatoly techtonik, commented Apr 12, 2015 at 12:38
@techtonik then fix it, and don't pretend to fix it by applying hacks on top of band-aids. How about this: include unicode-nazi in your tests and run your test suite (assuming you've got decent test coverage). Every place where unicode-nazi complains about an implicit conversion is a place where you'll introduce a subtle, hard-to-find bug when you change the default encoding. – Lukas Graf, commented Apr 12, 2015 at 19:11
@LukasGraf, unicode-nazi is a good suggestion. The problem is that the implicit conversions are made by Jinja2, and another templating layer (TAL) already works with utf-8 byte strings. I need to investigate further whether TAL works with Unicode. There is also an email backend that may break, so it is not that simple to just go Unicode. There are also user extensions that we cannot check. – anatoly techtonik, commented Apr 13, 2015 at 14:22

R Samuel Klatchko · Accepted Answer · 2015-04-12 06:51:53Z


You can replace the builtin:

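```python
import sys
import __builtin__

real_len = __builtin__.len

def checked_len(s):
    # Example check: flag byte strings with non-ASCII bytes (len counts bytes)
    if isinstance(s, str):
        try:
            s.decode('ascii')
        except UnicodeDecodeError:
            caller = sys._getframe(1)
            sys.stderr.write('len() on non-ASCII str at %s:%d\n'
                             % (caller.f_code.co_filename, caller.f_lineno))
    return real_len(s)

__builtin__.len = checked_len
```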

answered Apr 12, 2015 at 6:51

R Samuel Klatchko



anatoly techtonik Over a year ago

This works for len, but not for many other functions. How can I avoid the check for strings where it's legitimate to use len because I need the length in bytes (such as when calculating HTTP headers)? There are variables and operations (not string content) for which this check should not fire. It is OK to set those filters manually.
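One way to make the "bytes are fine here" cases explicit is sketched below under assumptions. Code that legitimately needs a byte count calls a separate `byte_len()` that skips the check. A `bytes_ok()` context manager suppresses the check for a whole block, and an `allowed_functions` set exempts callers by function name. All of these names are hypothetical. The non-ASCII-bytes test is an assumed policy, and `sys._getframe` is CPython-specific.

```python
import sys
import threading
from contextlib import contextmanager

try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3

real_len = builtins.len
flagged = []                 # (filename, lineno) of suspicious len() calls
allowed_functions = set()    # hypothetical: caller names exempt from the check
_state = threading.local()   # per-thread suppression flag


def byte_len(data):
    """Explicit 'I really want the byte count' length; never checked."""
    return real_len(data)


@contextmanager
def bytes_ok():
    """Suppress the check inside a block (e.g. HTTP header code)."""
    previous = getattr(_state, 'suppressed', False)
    _state.suppressed = True
    try:
        yield
    finally:
        _state.suppressed = previous


def _is_non_ascii_bytes(value):
    if not isinstance(value, bytes):
        return False
    try:
        value.decode('ascii')
    except UnicodeDecodeError:
        return True
    return False


def checked_len(obj):
    if _is_non_ascii_bytes(obj) and not getattr(_state, 'suppressed', False):
        caller = sys._getframe(1)
        if caller.f_code.co_name not in allowed_functions:
            flagged.append((caller.f_code.co_filename, caller.f_lineno))
    return real_len(obj)


def install():
    builtins.len = checked_len


def uninstall():
    builtins.len = real_len
```

The filters are opt-in and stay visible in the code. `byte_len()` documents intent at the call site, while `bytes_ok()` and `allowed_functions` cover code that cannot easily be edited.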
