Please consider the following code:

import re

def qcharToUnicode(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

def fixSurrogatePresence(s) :
    '''Returns the input UTF-16 string with surrogate pairs replaced by the character they represent'''
    # ideas from:
    # http://www.unicode.org/faq/utf_bom.html#utf16-4
    # http://stackoverflow.com/a/6928284/1503120
    def joinSurrogates(match) :
        SURROGATE_OFFSET = 0x10000 - ( 0xD800 << 10 ) - 0xDC00
        return chr ( ( ord(match.group(1)) << 10 ) + ord(match.group(2)) + SURROGATE_OFFSET )
    return re.sub ( '([\uD800-\uDBFF])([\uDC00-\uDFFF])', joinSurrogates, s )

Now my questions below probably reflect a C/C++ way of thinking (and not a "Pythonic" one) but I'm curious nevertheless:

I'd like to know whether the compiled RE object p in qcharToUnicode and SURROGATE_OFFSET in joinSurrogates are evaluated at each call to the respective functions, or only once at the point of definition. In C/C++ one can declare such values as static const and the compiler will (IIUC) make the construction occur only once, but Python has no such declarations.

The question is more pertinent in the case of the compiled RE object, since it seems that the only reason to construct such an object is to avoid the repeated compilation, as the Python RE HOWTO says:

Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re accessing a regex within a loop, pre-compiling it will save a few function calls.

... and this purpose would be defeated if the compilation were to occur at each function call. I don't want to put the symbol p (or SURROGATE_OFFSET) at module level since I want to restrict its visibility to the relevant function only.

So does the interpreter heuristically determine that the value bound to a particular symbol is constant (and visible within a particular function only) and hence need not be reconstructed at the next function call? Further, is this defined by the language or implementation-dependent? (I hope I'm not asking too much!)

A related question would be about the construction of the function object lambda m in qcharToUnicode -- is it also defined only once like other named function objects declared by def?

  • Even named functions defined by a def can be defined multiple times, if the entire def block is in a loop. In general Python makes very few assumptions about what will or will not change during the course of the program. Code is executed when it is encountered at runtime during program flow. Commented Jan 10, 2014 at 7:06

3 Answers

3

The simple answer is that as written, the code will be executed repeatedly at every function call. There is no implicit caching mechanism in Python for the case you describe.

You should get out of the habit of talking about "declarations". A function definition is in fact also "just" a normal statement, so I can write a loop which defines the same function repeatedly:

for i in range(10):
    def f(x):
        return x*2
    y = f(i)

Here, we incur the cost of creating the function object on every iteration of the loop. Timing reveals that the following version, which defines the function only once, runs in about 75% of the time of the loop above:

def f(x):
    return x*2

for i in range(10):
    y = f(i)
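A rough way to reproduce this comparison yourself is sketched below with timeit. The helper names (define_in_loop, define_once, f_once) are made up for this illustration, and the exact ratio will depend on your interpreter version and machine, so no particular figure is assumed.

```python
import timeit

def define_in_loop():
    """Re-create f on every iteration, as in the first snippet."""
    total = 0
    for i in range(10):
        def f(x):
            return x * 2
        total += f(i)
    return total

def f_once(x):
    return x * 2

def define_once():
    """Reuse a function that was defined a single time."""
    total = 0
    for i in range(10):
        total += f_once(i)
    return total

# Both variants compute the same value; only the cost of the def differs.
t_loop = timeit.timeit(define_in_loop, number=20000)
t_once = timeit.timeit(define_once, number=20000)
print(f"def inside loop: {t_loop:.4f}s, def once: {t_once:.4f}s")
```

Note that f_once is looked up as a global here while f in the first variant is a local, which also influences the timings slightly.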

The standard way of optimising the RE case is, as you already know, to place the p variable in the module scope, i.e.:

p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

def qcharToUnicode(s):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

You can use conventions like prepending "_" to the variable name to indicate that it is not meant to be used from outside, but normally people won't use it anyway if you haven't documented it. A trick to make the RE function-local is to exploit how default parameters work: default values are evaluated once, when the function definition is executed, so you can do this:

def qcharToUnicode(s, p=re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1),16)) + '"', s)

This gives you the same optimisation, plus a little more flexibility in your matching function, since a caller can pass a different compiled pattern as p.
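To see that the default value really is evaluated only once, here is a small sketch. The counting_compile wrapper and the compile_calls counter are hypothetical helpers added purely to count evaluations; they are not part of the re module.

```python
import re

compile_calls = 0

def counting_compile(pattern):
    """Hypothetical wrapper: compile the pattern and count how often we are called."""
    global compile_calls
    compile_calls += 1
    return re.compile(pattern)

# The default expression runs when this def statement executes, not per call.
def qchar_to_unicode(s, p=counting_compile(r"QChar\((0x[a-fA-F0-9]*)\)")):
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

results = [qchar_to_unicode("QChar(0x41)") for _ in range(3)]
print(results, compile_calls)
```

After three calls the counter still reads 1, and the compiled pattern is stored on the function object itself, in qchar_to_unicode.__defaults__.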

Thinking properly about function definitions also allows you to stop thinking about lambda as different from def. The only difference is that def also binds the function object to a name - the underlying object created is the same.
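A quick check of this claim, as a sketch with made-up names (make_lambda, make_def, inner): both forms produce ordinary function objects, and a new object is built each time the enclosing function runs. That the code object is shared between those function objects is how CPython behaves; treat it as an implementation detail rather than a language guarantee.

```python
import types

def make_lambda():
    return lambda m: m.upper()

def make_def():
    def inner(m):
        return m.upper()
    return inner

first, second = make_lambda(), make_lambda()
named = make_def()

# lambda and def both yield plain function objects.
same_type = type(first) is type(named) is types.FunctionType
# Each call of make_lambda builds a fresh function object...
distinct_objects = first is not second
# ...but (in CPython) wraps the same pre-compiled code object.
shared_code = first.__code__ is second.__code__

print(same_type, distinct_objects, shared_code)
print(first.__name__, named.__name__)
```

So the lambda in qcharToUnicode is turned into a new function object on every call, but its body is not recompiled each time.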

7 Comments

Your second code snippet doesn't work. When any of those f functions is called, the i in x*i is evaluated using the current value of i, not the value from the time the function was defined.
@user2357112 pending verification, but I believe he'd only have that bug in the JavaScript equivalent.
@stewSquared: JavaScript and Python both use function scope rather than block scope, so the problem happens in both.
@user2357112 We're possibly speaking of different problems. In the JavaScript version, all the functions returned would be equivalent to "function(x) {return x*10}" whereas in the Python version, they are indeed distinct functions.
The idea would be to use f inside the loop, so it doesn't have any practical issues. This would work even if you passed f to another function. Unless you are in the habit of changing the loop variable in a loop, this is still useful, although it is worth remembering that the behaviour of f could change if i changes later.
1

Python is a scripting/interpreted language, so yes, the assignment will be made every time you call the function. The interpreter parses your code only once, generating Python bytecode. The next time you call the function, it is already compiled into Python VM bytecode, so that bytecode is simply executed again.

The re.compile will be called every time, as it would be in other languages. If you want to mimic a static initialization, consider using a global variable; this way it will be called only once. Better, you can create a class with static methods and static members (class and not instance members).
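The class-based variant could look roughly like the sketch below. The class name QCharConverter and the attribute name _PATTERN are invented for this example; the point is only that a class attribute is evaluated once, when the class body runs.

```python
import re

class QCharConverter:
    # Class attribute: evaluated once, when the class statement is executed.
    _PATTERN = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

    @staticmethod
    def to_unicode(s):
        """Replace each QChar(0x...) with the quoted character it denotes."""
        return QCharConverter._PATTERN.sub(
            lambda m: '"' + chr(int(m.group(1), 16)) + '"', s
        )

print(QCharConverter.to_unicode("x = QChar(0x41)"))
```

This keeps the pattern out of the module namespace while still building it only once.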

You can check all this using the dis module in Python. So, I just copied and pasted your code in a teste.py module.

>>> import teste
>>> import dis
>>> dis.dis(teste.qcharToUnicode)
  4           0 LOAD_GLOBAL              0 (re)
              3 LOAD_ATTR                1 (compile)
              6 LOAD_CONST               1 ('QChar\\((0x[a-fA-F0-9]*)\\)')
              9 CALL_FUNCTION            1
             12 STORE_FAST               1 (p)

  5          15 LOAD_FAST                1 (p)
             18 LOAD_ATTR                2 (sub)
             21 LOAD_CONST               2 (<code object <lambda> at 0056C140, file "teste.py", line 5>)
             24 MAKE_FUNCTION            0
             27 LOAD_FAST                0 (s)
             30 CALL_FUNCTION            2
             33 RETURN_VALUE
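Opcode names and offsets in dis output vary between Python versions, so the listing above may look different on your interpreter. A version-independent way to make the same observation is to inspect the names the function's bytecode refers to. The sketch below uses two made-up functions (compile_inside, compile_outside) and a hypothetical helper names_used.

```python
import dis
import re

def compile_inside(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub("", s)

_P = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")

def compile_outside(s):
    return _P.sub("", s)

def names_used(func):
    """Global and attribute names referenced by the function's bytecode."""
    return set(func.__code__.co_names)

# 'compile' shows up only where the call is part of the function body,
# which means it is executed on every call of that function.
print(names_used(compile_inside))
print(names_used(compile_outside))

# The full disassembly is still available for a closer look.
dis.dis(compile_outside)
```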

1 Comment

Um it says "don't use comment for thanks" but I feel somewhat lacking in etiquette to not say thanks for all the useful replies. I've upvoted them all and accepted one. Especially this one is useful because I did not know about dis.
1

Yes, they are. Suppose re.compile() had a side effect. That side effect would happen every time the assignment to p was made, i.e., every time the function containing said assignment was called.

This can be verified:

def funcWithSideEffect():
    print("The airspeed velocity of an unladen swallow (european) is...")
    return 25

def funcEnclosingAssignment():
    # The right-hand side runs on every call, so the message prints each time.
    p = funcWithSideEffect()
    return p

a = funcEnclosingAssignment()
b = funcEnclosingAssignment()
c = funcEnclosingAssignment()

Each time the enclosing function (analogous to your qcharToUnicode) is called, the statement is printed, revealing that p is being re-evaluated.
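The same experiment can be run against re.compile itself. The sketch below spies on re.compile with unittest.mock while calling the function from the question; the names spy, outputs and compile_calls are just for this illustration.

```python
import re
from unittest import mock

def qcharToUnicode(s):
    p = re.compile(r"QChar\((0x[a-fA-F0-9]*)\)")
    return p.sub(lambda m: '"' + chr(int(m.group(1), 16)) + '"', s)

# Temporarily replace re.compile with a spy that forwards to the real function.
with mock.patch.object(re, "compile", wraps=re.compile) as spy:
    outputs = [qcharToUnicode("QChar(0x41)") for _ in range(3)]
    compile_calls = spy.call_count

print(outputs, compile_calls)
```

The spy records one call per invocation of qcharToUnicode. Note, however, that the re module keeps an internal cache of recently compiled patterns, so these repeated calls typically amount to a cache lookup rather than a full recompilation.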
