8

I'm relatively new to Python, and I keep seeing examples like:

def max_wordnum(texts):
    count = 0
    for text in texts:
        if len(text.split()) > count:
            count = len(text.split())
    return count

Is the repeated len(text.split()) somehow optimized away by the interpreter/compiler in Python, or will this just take twice the CPU cycles of storing len(text.split()) in a variable?

2
  • 1
    No, it's not. The call will be done twice. So, there's room for optimizing the code. Commented Aug 11, 2018 at 19:35
  • 2
    A good example would write the function like this: max(len(text.split()) for text in texts) Commented Aug 11, 2018 at 20:00

2 Answers

6

Duplicate expressions are not "somehow optimized away": Python evaluates each occurrence. Use a local variable to capture and reuse a result that is known not to change and takes a non-trivial amount of time to compute, or where a named variable makes the code clearer.
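A minimal sketch of the function from the question rewritten with a local variable; the name `n_words` is only illustrative:

```python
def max_wordnum(texts):
    """Return the largest word count among the strings in texts (0 if empty)."""
    count = 0
    for text in texts:
        n_words = len(text.split())  # split once and reuse the result
        if n_words > count:
            count = n_words
    return count
```

Behaviour is unchanged; each text is simply split exactly once.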

In this case, Python cannot know that 'text.split()' is pure. A pure function is one that has no side effects and always returns the same value for the same input.

More fundamentally, Python is dynamically typed: it doesn't even know the type of 'text' until the code runs with an actual value, so a general optimization of this kind is not possible. (Some classes provide their own internal caching, but that is a digression.)
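To see that the repeated call really happens, here is a small sketch using a hypothetical str subclass, `CountingStr`, that counts calls to split. It also illustrates why the interpreter cannot assume split is pure: any object passed in as 'text' may override it.

```python
class CountingStr(str):
    """Hypothetical str subclass that records how often split() is called."""
    calls = 0

    def split(self, *args, **kwargs):
        CountingStr.calls += 1  # side effect: this split() is not pure
        return super().split(*args, **kwargs)


def max_wordnum(texts):
    # Same logic as the function in the question.
    count = 0
    for text in texts:
        if len(text.split()) > count:
            count = len(text.split())
    return count


texts = [CountingStr("a b"), CountingStr("a b c"), CountingStr("a")]
result = max_wordnum(texts)
print(result, CountingStr.calls)  # prints: 3 5
```

Note that the second call only runs when a new maximum is found, so the cost is somewhere between one and two split() calls per text (here 5 calls for 3 texts), not always exactly double.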

Even a statically typed language like C# won't/can't optimize away general method calls because, again, there is no enforceable guarantee of purity in C#. (What if the method returned a different value on the second call, or wrote to the console?)

By contrast, Haskell, a purely functional language, has the option of not evaluating the call twice, because it is a different language with different rules.


3

Even if Python did optimize this (it doesn't), the duplicated expression is copy/paste code that is harder to maintain, so storing the result of a complex computation in a variable is generally a good idea.

A better idea yet is to use max with a generator expression in this case:

return max(len(text.split()) for text in texts)

This splits each text only once, so it is usually faster as well.
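One caveat with this form: max raises ValueError on an empty iterable, whereas the loop in the question returns 0. A hedged sketch that keeps that behaviour by passing the default argument (available since Python 3.4):

```python
def max_wordnum(texts):
    # The generator expression splits each text exactly once;
    # default=0 covers an empty texts, matching the original loop.
    return max((len(text.split()) for text in texts), default=0)
```

The generator expression needs its own parentheses here because it is no longer the only argument to max.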

Also note that len(text.split()) creates a list just so you can count its items. A better way would be to count the spaces (if words are separated by exactly one space and there is no leading or trailing whitespace) by doing:

return max(text.count(" ") for text in texts) + 1

If there can be more than one space between words, use a regex with finditer to avoid creating lists (this needs import re):

return max(sum(1 for _ in re.finditer(r"\s+", text)) for text in texts) + 1

Note the 1 added at the end to correct the value (the number of separators is one less than the number of words). This again assumes the text has no leading or trailing whitespace.
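The separator-counting approaches give a different answer from len(text.split()) when a text is empty or has leading or trailing whitespace. As an alternative sketch (a swapped-in technique, not the one shown above), counting runs of non-whitespace with r"\S+" needs no + 1 correction and agrees with split() for ordinary whitespace such as spaces, tabs and newlines. The names `_WORD` and `max_wordnum_regex` are illustrative:

```python
import re

_WORD = re.compile(r"\S+")  # one match per run of non-whitespace characters


def max_wordnum_regex(texts):
    # Count matches lazily; no intermediate list of words is built.
    return max((sum(1 for _ in _WORD.finditer(text)) for text in texts), default=0)


texts = ["  two  words ", "one", ""]
print(max_wordnum_regex(texts))  # prints: 2
```

Whether this is actually faster than split() depends on the input; it is worth timing on real data before preferring it.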

As an aside, even though the value isn't cached, you can still use complex expressions in loops with range:

for i in range(len(text.split())):

The range object is created once at the start, so the expression is evaluated only once (as opposed to C for loops, for instance, where the condition is re-evaluated on every iteration).
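A small sketch that makes this visible; `Limit` is a hypothetical helper that counts how often it is called. The while loop is included only for contrast, since its condition behaves like a C loop condition:

```python
class Limit:
    """Hypothetical callable that counts its own invocations."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self.value


for_limit = Limit(3)
for i in range(for_limit()):  # called once, before the first iteration
    pass

while_limit = Limit(3)
i = 0
while i < while_limit():  # called again on every check of the condition
    i += 1

print(for_limit.calls, while_limit.calls)  # prints: 1 4
```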
