6

I want to optimize.

Simple solution

connection = get_db_connection()
for item in my_iterator:
    push_item_to_db(item, connection)

Drawback:

get_db_connection() is slow. If my_iterator is empty, I want to avoid calling it.

"if None" solution

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Drawback:

If there are 100k items in my_iterator, then the check "if connection is None" is evaluated 100k times (although it is needed only once). I want to avoid this.

Perfect solution ...

  1. don't call get_db_connection() if the iterator is empty
  2. don't evaluate "if connection is None:" uselessly on every iteration.

Any idea?

  • This is massive over-optimization. if not i is an insignificant overhead compared to whatever will happen in push_item_to_db. Commented Apr 4, 2016 at 10:12
  • If get_db_connection is slow, "optimizing" to avoid an if statement doesn't seem the right thing to do... That said, your iterator ought to throw a StopIteration that terminates the for each loop when it is empty. Commented Apr 4, 2016 at 10:13
  • @DanielRoseman yes, this is "massive over-optimization". But nevertheless I like this question, since I have no clue how to solve it. For me it is more fun than a question which really "hurts" me. Commented Apr 4, 2016 at 10:36
  • Why do you need to enumerate? In your snippet you don't use it. Why not just iterate over my_iterator? Inside the for loop, connect only if not yet connected, by checking the value of connection. Commented Apr 4, 2016 at 11:12
  • @joelgoldstick yes, you are right. I changed enumerate() to "if None". Commented Apr 4, 2016 at 11:42

4 Answers

6

You can do something like:

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Simple solution. Don't need to overthink it. Even with 100k operations, x is None is just a reference comparison taking one Python opcode. You really don't need to optimise this compared to a full tcp roundtrip + disk write that happens on every insert.
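To get a rough sense of what the check costs, a minimal timing sketch along these lines could be run. The two helper functions and the object() stand-in for the connection are hypothetical, made up for illustration; no database is involved, and the absolute numbers will vary by machine and Python version.

```python
import timeit

def loop_with_check(items):
    """Loop that evaluates `connection is None` on every iteration."""
    connection = None
    count = 0
    for _ in items:
        if connection is None:
            connection = object()  # hypothetical stand-in for get_db_connection()
        count += 1
    return count

def loop_without_check(items):
    """Same loop without the per-iteration check, as a baseline."""
    count = 0
    for _ in items:
        count += 1
    return count

items = list(range(100_000))
runs = 10
with_check = timeit.timeit(lambda: loop_with_check(items), number=runs)
without_check = timeit.timeit(lambda: loop_without_check(items), number=runs)

print(f"with check:    {with_check / runs:.6f} s per 100k items")
print(f"without check: {without_check / runs:.6f} s per 100k items")
```

Comparing either figure with the time of a single real push_item_to_db call should show whether the check matters in practice.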


8 Comments

Yes, your "if" is faster than mine. But I would like to avoid it nevertheless.
Then you'd have to go with @coredump's answer. That's pretty much the only way to avoid an explicit if. Just keep in mind it's still not guaranteed to be an optimisation. It can actually be slower, and I'd definitely prefer to review a simpler solution if I was looking at your code.
you think there is no other solution?
What kind of solution are you looking for? You've got the loop over some collection and you need to start a db connection on the first element - there are many ways to write it, but it's essentially always going to be the same thing.
I am looking for a solution that fits the text above: Perfect solution ... 1: don't call get_db_connection() if iterator is empty, 2: don't call "if connection is None:" uselessly for every iteration.
2
for item in my_iterator:
    # First item (if any)
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        # Next items
        push_item_to_db(item, connection)
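
One assumption worth making explicit: the nested loops rely on my_iterator being a real iterator, so that the inner loop continues where the outer one stopped. A rough sketch of the difference, using a hypothetical push_all wrapper and stub functions that only record calls:

```python
calls = {"connections": 0, "pushed": []}

def get_db_connection():
    # Hypothetical stub: counts how often a connection is requested.
    calls["connections"] += 1
    return "connection"

def push_item_to_db(item, connection):
    # Hypothetical stub: records the item instead of writing to a database.
    calls["pushed"].append(item)

def push_all(my_iterator):
    # Same nested-loop pattern as above, wrapped in a function.
    for item in my_iterator:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in my_iterator:
            push_item_to_db(item, connection)

def run(source):
    # Reset the counters, run the pattern, and report what happened.
    calls["connections"] = 0
    calls["pushed"] = []
    push_all(source)
    return calls["connections"], list(calls["pushed"])

from_iterator = run(iter([1, 2, 3]))  # iterator: inner loop drains the rest
from_list = run([1, 2, 3])            # list: each loop restarts from the beginning
from_empty = run(iter([]))            # empty input: no connection requested

print(from_iterator, from_list, from_empty)
```

With a plain list, the sketch connects once per element and pushes items repeatedly, so wrapping the input in iter() first is a cheap safeguard.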

5 Comments

The only drawback is that it could be labelled as "too clever" by some people (for me it would be fine).
@coredump: the drawback that I see (in some other solutions as well) is that the body of the loop needs to be duplicated.
Yes, I commented elsewhere about this but somehow did not notice that in your answer. Still, I prefer this one because it looks more straightforward.
@YvesDaoust: My solution 4 avoids duplicating the body of the loop, using itertools.chain().
@MikeMüller: we are working on a micro-micro-optimization (avoidance of an if test), and it is likely that any mechanism which is added (like try block or extra parameter in next call) adds more overhead than the if alone does. We cannot conclude without precise benchmarking, and most probably all this effort is totally worthless.
2

I am not an expert in Python but I would do something like this:

def put_items_to_database(iterator):
    try:
        item = next(iterator)

        # We connect to the database only after we
        # know there is at least one element in the collection
        connection = get_db_connection()

        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

It is probably true that the performance is tied to the database here. However the question is about finding a way to avoid doing unnecessary work, and the above is a basic way of controlling precisely what happens during iteration.

Other solutions are "simpler", in some way, but on the other hand I think this one is more explicit and follows the principle of least astonishment.
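
A quick way to check the behaviour is to call the function with stubs in place of the real database functions. In this sketch the two stubs and the counters are hypothetical, and the inputs are wrapped in iter() because next() needs an iterator rather than a list:

```python
connection_requests = 0
pushed = []

def get_db_connection():
    # Hypothetical stub: counts requests instead of opening a connection.
    global connection_requests
    connection_requests += 1
    return object()

def push_item_to_db(item, connection):
    # Hypothetical stub: records the item instead of writing it.
    pushed.append(item)

def put_items_to_database(iterator):
    try:
        item = next(iterator)
        # Reached only if the iterator produced a first item.
        connection = get_db_connection()
        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

put_items_to_database(iter([]))
requests_after_empty = connection_requests

put_items_to_database(iter(["a", "b", "c"]))
requests_after_three = connection_requests

print(requests_after_empty, requests_after_three, pushed)
```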

1

Solution 1

This works without a while True loop.

try:
    item = next(my_iterator)  # keep the first item so it can be pushed
    connection = get_db_connection()
    push_item_to_db(item, connection)
except StopIteration:
    pass
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 2

If you know that the iterator never yields None (or some other unique object), you could take advantage of the default argument of next():

item = next(my_iterator, None)
if item is not None:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 3

If you cannot guarantee a value that the iterator never returns, you could use a sentinel.

sentinel = object()
item = next(my_iterator, sentinel)
if item is not sentinel:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 4

Using itertools.chain():

from itertools import chain

for first_item in my_iterator:
    connection = get_db_connection()
    for item in chain([first_item], my_iterator):
        push_item_to_db(item, connection)

4 Comments

What's wrong with a "while True" loop? I am genuinely curious because you seem to prefer instead having two calls to "push_item_to_db" in all your solutions, which I don't find particularly nice.
Nothing really wrong. I just prefer a for loop over a while loop if possible. "Feels" a bit nicer or more pythonic (maybe ;)).
Why don't you put the for statement in an else branch in solutions 1-3? That would avoid useless execution of the for when the iterator does not return anything.
The else would never get executed if the iterator has any members. The else would only be reached if the iterator is empty. Besides a for over an empty iterator does zero iterations.
