0

I've a doubt about lambda expression with internal list comprehension operation.

In the following code, the lambda will instantiate a list for each item every time?

def _find_items_not_present_in_store(self, store_today, store_yesterday):
        # finding what items are not in the store anymore
        items_not_in_store_anymore = filter(lambda item1: item1.item_id not in
                                         [item2.item_id for item2 in store_today.store_items],
                                         store_yesterday)
        return items_not_in_store_anymore

Would be better have this list

[item2.item_id for item2 in store.store_items]

instantiated outside of the lambda expression?

I couldn't find any documentation about it.

3 Answers 3

1

You are performing a linear search for each item in your list - and that is definitely sub-optimal. For a store with 1 million items in stock, hat can lead to the order of (1000000)² comparisons, which can be quite a burden even to fast computers. That is just to start

The thing to do is to create a set with the ID's of one of the collections, and use set's "contains" (the same in operator) - which searches in constant time.

def _find_items_not_present_in_store(self, store_today, store_yesterday):
    yesterday_ids = set(item.item_id for item in store_yesterday)
    return [item for item in store_today if item.item_id not in yesterday_ids]

And - in your code - in addition to searching in lists and not in a set, you are actually recreating the whole yesterday's ID list for each item in the today's list - as your list generator expression is inside the lambda function. In the approach above, I pre-calculate the ID set just once -aas that is what makes sense.

Besides that, as you can see, list comprehension and generator expressions in Python have an if clause that supersedes the usage of the filter function - filter only makes sense when one choose to use a functional notation instead of generators/comprehensions - and in most cases will have the overhead of one additional function call.

Sign up to request clarification or add additional context in comments.

2 Comments

Might be a nice improvement to have this function return a generator.
It can't be known without taking a look at how these results are used. Since the original code did return a list, one has to assume the code might want to iterate over it more than once. Otherwise, for most uses of this, I think a "set" would be more useful than a generator - but them, the "store items" have to be hashable.
1

Every call of the lambda function will re-create that list, so moving that construction outside the lambda will improve performance.

Moreover using a list to check for in isn't a good idea, since it takes linear time. Consider using a set instead:

def _find_items_not_present_in_store(self, store_today, store_yesterday):
        today_ids = {item2.item_if for item2 in store_today.store_items}

        items_not_in_store_anymore = filter(
            lambda item1: item1.item_id not in today_ids, 
            store_yesterday
        )
        return items_not_in_store_anymore

In old versions of python you need to do set( ... ) instead of the set-comprehension { ... }.

2 Comments

it is interesting to see how the proposed solutions converge that fast ☺. One question though on the difference spotted. How does the speed of the filter function compare to the list comprehension proposed in the other two solutions?
@Ev.Kounis It's probably going to be slightly slower. Because it has to perform a function call for every element, while the list-comprehension avoid that small bit of overhead... however in python3 filter is lazy, which means that if you need only the, say, first 10 elements of the result filter could be much faster than the list-comprehension, in that case you want to use a generator expression instead of a list-comprehension.
1

The way you've written it, the list is part of the lambda expression, so it will be evaluated each time the lambda is invoked.

Here's the most efficient way to implement your function:

def _find_items_not_present_in_store(self, store_today, store_yesterday):
    s = set(item2.item_id for item2 in store_today.store_items)
    items_not_in_store_anymore = [item1 for item1 in store_yesterday
                                  if item1.item_id not in s]
    return items_not_in_store_anymore

This does 2 main things to improve the efficiency:

  1. It creates a set, once, for fast membership checking
  2. It replaces the lambda/filter combination with a comprehension, which is more efficient.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.