
I have a situation where I need to loop over two lists of objects, find the equal ones, and then loop over their fields and change some attributes. It looks like this:

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from being good or even acceptable. So I'm seeking advice on how I can avoid so much looping and enhance the algorithm.

  • Remove the new_field loop? You don't use new_field anyway. Commented Dec 24, 2014 at 8:26
  • Also, sort the two lists and you get nlogn instead of n^2 Commented Dec 24, 2014 at 8:28
  • Could you put a quick example of the two input lists, and a quick example of output? Commented Dec 24, 2014 at 8:36
  • What I'm doing here is searching for equal products in both lists and then searching for equal model fields in the matched objects; after that I update the fields of the old object with new data. Commented Dec 24, 2014 at 8:38
  • This looks like Django; if so - let the database do the heavy lifting. Commented Dec 24, 2014 at 8:40

4 Answers

5

It helps if you break the process into logical, reusable parts.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

For example, here what you are doing is finding the product that matches a particular article. Since article is unique, we could write something like this:

def find_products_by_article(products, article):
  '''Find the product that matches the given article.  Returns
  either the product or None if it doesn't exist.'''
  for product in products:
    if product.article == article:
      return product
  return None

Then call it with:

for old_product in products_for_update:
  new_product = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

But this could be much more efficient if we could take advantage of a data structure that is optimized for look-ups, namely a dict (constant instead of linear complexity). So what we could do instead is:

# build a dictionary that stores products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

If you do such lookups frequently, it would be better to reuse the products_by_article dictionary elsewhere as well instead of building one from scratch every time. Be careful though: if you use multiple representations of the product records, you need to make sure they always stay in sync!
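As a minimal, runnable sketch of the lookup-table idea (using a plain hypothetical Product class in place of a Django model, since all it needs is an 'article' attribute that is hashable and unique):

```python
# Hypothetical stand-in for a product record; in the real code these
# would be Django model instances.
class Product:
    def __init__(self, article, price):
        self.article = article
        self.price = price

new_products = [Product('A1', 10), Product('A2', 20)]

# Build the lookup table once: article -> product, O(n)
products_by_article = {p.article: p for p in new_products}

# Each lookup is now O(1) on average instead of a linear scan
match = products_by_article.get('A2')
print(match.price)  # 20
```

Note that dict.get returns None for a missing key, which gives the same "silently skip" behaviour as the try/except KeyError version above.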

For the inner loops, notice that new_field here only serves as a check for whether a field exists:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

(Note that this is slightly suspicious: any new fields that don't already exist in the old_product are silently discarded: is this intentional?)

This can be repackaged as follows:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
  # use sets here for efficiency reasons
  fields = set(old._meta.get_all_field_names())
  fields.intersection_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)
  for field in fields:
    setattr(old, field, getattr(new, field))

Putting all this together:

# dictionary of products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

This final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.
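To see the whole pipeline run end to end without Django, here is a self-contained sketch: the hypothetical Product class and vars() stand in for model instances and _meta.get_all_field_names(), which are assumptions for illustration only.

```python
class Product:
    def __init__(self, article, name, price, id=None, slug=None):
        self.article, self.name, self.price = article, name, price
        self.id, self.slug = id, slug

def transfer_fields(old, new, exclusions=('id', 'slug')):
    # Copy every attribute present on both objects, except the excluded ones
    fields = (set(vars(old)) & set(vars(new))) - set(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))

new_products = [Product('A1', 'widget', 10)]
old_products = [Product('A1', 'old widget', 5, id=1, slug='old-widget')]

# O(n) to build, O(1) per lookup
products_by_article = {p.article: p for p in new_products}

for old_product in old_products:
    new_product = products_by_article.get(old_product.article)
    if new_product is None:
        continue          # ignore non-existent products
    transfer_fields(old_product, new_product)

print(old_products[0].name, old_products[0].price)  # widget 10
print(old_products[0].id, old_products[0].slug)     # 1 old-widget
```

The id and slug fields survive untouched, exactly as the exclusions parameter intends.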


4 Comments

Note that although you're organizing it nicely, it's still O(n^2*k).
Yes, this can be made more efficient by redesigning the data structure, but I left it as an exercise for the OP. Plus, I don't know if article is a unique key — it makes some difference in how it can be done.
Can you please write up your suggestions for the redesign? I hope that one day management will say something like 'let's make this stuff work well' :)
Done. (You don't have to redesign per se, you could just use the new data structure locally for this section of the code only.)
1

You can use a set to find the intersection instead of looping over both lists and checking for equality:

set(products_and_articles['products']).intersection(set(products_for_update))

Example:

>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])

4 Comments

In my opinion a set is not a good approach here, because each list may have duplicate items, and the set will remove them.
@Urb yep, but as it doesn't change the main list and the OP just wants the intersection, I think it's fine!
please check my comment above
I'm sorry, but it's more complicated than that. The objects you handle are not numbers - they're objects, so you either have to form the set out of a.article or implement a hash method in the objects. Even if you do, to transfer the attributes from the old to the new you need to have refs to both the objects; you will need to search in the loops, no way around it.
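To illustrate the point in the last comment: set intersection on objects only works once the class defines equality and hashing, e.g. on article. A hypothetical sketch (the Product class is invented for illustration):

```python
class Product:
    def __init__(self, article):
        self.article = article
    def __eq__(self, other):
        # Two products are "equal" if they share an article
        return isinstance(other, Product) and self.article == other.article
    def __hash__(self):
        # Hash must be consistent with __eq__
        return hash(self.article)

new = [Product('A1'), Product('A2')]
old = [Product('A2'), Product('A3')]

common = set(new).intersection(old)
print(sorted(p.article for p in common))  # ['A2']
```

Even then, as the comment notes, the intersection only gives you one object per match; to transfer attributes you still need references to both the old and the new object.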
1

We start with four loops and a complexity of O(n^2*k^2), n being the number of items and k being the number of attributes. Let's see what we can do.

First of all, get rid of the new_field loop, you don't need it:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

To:

for old_field in old_product._meta.get_all_field_names():
    if old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

Got it to O(n^2*k). Now for the product finding part.

First, get the two lists sorted, then proceed like you do when you merge lists in merge sort:

a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1
        continue
    if a[i].article > b[j].article:
        j += 1
        continue
    ...logic...
    i += 1  # Maybe you want to get rid of this one, I'm not sure..
    j += 1

Depending on the size of your database, it might be more or less adequate, because it requires you to make new sorted lists. This is not very heavy in memory (it's only refs anyway), but if you have really long lists and limited space, the huge efficiency win may not compensate.

Got it down to O(n*logn*k); that's the best I could do. You can probably get it even lower using dictionaries, but that requires you to change your db, so it needs more time and effort.
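A runnable sketch of the merge-style walk on plain objects (the Product class is a hypothetical stand-in, and articles are assumed unique within each list):

```python
class Product:
    def __init__(self, article, price):
        self.article = article
        self.price = price

# Sort both lists by the join key, O(n log n)
a = sorted([Product('A3', 1), Product('A1', 2)], key=lambda x: x.article)
b = sorted([Product('A2', 9), Product('A3', 8)], key=lambda x: x.article)

i = j = 0
matches = []
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1              # a is behind, advance it
    elif a[i].article > b[j].article:
        j += 1              # b is behind, advance it
    else:
        # Articles match: record the pair, advance both cursors
        matches.append((a[i].article, a[i].price, b[j].price))
        i += 1
        j += 1

print(matches)  # [('A3', 1, 8)]
```

Each element is visited at most once after sorting, so the walk itself is linear.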

Comments

0

The first two for loops can be changed to:

from itertools import product


for new_product, old_product in product(list1, list2):
    # logic and other loops

and you can do the same for the two inner loops, turning:

 for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():

into:

for old_field, new_field in product(old_product._meta.get_all_field_names(),
                                    new_product._meta.get_all_field_names()):
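For illustration, itertools.product just yields every pairing of the two iterables, so it flattens the nesting without reducing the quadratic amount of work (the field lists below are made up):

```python
from itertools import product

old_fields = ['id', 'slug', 'name', 'price']
new_fields = ['name', 'price']

# product() yields all len(old_fields) * len(new_fields) pairs;
# the filter keeps only the matching field names
pairs = [(o, n) for o, n in product(old_fields, new_fields) if o == n]
print(pairs)  # [('name', 'name'), ('price', 'price')]
```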

1 Comment

This will only double the time.. I don't think there is a lot of difference between combinations and for loops, and you will have the pairs where they both belong to the same list as well..
