
I have a situation where I need to loop over two lists of objects, find the equal ones, and then loop over their fields and change some attributes. It looks like this:

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      for old_field in old_product._meta.get_all_field_names():
        for new_field in new_product._meta.get_all_field_names():
          if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

Obviously this is far from being good or even acceptable. So I'm seeking advice on how I can avoid so much looping and enhance the algorithm.

  • Remove the new_field loop? You don't use new_field anyway. Commented Dec 24, 2014 at 8:26
  • Also, sort the two lists and you get nlogn instead of n^2 Commented Dec 24, 2014 at 8:28
  • Could you put a quick example of the two input lists, and a quick example of output? Commented Dec 24, 2014 at 8:36
  • What I'm doing here is searching for equal products in both lists and then searching for equal model fields in the matched objects; after that I update the fields of the old object with new data. Commented Dec 24, 2014 at 8:38
  • This looks like Django; if so - let the database do the heavy lifting. Commented Dec 24, 2014 at 8:40

4 Answers

5

It helps if you break the process into logical, reusable parts.

for new_product in products_and_articles['products']:
  for old_product in products_for_update:
    if new_product.article == old_product.article:
      …

For example, here what you are doing is finding the product that matches a particular article. Since article is unique, we could write something like this:

def find_products_by_article(products, article):
  '''Find the product that matches the given article.  Returns
  either the product or None if it doesn't exist.'''
  for product in products:
    if product.article == article:
      return product
  return None

Then call it with:

for old_product in products_for_update:
  new_product = find_products_by_article(
                   products_and_articles['products'],
                   old_product.article)
  …

But this could be much more efficient if we could take advantage of a data structure that is optimized for look-ups, namely a dict (constant instead of linear complexity). So what we could do instead is:

# build a dictionary that stores products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
  try:
    # look up product in the dictionary
    new_product = products_by_article[old_product.article]
  except KeyError:
    # silently ignore products that don't exist
    continue
  …

If you do such lookups frequently, it would be better to reuse the products_by_article dictionary elsewhere as well instead of building one from scratch every time. Be careful though: if you use multiple representations of the product records, you need to make sure they always stay in sync!
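As a minimal, runnable sketch of the lookup-table idea (using a plain hypothetical Product class in place of a Django model, since all it needs is an 'article' attribute that is hashable and unique):

```python
# Hypothetical stand-in for a product record; in the real code these
# would be Django model instances.
class Product:
    def __init__(self, article, price):
        self.article = article
        self.price = price

new_products = [Product('A1', 10), Product('A2', 20)]

# Build the lookup table once: article -> product, O(n)
products_by_article = {p.article: p for p in new_products}

# Each lookup is now O(1) on average instead of a linear scan
match = products_by_article.get('A2')
print(match.price)  # 20
```

Note that dict.get returns None for a missing key, which gives the same "silently skip" behaviour as the try/except KeyError version above.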

For the inner loops, notice that new_field here only serves as a check for whether a field exists:

…
  for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
      if old_field == new_field and old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

(Note that this is slightly suspicious: any new fields that don't already exist in the old_product are silently discarded: is this intentional?)

This can be repackaged as follows:

def transfer_fields(old, new, exclusions=('id', 'slug')):
  '''Update all pre-existing fields in the old record to have
  the same values as the new record.  The 'exclusions' parameter
  can be used to exclude certain fields from being updated.'''
  # use sets here for efficiency reasons
  fields = set(old._meta.get_all_field_names())
  fields.intersection_update(new._meta.get_all_field_names())
  fields.difference_update(exclusions)
  for field in fields:
    setattr(old, field, getattr(new, field))

Putting all this together:

# dictionary of products indexed by article
products_by_article = {product.article: product for product in
                       products_and_articles['products']}

for old_product in products_for_update:
  try:
    new_product = products_by_article[old_product.article]
  except KeyError:
    continue          # ignore non-existent products
  transfer_fields(old_product, new_product)

This final code has a time complexity of O(n × k), where n is the number of products and k is the number of fields.
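To see the whole pipeline run end to end without Django, here is a self-contained sketch: the hypothetical Product class and vars() stand in for model instances and _meta.get_all_field_names(), which are assumptions for illustration only.

```python
class Product:
    def __init__(self, article, name, price, id=None, slug=None):
        self.article, self.name, self.price = article, name, price
        self.id, self.slug = id, slug

def transfer_fields(old, new, exclusions=('id', 'slug')):
    # Copy every attribute present on both objects, except the excluded ones
    fields = (set(vars(old)) & set(vars(new))) - set(exclusions)
    for field in fields:
        setattr(old, field, getattr(new, field))

new_products = [Product('A1', 'widget', 10)]
old_products = [Product('A1', 'old widget', 5, id=1, slug='old-widget')]

# O(n) to build, O(1) per lookup
products_by_article = {p.article: p for p in new_products}

for old_product in old_products:
    new_product = products_by_article.get(old_product.article)
    if new_product is None:
        continue          # ignore non-existent products
    transfer_fields(old_product, new_product)

print(old_products[0].name, old_products[0].price)  # widget 10
print(old_products[0].id, old_products[0].slug)     # 1 old-widget
```

The id and slug fields survive untouched, exactly as the exclusions parameter intends.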


4 Comments

Note that although you're organizing it nicely, it's still O(n^2*k).
Yes, this can be made more efficient by redesigning the data structure, but I left it as an exercise for the OP. Plus, I don't know if article is a unique key — it makes some difference in how it can be done.
Can you please write up your suggestions for the redesign? I hope that one day management will say something like 'let's make this stuff work well' :)
Done. (You don't have to redesign per se, you could just use the new data structure locally for this section of the code only.)
1

You can use a set to find the intersection instead of looping over both lists and checking for equality:

set(products_and_articles['products']).intersection(set(products_for_update))

Example:

>>> l=[1,2,3]
>>> a=[2,3,4]
>>> set(l).intersection(set(a))
set([2, 3])

4 Comments

In my opinion a set is not a good approach here, because each list may have duplicate items, and the set will remove them.
@Urb yep, but as it doesn't change the main list and the OP just wants the intersection, I think it's fine!
please check my comment above
I'm sorry, but it's more complicated than that. The objects you handle are not numbers - they're objects, so you either have to form the set out of a.article or implement a hash method in the objects. Even if you do, to transfer the attributes from the old to the new you need to have refs to both the objects; you will need to search in the loops, no way around it.
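To illustrate the point in the last comment: set intersection on objects only works once the class defines equality and hashing, e.g. on article. A hypothetical sketch (the Product class is invented for illustration):

```python
class Product:
    def __init__(self, article):
        self.article = article
    def __eq__(self, other):
        # Two products are "equal" if they share an article
        return isinstance(other, Product) and self.article == other.article
    def __hash__(self):
        # Hash must be consistent with __eq__
        return hash(self.article)

new = [Product('A1'), Product('A2')]
old = [Product('A2'), Product('A3')]

common = set(new).intersection(old)
print(sorted(p.article for p in common))  # ['A2']
```

Even then, as the comment notes, the intersection only gives you one object per match; to transfer attributes you still need references to both the old and the new object.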
1

We start with four loops and a complexity of O(n^2*k^2), n being the number of items and k being the number of attributes. Let's see what we can do.

First of all, get rid of the new_field loop, you don't need it:

for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():
        if old_field == new_field and old_field != 'id' and old_field != 'slug':
            setattr(old_product, old_field, getattr(new_product, old_field))

To:

for old_field in old_product._meta.get_all_field_names():
    if old_field != 'id' and old_field != 'slug':
        setattr(old_product, old_field, getattr(new_product, old_field))

Got it to O(n^2*k). Now for the product finding part.

First, get the two lists sorted, then proceed like you do when you merge lists in merge sort:

a = sorted(products_and_articles['products'], key=lambda x: x.article)
b = sorted(products_for_update, key=lambda x: x.article)
i = j = 0
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1
        continue
    if a[i].article > b[j].article:
        j += 1
        continue
    ...logic...
    i += 1  # Maybe you want to get rid of this one, I'm not sure..
    j += 1

Depending on the size of your database, it might be more or less adequate, because it requires you to make new sorted lists. This is not very heavy in memory (it's only refs anyway), but if you have really long lists and limited space, the huge efficiency win may not compensate.

Got it down to O(n*logn*k); that's the best I could do. You can probably get it even lower using dictionaries, but that requires you to change your db, so it needs more time and effort.
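A runnable sketch of the merge-style walk on plain objects (the Product class is a hypothetical stand-in, and articles are assumed unique within each list):

```python
class Product:
    def __init__(self, article, price):
        self.article = article
        self.price = price

# Sort both lists by the join key, O(n log n)
a = sorted([Product('A3', 1), Product('A1', 2)], key=lambda x: x.article)
b = sorted([Product('A2', 9), Product('A3', 8)], key=lambda x: x.article)

i = j = 0
matches = []
while i < len(a) and j < len(b):
    if a[i].article < b[j].article:
        i += 1              # a is behind, advance it
    elif a[i].article > b[j].article:
        j += 1              # b is behind, advance it
    else:
        # Articles match: record the pair, advance both cursors
        matches.append((a[i].article, a[i].price, b[j].price))
        i += 1
        j += 1

print(matches)  # [('A3', 1, 8)]
```

Each element is visited at most once after sorting, so the walk itself is linear.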

Comments

0

The first two for loops can be changed to:

from itertools import product


for new_product, old_product in product(list1, list2):
    # logic and other loops

and you can do the same for the two inner loops, turning:

 for old_field in old_product._meta.get_all_field_names():
    for new_field in new_product._meta.get_all_field_names():

into:

for old_field, new_field in product(old_product._meta.get_all_field_names(),
                                    new_product._meta.get_all_field_names()):
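For illustration, itertools.product just yields every pairing of the two iterables, so it flattens the nesting without reducing the quadratic amount of work (the field lists below are made up):

```python
from itertools import product

old_fields = ['id', 'slug', 'name', 'price']
new_fields = ['name', 'price']

# product() yields all len(old_fields) * len(new_fields) pairs;
# the filter keeps only the matching field names
pairs = [(o, n) for o, n in product(old_fields, new_fields) if o == n]
print(pairs)  # [('name', 'name'), ('price', 'price')]
```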

1 Comment

This will only double the time.. I don't think there is a lot of difference between combinations and for loops, and you will have the pairs where they both belong to the same list as well..
