1

I'm trying to optimize a nested for loops that compares an element in the array with the rest of the elements in the array.

There's two part, the first part is for example, an Array has 3 elements, and each element is a dictionary:

[{"someKey_1":"a"}, {"someKey_1":"b"}, {"somekey_1":"a"}]


1st iteration(1st element compares with 2nd element):

Test key of "someKey" for two elements, since a != b, then we do nothing


2st iteration(1st element compares with 3nd element):

Test key of "someKey" for two elements, since a == a, we do some logic


The code:

for idx, val in enumerate(set_of_pk_values):
    for idx_2, val_2 in enumerate(set_of_pk_values):
        if (val['someKey'] == val_2['someKey'] and idx != idx_2):
                #Some Logic

The second part is very similar to the previous example (3 items in the list), in the same dictionary, we have an array associated with a key (now there's a single dictionary with two keys in each element of the array), let's say:

[{"someKey_1":[b,f]}{"someKey_2":a}, 
{"someKey_1":[e,f]}{"someKey_2":b}, 
{"somekey_1":[h,k]}{"someKey_2":c}]

1st iteration (1st element compares with 2nd element):

loops through the array with the key: someKey_1

b==b (2nd element's someKey_2), then do some logic

f!=b (2nd element's someKey_2), no logic is done


2nd iteration (1st element compares with 3rd element):

loops through the array with the key: someKey_1

b==c (3rd element's someKey_2), then do some logic

f!=c (3rd element's someKey_2), no logic is done


The code:

for idx, val in enumerate(set_of_pk_values):
    for idx_2, val_2 in enumerate(set_of_pk_values):
        for pred in val['someKey_1']:
            if(val_2['someKey_2'] == pred):
                #Some Logic

Currently the runtime for the first nested loop: 21 seconds, and the second nested loop is around 19 seconds. Compared to other processes, ranging from 1-2 seconds, this part is clearly a bottleneck.

Can anybody point me to the right direction on how to optimize this piece of simple, yet extremely time consuming code?

14
  • 2
    if you're comparing 1 versus 2, do you really need to compare 2 versus 1? Commented Mar 14, 2015 at 3:05
  • No for the first nested loop, but it is required for second nested loop. I was thinking is there were any way to make this O(n) Commented Mar 14, 2015 at 3:33
  • Could you combine the nested loops? Instead of comprising a skiplist and then reiterating to parse, could you not simply skip the items in one loop using if statements, and possibly continue? Commented Mar 14, 2015 at 4:54
  • dictionaries don't have indices (0..n) (you can't reliable use enumerate as the order might differ). Can you tell us what you are trying to achieve with the code? Commented Mar 14, 2015 at 6:20
  • Set_of_pk_values is an array that contains dictionaries. There's keys in the dictionary whose content I'm trying to compare. Commented Mar 14, 2015 at 9:39

3 Answers 3

5

First off, I believe this should be posted on CodeReview, not StackOverflow.

StackOverflow is for getting help with code that doesn't work.
CodeReview is for getting help with code that does work, but you want to make it better.

Second, here's some suggestions on optimizing it:

  • Don't enumerate() inside the loop.
  • Use slices in the first scenario to avoid pointless comparisons.

Here's how I would rewrite your first scenario:

# Check if two dictionary with the key 'predecessor' are the same,
# and they are NOT the same index (otherwise it would be comparing themselves)
# set_of_pk_values is an array of dictionaries.
for idx, val in enumerate(set_of_pk_values):
    for val_2 in set_of_pk_values[idx+1:]:  # Note the slice and lack of enumerate
        if (val['predecessor'] == val_2['predecessor']):  # Don't waste time checking indexes
            # Do Something Here, also we don't want to compare itself, because it will be true

Instead of for\if, use if in:

for idx, val in enumerate(set_of_pk_values):
    if val in set_of_pk_values[idx+1:]:
        # Do Something Here, also we don't want to compare itself, because it will be true.

If you really want to enumerate, since you just want the same enumeration multiple times, I would just do it once outside the loop and store it in a variable, then loop over it. Here's what I mean:

I was incorrect, the below doesn't work, see the comments.

# Doesn't work, see comments.
# from itertools import islice
# whatIEnumerate = enumerate(set_of_pk_values)
# for idx, val in whatIEnumerate:
    # for idx_2, val_2 in islice(whatIEnumerate, idx+1, None):
        # ...
Sign up to request clarification or add additional context in comments.

6 Comments

Hi, thanks for replying! Next time if I have code optimizations I will post to CodeReview. The problem is, each val inside set_of_pk_values are unique if the consider all of the keys. But there will be repeating values of the same key across all dictionaries. The first part is to compare if there are any keys are the same. This line: if val in set_of_pk_values[idx+1:]: seems to compare the dictionaries, and not the item inside the key['predecessors'].
@user1157751: I offered multiple optimizations because I wasn't sure what was important details and what wasn't. If the if in doesn't work, use one of the other two optimizations.
@ArtOfWarfare: There is one thing preventing me from upvoting this post, it's the way you use the whatIEnumerate. This won't work like this because it's an iterator (in python3, or an enumerate object in python2) and as such is not subscriptable. Your idea not to enumerate inside the first for loop is however valid.
@Cilyan: Fixed by replacing the slice notation with islice().
@ArtOfWarfare still not: bpaste.net/show/1dd12178faea . As an iterator, it is consumed.
|
4
+50

Optimization of part 1

Original

Man, this is bad:

for idx, val in enumerate(set_of_pk_values):
    for idx_2, val_2 in enumerate(set_of_pk_values):
        if (val['someKey'] == val_2['someKey'] and idx != idx_2):
            do_stuff()

Step 1

Just skip the indices of the elements you've already tried (== is commutative):

for idx, val in enumerate(set_of_pk_values[:-1]):
    for val_2 in set_of_pk_values[idx+1:]
        if (val['someKey'] == val_2['someKey']):
            do_stuff()

Step 0.1

Rename that. It's ugly.

for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    for second_dic in set_of_pk_values[idx+1:]
        if (first_dic['someKey'] == second_dic['someKey']):
            do_stuff()

Step 2

Now, the if in every loop iteration is bothersome. Replace it by filtering the reduced list:

hits = []
for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    hits += (first_dic['someKey'], filter(lambda dic: dic['someKey'] == first_dic['someKey'], set_of_pk_values[idx:1]) ) 

hits now contains a list of match tuples: hits[i] = ( mathing first element , list of matches that have idx > first element).

Step 3

Dictionary lookups are expensive. Replace them using operator.itemgetter:

from operator import itemgetter
getter = itemgetter("someKey")
hits = []
for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    hits += (getter(first_dic), filter(lambda dic: getter(dic) == getter(first_dic), set_of_pk_values[idx:1]) )

Step 4

Sit back and look. The iterations of the for loop don't really rely on the state of last iteration. Time for list comprehensions.

from operator import itemgetter
getter = itemgetter("someKey")
hits = [ ( getter(first_dic), filter(lambda dic: getter(dic) == getter(first_dic), set_of_pk_values[idx:-1]) ) for idx, first_dic in enumerate(set_of_pk_values[:-1])]

14 Comments

For step 3, where does second_dic come from?
@user1157751: typo, fixed.
Thanks for your reply! Is it possible for hits to give me the indexes that matches in set_of_pk_values? Also for step 4, it seems that an open bracket isn't matching with an ending bracket? ==> (getter(first_dic).
@user1157751: thanks for spotting that typo. Also, yes of course; just exchange the first element of the tuple, i.e. replace ( getter(first_dic) ,... by ( idx, ....
Thanks again. This might be impossible, but is it possible to get the index of set_of_pk_values[idx:1] used to compare with first_dic in step 3?
|
3

Iterations in Python are slower than iterations in C. It's better to do the iterations in C by using the Python libraries. Funny that nobody mentioned itertools here...

itertools.combinations makes unique combinations in C and then returns a generator for the combinations:

import itertools
import operator
getter = operator.itemgetter('someKey_1')

for a, b in itertools.combinations(set_of_pk_values, 2):
    if getter(a) == getter(b):
        # logic?

4 Comments

Have you tried timeit on your answer and the other answers? Just curious how much better it performs.
@ArtOfWarfare: Here an attempt to time the solutions. I must say, I'm very surprised with the results... Any comment/critics welcome on my code. gist.github.com/Cilyan/50b9ee3e2dad67bb8a6b
@Cilyan: Wait, so my solution is the fastest of them all, then?
@ArtOfWarfare Err... unless the one from Marcus Müller starts working and proves to be faster, yes! I'm rather suprised that a double for works better as combinations which is internally accelerated, but that's what the results show... :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.