Python Nested For Loop Array Comparison - Possibility of Optimization?

Hi, thanks for replying! Next time I have code to optimize I will post to CodeReview. The problem is that each val inside set_of_pk_values is unique if you consider all of the keys, but the same key will have repeating values across the dictionaries. The first part is to check whether any keys are the same. This line: if val in set_of_pk_values[idx+1:]: seems to compare the whole dictionaries, and not the item inside key['predecessors'].

@user1157751: I offered multiple optimizations because I wasn't sure which details were important and which weren't. If the if in doesn't work, use one of the other two optimizations.

@ArtOfWarfare: There is one thing preventing me from upvoting this post, it's the way you use the whatIEnumerate. This won't work like this because it's an iterator (in python3, or an enumerate object in python2) and as such is not subscriptable. Your idea not to enumerate inside the first for loop is however valid.

@Cilyan: Fixed by replacing the slice notation with islice().

@ArtOfWarfare still not: bpaste.net/show/1dd12178faea . As an iterator, it is consumed.
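A minimal sketch of the point made in these comments, using a made-up three-item list named items (not from the original post): an enumerate object cannot be sliced, and wrapping it in itertools.islice still advances the one underlying iterator, so a second pass finds nothing left.

```python
from itertools import islice

items = ["a", "b", "c"]  # hypothetical sample data

pairs = enumerate(items)

# An enumerate object is an iterator, so slice notation raises TypeError.
try:
    pairs[1:]
    sliceable = True
except TypeError:
    sliceable = False

# islice() works once, but it advances the shared iterator as it goes.
first_pass = list(islice(pairs, 1, None))   # skips index 0, takes the rest
second_pass = list(islice(pairs, 1, None))  # the iterator is already exhausted

print(sliceable, first_pass, second_pass)
```

Slicing the list itself, as in set_of_pk_values[idx+1:], avoids the problem because every slice is a fresh list.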


Marcus Müller · Accepted Answer · 2015-03-22 13:42:26Z


Optimization of part 1

Original

Man, this is bad:

for idx, val in enumerate(set_of_pk_values):
    for idx_2, val_2 in enumerate(set_of_pk_values):
        if (val['someKey'] == val_2['someKey'] and idx != idx_2):
            do_stuff()

Step 1

Just skip the indices of the elements you've already tried (== is commutative):

for idx, val in enumerate(set_of_pk_values[:-1]):
    for val_2 in set_of_pk_values[idx+1:]:
        if (val['someKey'] == val_2['someKey']):
            do_stuff()

Step 0.1

Rename that. It's ugly.

for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    for second_dic in set_of_pk_values[idx+1:]:
        if (first_dic['someKey'] == second_dic['someKey']):
            do_stuff()

Step 2

Now, the if in every loop iteration is bothersome. Replace it by filtering the reduced list:

hits = []
for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    # append one tuple per element; list() materializes the lazy filter in Python 3
    hits.append((first_dic['someKey'], list(filter(lambda dic: dic['someKey'] == first_dic['someKey'], set_of_pk_values[idx+1:]))))

hits now contains a list of match tuples: hits[i] = (key value of the first element, list of matching elements that have an index greater than the first element's).
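To make the shape of hits concrete, here is a sketch on made-up data. It assumes the intent is one tuple per element, filtered against the elements that come after it; the extra "id" field is only there to tell the sample dictionaries apart, and list() is used because filter returns a lazy iterator in Python 3.

```python
# Hypothetical sample data for illustration only.
set_of_pk_values = [
    {"someKey": "a", "id": 0},
    {"someKey": "b", "id": 1},
    {"someKey": "a", "id": 2},
]

hits = []
for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    # Keep only the later dictionaries whose 'someKey' equals this one's.
    later_matches = list(
        filter(
            lambda dic: dic["someKey"] == first_dic["someKey"],
            set_of_pk_values[idx + 1:],
        )
    )
    hits.append((first_dic["someKey"], later_matches))

print(hits)
```

Elements without a later match still get an entry, with an empty list as the second item.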

Step 3

Dictionary lookups are expensive. Replace them using operator.itemgetter:

from operator import itemgetter
getter = itemgetter("someKey")
hits = []
for idx, first_dic in enumerate(set_of_pk_values[:-1]):
    # append one tuple per element; list() materializes the lazy filter in Python 3
    hits.append((getter(first_dic), list(filter(lambda dic: getter(dic) == getter(first_dic), set_of_pk_values[idx+1:]))))

Step 4

Sit back and look. The iterations of the for loop don't really rely on the state of last iteration. Time for list comprehensions.

from operator import itemgetter
getter = itemgetter("someKey")
# list() evaluates each filter right away, while first_dic still refers to the current element
hits = [ ( getter(first_dic), list(filter(lambda dic: getter(dic) == getter(first_dic), set_of_pk_values[idx+1:])) ) for idx, first_dic in enumerate(set_of_pk_values[:-1])]

edited Mar 22, 2015 at 13:42

answered Mar 16, 2015 at 17:34

Marcus Müller


14 Comments


For step 3, where does second_dic come from?

@user1157751: typo, fixed.


Thanks for your reply! Is it possible for hits to give me the indexes that match in set_of_pk_values? Also, for step 4, it seems that an opening bracket doesn't have a matching closing bracket? ==> (getter(first_dic).

@user1157751: thanks for spotting that typo. Also, yes of course; just exchange the first element of the tuple, i.e. replace ( getter(first_dic) ,... by ( idx, ....

Thanks again. This might be impossible, but is it possible to get the index of set_of_pk_values[idx:1] used to compare with first_dic in step 3?
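One possible way to get the indices on both sides, sketched on made-up data: enumerate the slice with start=idx + 1 so the inner index refers to a position in the full list. This is an illustration of the idea in the comments, not code from the answer.

```python
from operator import itemgetter

# Hypothetical sample data for illustration only.
set_of_pk_values = [
    {"someKey": "x"}, {"someKey": "y"}, {"someKey": "x"}, {"someKey": "x"},
]
getter = itemgetter("someKey")

# Each entry is (index of the first element, indices of the later matches).
hits = [
    (
        idx,
        [
            j
            for j, dic in enumerate(set_of_pk_values[idx + 1:], start=idx + 1)
            if getter(dic) == getter(first_dic)
        ],
    )
    for idx, first_dic in enumerate(set_of_pk_values[:-1])
]

print(hits)
```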


siebz0r · Accepted Answer · 2015-03-17 14:02:50Z


Iterations in Python are slower than iterations in C. It's better to do the iterations in C by using the Python libraries. Funny that nobody mentioned itertools here...

itertools.combinations builds the unique combinations in C and returns an iterator over them:

import itertools
import operator
getter = operator.itemgetter('someKey_1')

for a, b in itertools.combinations(set_of_pk_values, 2):
    if getter(a) == getter(b):
        pass  # logic?
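A runnable variant on made-up data, assuming the "logic" is simply to record which positions matched. Feeding enumerate(...) into itertools.combinations is one way to carry the indices along with each pair; the list name matching_pairs is hypothetical.

```python
import itertools
import operator

# Hypothetical sample data for illustration only.
set_of_pk_values = [
    {"someKey_1": 10}, {"someKey_1": 20}, {"someKey_1": 10},
]
getter = operator.itemgetter("someKey_1")

matching_pairs = []
# combinations() yields each unordered pair of (index, dict) tuples once.
for (i, a), (j, b) in itertools.combinations(enumerate(set_of_pk_values), 2):
    if getter(a) == getter(b):
        matching_pairs.append((i, j))

print(matching_pairs)
```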

answered Mar 17, 2015 at 14:02

siebz0r


4 Comments

Have you tried timeit on your answer and the other answers? Just curious how much better it performs.
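A rough sketch of how such a timing comparison could be set up with timeit. The data size, the function names, and the repeat count are all assumptions, and no particular result is implied; the numbers depend on the machine, the Python version, and the data.

```python
import itertools
import timeit

# Hypothetical data: 200 dictionaries with 50 distinct key values.
data = [{"someKey": i % 50} for i in range(200)]

def nested_slices(values):
    """Count matching pairs with the sliced double loop."""
    count = 0
    for idx, first in enumerate(values[:-1]):
        for second in values[idx + 1:]:
            if first["someKey"] == second["someKey"]:
                count += 1
    return count

def with_combinations(values):
    """Count matching pairs with itertools.combinations."""
    return sum(
        1
        for a, b in itertools.combinations(values, 2)
        if a["someKey"] == b["someKey"]
    )

# Both approaches must agree before their timings are worth comparing.
assert nested_slices(data) == with_combinations(data)

timings = {
    func.__name__: timeit.timeit(lambda func=func: func(data), number=20)
    for func in (nested_slices, with_combinations)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 runs")
```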

@ArtOfWarfare: Here is an attempt to time the solutions. I must say, I'm very surprised by the results... Any comments or criticism of my code are welcome. gist.github.com/Cilyan/50b9ee3e2dad67bb8a6b

@Cilyan: Wait, so my solution is the fastest of them all, then?