2

I am having a performance issue on Python. The snippet below has 4 nested loops iterating over an OrderedDict, matrix_col which has 11000 items in it. Another iteration goes over a defaultdict, trans which has also ~11000 items in it. Execution of this process is taking too long. I appreciate if anyone can advise how to improve the performance.

import string
from collections import namedtuple
from collections import defaultdict
from collections import OrderedDict
import time

trans = defaultdict(dict)
...
matrix_col = OrderedDict(sorted(matrix_col.items(), key=lambda t: t[0]))
trans_mat = []
counter = 0

for u1, v1 in matrix_col.items():
    print counter, time.ctime()
    for u2, v2 in matrix_col.items():
        flag = True
        for w1 in trans.keys():
            for w2, c in trans[u1].items():
                if u1 == str(w1) and u2 == str(w2):
                    trans_mat.append([c])    
                    flag = False
        if flag:
            trans_mat.append([0])

trans_mat = np.asarray(trans_mat)
trans_mat = np.reshape(trans_mat, (11000, 11000))

Here is its current performance. It is basically processing 2 items per minute. With this speed it will take over 5 days to form the matrix, trans_mat:

0 Tue Oct  6 11:31:18 2015
1 Tue Oct  6 11:31:46 2015
2 Tue Oct  6 11:32:19 2015
3 Tue Oct  6 11:32:52 2015
4 Tue Oct  6 11:33:19 2015
5 Tue Oct  6 11:33:46 2015
4
  • 2
    So why are you looping over all keys in trans and all items in trans[u1], when you could just test for u1 and u2 being keys in the trans and trans[u1] dictionaries? Commented Oct 6, 2015 at 16:01
  • 1
    Can you provide a sample of the data, and what's going on in that ellipsis? Commented Oct 6, 2015 at 16:11
  • Hi Martijn, how do I get the value "c" in the last step if I don't iterate over trans[u1].items()? Commented Oct 6, 2015 at 17:50
  • If you're dealing with lots of numeric data to the point where performance is a concern, use either numpy and/or pandas (pandas is probably better for you if you're using OrderedDict to hold a bunch of rows with various fields in some order) Commented Oct 6, 2015 at 17:59

2 Answers 2

2

You aren't taking advantage of the fast lookup available to dictionaries. Finding a key in a dict is O(1). To fix this you just need to change your algorithm so you aren't iterating all the keys looking for the ones you want..

from itertools import product
trans_mat = [ [trans[u1][u2]] if (u1 in trans) and (u2 in trans[u1]) else [0]
                 for u1 in matrix_col for u2 in matrix_col ]
Sign up to request clarification or add additional context in comments.

9 Comments

If he needs to get empty values then yes, that's a good solution.
Hi Chad, thanks for the response. matrix_col is a sorted set of items of type "State". Here is an example of "State": 33.594994,-84.72793,9978.2,280,60,270 1**
Hi Chad, thanks for the response. The reason why I am doing string conversion is because I need to form a sorted and consistent set of row and column labels for matrix operations. So, I am converting from "the other type" to String first, so that it is sortable. Then I am sorting based on keys and creating an OrderedDict. I am basically unable to sort if I don't turn it into String.
Use the same type for keys and for sorting-- make the keys strings too, if you think you must. And then use key lookups. You're shooting yourself in the foot.
Why aren't State items sortable?
|
1

Without context it's hard to understand logic and what you are trying to achieve but you should considering changing algorithm to iterate over trans first and then check the trans_mat. Something like:

for w1, t_val in trans.items():
    w1_is_in_matrix_col = str(w1) in matrix_col
    for w2, c in t_val.items():
        if w1_is_in_matrix_col and str(w2) in matrix_col:
            trans_mat.append([c])
        else:
            trans_mat.append([0])

Theoretically you can use list comprehension here and that would give you some boost as well (but minor comparing to current inefficiency).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.