
The following loop creates a giant bottleneck in my program, particularly since records can contain over 500k items.

records = [item for sublist in records for item in sublist] #flatten the list
for rec in records:
    if len(rec) > 5:
        tag = '%s.%s' %(rec[4], rec[5].strip())
        if tag in mydict:
            mydict[tag][0] += 1
            mydict[tag][1].add(rec[6].strip())
        else:
            mydict[tag] = [1, set(rec[6].strip())]

I don't see a way that I could do this with a dictionary/list comprehension, and I'm not sure calling map would do me much good. Is there any way to optimize this loop?

Edit: The dictionary contains information about certain operations occurring in a program. rec[4] is the package which contains the operation and rec[5] is the name of the operation. The raw logs contain an int instead of the actual name, so when the log files are read into the list, the int is looked up and replaced with the operation name. The incremental counter counts how many times the operation was executed and the set contains the parameters for the operation. I am using a set because I don't want duplicates for the parameters. The strip is simply to remove white space. The existence of this white space is unpredictable in rec[6], but rather consistent in rec[4] and rec[5].

  • I don't see an easy way to optimize -- However, I feel obliged to point out that set(rec[6].strip()) is likely to create a set of single character strings -- which doesn't seem to jibe well with set.add(rec[6].strip()) which adds a string to the set. Commented Sep 5, 2014 at 20:32
  • Just making records a genexp would help. Commented Sep 5, 2014 at 20:32
  • flattening the nested list doubles your processing time, doesn't it? Commented Sep 5, 2014 at 20:40
  • That portion seemed to run faster than the loop. It is a bit tricky to avoid that since it is the filtered contents of multiple files and currently the function that does the filtering processes one file at a time, but I suppose the function could be changed to handle multiple ones and avoid the flattening problem. Commented Sep 5, 2014 at 20:45
  • Why are you using a string as key? Could it be a tuple? Also, what is the average length of rec[4], rec[5], rec[6], and what kind of characters are you stripping? What is the size of the dictionary at the end of the loop? Why do you use a set and increment your count even if the item is already in the set? Commented Sep 6, 2014 at 9:03

2 Answers


Instead of flattening such a huge list you can directly iterate over its flattened iterator using itertools.chain.from_iterable.

from itertools import chain

for rec in chain.from_iterable(records):
    # rest of the loop body
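
For illustration, a minimal self-contained sketch of the lazy-iteration approach, using invented sample records (the field layout is assumed from the question; the set literal also sidesteps the set(rec[6].strip()) pitfall pointed out in the comments):

```python
from itertools import chain

# hypothetical sample data: two "files" worth of records, each record a
# 7-field tuple so that rec[4], rec[5], rec[6] exist
records = [
    [(0, 0, 0, 0, "pkg.a", "open ", " x=1 "),
     (0, 0, 0, 0, "pkg.a", "read", "x=2")],
    [(0, 0, 0, 0, "pkg.b", "close", "x=1")],
]

mydict = {}
# iterate lazily over the nested lists instead of building a flat copy
for rec in chain.from_iterable(records):
    if len(rec) > 5:
        tag = '%s.%s' % (rec[4], rec[5].strip())
        if tag in mydict:
            mydict[tag][0] += 1
            mydict[tag][1].add(rec[6].strip())
        else:
            # set literal, not set(string): one element, not one per character
            mydict[tag] = [1, {rec[6].strip()}]
```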

In a quick benchmark this is around 3× faster than the equivalent nested generator expression, and faster still than the list comprehension version:

In [13]: records = [[None]*500]*10000

In [14]: %%timeit
    ...: for rec in chain.from_iterable(records): pass
    ...: 
10 loops, best of 3: 54.7 ms per loop

In [15]: %%timeit
    ...: for rec in (item for sublist in records for item in sublist): pass
    ...: 
10 loops, best of 3: 170 ms per loop

In [16]: %%timeit #Your version
    ...: for rec in [item for sublist in records for item in sublist]: pass
    ...: 
1 loops, best of 3: 249 ms per loop



I don't know if this will make it faster or not, but instead of having ...

if tag in mydict:
    mydict[tag][0] += 1
    mydict[tag][1].add(rec[6].strip())
else:
    mydict[tag] = [1, set(rec[6].strip())]

you can try ...

element = mydict.setdefault(tag, [0, set()])
element[0] += 1
element[1].add(rec[6].strip())
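
A runnable sketch of this setdefault pattern, on invented sample records (field layout assumed from the question):

```python
mydict = {}
# two hypothetical records for the same package/operation, differing parameters
recs = [
    ("a", "b", "c", "d", "pkg", "op ", " p1 "),
    ("a", "b", "c", "d", "pkg", "op", "p2"),
]
for rec in recs:
    tag = '%s.%s' % (rec[4], rec[5].strip())
    # setdefault returns the stored list, inserting [0, set()] only if absent
    element = mydict.setdefault(tag, [0, set()])
    element[0] += 1
    element[1].add(rec[6].strip())
```

One caveat: setdefault evaluates its default argument on every call, so a fresh list and set are constructed even when the key already exists; collections.defaultdict avoids that cost.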

