The following loop is a major bottleneck in my program, particularly since records can contain over 500k entries.
records = [item for sublist in records for item in sublist]  # flatten the list
for rec in records:
    if len(rec) > 5:
        tag = '%s.%s' % (rec[4], rec[5].strip())
        if tag in mydict:
            mydict[tag][0] += 1
            mydict[tag][1].add(rec[6].strip())
        else:
            mydict[tag] = [1, set(rec[6].strip())]
I don't see a way that I could do this with a dictionary/list comprehension, and I'm not sure calling map would do me much good. Is there any way to optimize this loop?
Edit: The dictionary contains information about certain operations occurring in a program. rec[4] is the package which contains the operation and rec[5] is the name of the operation. The raw logs contain an int instead of the actual name, so when the log files are read into the list, the int is looked up and replaced with the operation name. The counter records how many times each operation was executed, and the set contains the parameters for the operation. I am using a set because I don't want duplicate parameters. The strip is simply to remove whitespace. The presence of this whitespace is unpredictable in rec[6], but rather consistent in rec[4] and rec[5].
set(rec[6].strip()) is likely to create a set of single-character strings, which doesn't seem to jibe well with set.add(rec[6].strip()), which adds a whole string to the set.
Making records a genexp would help.
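To make the two comments above concrete, here is a minimal sketch, not a benchmarked fix. It assumes each record is a sequence of strings laid out as described in the question. The function name summarize and the sample data are hypothetical, and the length check is tightened to len(rec) > 6 on the assumption that rec[6] must exist before it is read.

```python
from collections import defaultdict
from itertools import chain


def summarize(nested_records):
    """Count operations and collect their distinct parameters.

    Assumed record layout (from the question): rec[4] is the package,
    rec[5] the operation name, rec[6] the parameter string.
    """
    # Every new tag starts out as [count, set_of_parameters].
    summary = defaultdict(lambda: [0, set()])
    # chain.from_iterable flattens lazily, so no intermediate flat list is built.
    for rec in chain.from_iterable(nested_records):
        if len(rec) > 6:  # assumption: rec[6] is read below, so 7 fields are needed
            tag = '%s.%s' % (rec[4], rec[5].strip())
            entry = summary[tag]
            entry[0] += 1
            # .add() stores the whole string; set(some_string) would split it
            # into single characters instead.
            entry[1].add(rec[6].strip())
    return summary


# Hypothetical sample data: two sublists, one record too short to be counted.
logs = [
    [['', '', '', '', 'pkg', 'open ', ' a.txt']],
    [['', '', '', '', 'pkg', 'open', 'a.txt '], ['short']],
]
print(dict(summarize(logs)))  # {'pkg.open': [2, {'a.txt'}]}
```

Using defaultdict removes the explicit "if tag in mydict" branch, so the first occurrence of a tag goes through the same .add() call as every later one and the single-character-set problem cannot arise. Whether this is measurably faster on 500k records would need profiling.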