I have a simple piece of code that groups and counts the contents of a list. I'm concerned about the time it takes to complete the job. With fewer than 100 items it is fast enough, but several hundred items make my Mac scream, and I have up to 10000 items in a list from real-world data. So I need to know how this code can be optimized:
v = [1,2,3,4,5,6,7,8,9,10]
v1 = v[:5]
l = len(v1)
a = [sum(v1[x:y]) for y in range(l+1) for x in range(y)]
d = {x:a.count(x) for x in a}
So v is a list of integers. Each value can be from 1 to 4000. The list length in the example is 10, but as stated above, it will go up to 10000. v1 is just a slice of the list, to work with smaller test data.
a holds the sum of every contiguous sublist: each item on its own AND each item together with the items preceding it:
[1, 3, 2, 6, 5, 3, 10, 9, 7, 4, 15, 14, 12, 9, 5]
d groups those sums and counts them as a key-value dictionary:
{1: 1, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1, 9: 2, 10: 1, 12: 1, 14: 1, 15: 1}
It seems that calculating a already becomes very heavy after 100+ items. With 500 items I get 276813 instances in d. So when there are 10000 items in the list, I'm expecting up to 5559333 items in d and maybe 100 times more in a.
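As an aside, the counting step itself is quadratic, because a.count(x) rescans the whole list for every element. A single pass with collections.Counter produces the same dictionary in time linear in the length of a. A sketch on the 5-item test data:

```python
from collections import Counter

v1 = [1, 2, 3, 4, 5]
a = [sum(v1[x:y]) for y in range(len(v1) + 1) for x in range(y)]

# Counter walks a once, instead of calling a.count(x) once per element
d = dict(Counter(a))
```

This alone does not fix the cost of building a, but it removes one of the quadratic factors.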
UPDATE
Based on the comments and answers below, there is some improvement with the implementation below:
def t5(v1):
    l = len(v1)
    d = {}
    for x in range(l):
        # sum of the whole tail v1[x:], computed once per start index
        s = sum(v1[x:])
        d[s] = d.get(s, 0) + 1
        # peel items off the end one at a time to get the shorter sums
        for y in range(1, l - x):
            s -= v1[-y]
            d[s] = d.get(s, 0) + 1
    return d
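For comparison, the same O(n**2) idea can also be written forward: keep a running sum per start index instead of summing a slice and subtracting from the tail. This is a sketch equivalent to t5 (the name sublist_sum_counts is my own):

```python
from collections import Counter

def sublist_sum_counts(v):
    # one running sum per start index: O(n**2) additions, no slicing
    d = Counter()
    for x in range(len(v)):
        s = 0
        for y in range(x, len(v)):
            s += v[y]  # s is now sum(v[x:y+1])
            d[s] += 1
    return dict(d)
```

Either direction avoids re-summing each sublist from scratch, which is where the original O(n**3) came from.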
I have no idea how to use numpy and/or numba for further optimization. Maybe that is better suited to a separate question...
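For what it's worth, one way numpy might be applied here, as a sketch: every contiguous-sublist sum is a pairwise difference of the prefix-sum array, so index arrays plus np.unique do the grouping in compiled code. Be warned that the index arrays are O(n**2) in memory, so at 10000 items this would need on the order of hundreds of megabytes; chunking over start indices would be needed in practice.

```python
import numpy as np

def t5_numpy(v1):
    # p[j] - p[i] == sum(v1[i:j]) for i < j
    p = np.concatenate(([0], np.cumsum(v1)))
    i, j = np.triu_indices(len(p), k=1)  # all index pairs with i < j
    sums = p[j] - p[i]
    vals, counts = np.unique(sums, return_counts=True)
    return dict(zip(vals.tolist(), counts.tolist()))
```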
You can reuse two of the shorter sums. Then your summing will no longer be O(n**3).