
I have simple code to group and count the contents of a list. I'm concerned about the time it takes to complete the job. With fewer than 100 items it is fast enough, but several hundred items make my Mac scream, and I have up to 10000 items in a list from real-world data. So I need to know how this code can be optimized:

v = [1,2,3,4,5,6,7,8,9,10]
v1 = v[:5]
l = len(v1)
a = [sum(v1[x:y]) for y in range(l+1) for x in range(y)]
d = {x:a.count(x) for x in a}

So v is a list of integers. Values can be from 1 to 4000. The list length in the example is 10, but as stated above, it will go up to 10000. v1 is just a slice of the list, to work with smaller test data.

a holds the sum of every contiguous slice of v1, i.e. each item on its own AND each item added to the items preceding it:

[1, 3, 2, 6, 5, 3, 10, 9, 7, 4, 15, 14, 12, 9, 5]

d groups all the sums and counts them, as a dictionary of value: count pairs:

{1: 1, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1, 9: 2, 10: 1, 12: 1, 14: 1, 15: 1}

It seems that calculating a already becomes very heavy after 100+ items. With 500 items I get 276813 instances in d. So when there are 10000 items in the list, I expect up to 5559333 entries in d and maybe 100 times more in a.
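For scale, a always holds exactly n(n+1)/2 sums for an input of length n, so it grows quadratically. A quick check (the helper name num_slices is just for illustration):

```python
def num_slices(n):
    # number of non-empty contiguous slices of a length-n list
    return n * (n + 1) // 2

print(num_slices(5))      # the 5-item example above yields 15 sums
print(num_slices(10000))  # 50005000 sums for the full data set
```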

UPDATE

Based on the comments and answers below, there is some improvement with the implementation below:

def t5(v1):
    l = len(v1)
    d = {}
    for x in range(l):
        s = sum(v1[x:])
        d[s] = d.get(s, 0) + 1
        for y in range(1, l - x):
            s -= v1[-y]
            d[s] = d.get(s, 0) + 1
    return d

I have no idea how to use numpy and/or numba for more optimization. Maybe that's a good topic for a separate question...
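For what it's worth, one possible numpy sketch (my own assumption of how the vectorised route could look, not taken from the answers below): build prefix sums with np.cumsum, take all pairwise differences over the upper triangle of index pairs, then count with np.unique. Note it materialises all n(n+1)/2 sums at once, so memory becomes the limiting factor near n = 10000:

```python
import numpy as np

def t5_numpy(v1):
    # prefix sums: p[j] - p[i] == sum(v1[i:j])
    p = np.concatenate(([0], np.cumsum(v1)))
    # all index pairs i < j via the upper triangle
    i, j = np.triu_indices(len(p), k=1)
    sums = p[j] - p[i]
    values, counts = np.unique(sums, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```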

5 Comments

  • Seems like you need a dynamic programming approach. For each of the sums in a you can reuse two of the shorter sums. Then your summing will no longer be O(n**3). Commented Apr 4, 2016 at 6:57
  • You could throw numba at it and look if it's enough. But rewriting so that it uses a better algorithm (like schwobaseggl suggests) is the "right" approach. Commented Apr 4, 2016 at 7:00
  • @schwobaseggl would you mind giving an example of how to implement "reusing sums" in the given code? Commented Apr 4, 2016 at 7:27
  • @syntonym I tried numba but so far it is just slowing down the execution. Commented Apr 4, 2016 at 7:27
  • @MarkokraM Numba will only be fast if it doesn't need to call into python code (see here). Unfortunately numba doesn't support dicts (and thus the code you posted will only compile in object-mode, giving bad results). If you rewrite without using dicts (e.g. like MartijnPieters' answer does) you can use numba. Using MartijnPieters' algorithm + numba gives me 104% more performance compared to your code as posted (altering your code slightly to use an array instead of a dict gives me 92% more performance). Commented Apr 4, 2016 at 21:57

4 Answers


You are continually re-calculating sums for slices you already calculated before (each sum() call a redundant loop), plus you are creating loads of new list objects by slicing all the time.

You can use dynamic programming to calculate the sums instead, walking from the end of your input list; you only need to add the 'current' value to the sums for the next value in your list:

# v1 is the input list, l is len(v1)
sums = [0] * (l * (l + 1) // 2)
for x in range(l - 1, -1, -1):
    for y in range(x, l):
        i = (y * (y + 1) // 2) + x
        sums[i] += v1[x] + (sums[i + 1] if x < y else 0)

This adds up already calculated sums for partial subsequences to the next longer subsequence, avoiding a whole inner loop. This takes O(N^2) time, rather than your O(N^3) approach (where sum() loops over each subslice up to N elements long).

It translates x and y coordinates into your final 'triangle' to avoid having to create a larger matrix (or a dictionary, which would have additional memory and hashing overhead).

That's because you calculate slices between any two points in the list; if you take a matrix where the axes are the start and end indices included in each slice, you can number the final list like this:

   0  1  2  3  4

0  0  1  3  6 10 

1     2  4  7 11

2        5  8 12

3           9 13

4             14

(blank spaces are reverse slices, duplicates, not used in the output). For each row and column, you can calculate the output index by counting how many values fit in the triangle before that column, plus the row number: (col * (col + 1) // 2) + row.

The dynamic programming approach adds the sum for already produced slices for later values (the row below the indices) to the v1 value for the current row:

v1     0  1  2  3  4

1 > 0  1  3  6 10 15
          ^  ^  ^  ^  
2 > 1     2  5  9 14
             ^  ^  ^
3 > 2        3  7 12
                ^  ^
4 > 3           4  9
                   ^ 
5 > 4              5

So the value at row 2, column 4 is v1[2] plus the already calculated sum for row 3, column 4, or in this case, 2 + 12. There is only ever an already calculated sum for those elements where the row index is smaller than the column index; at position (3, 3) there is no calculated sum at (4, 3).

The next problem is that your list.count() traverses the whole list of sums to count how often one number appears. Thus, your {x:a.count(x) for x in a} expression is an O(N^2) double loop; it takes quadratic time to produce all the counts.

Use a Counter() object to produce that dictionary instead (a Counter is a subclass of dict):

from collections import Counter

d = Counter(sums)

A Counter() goes through all your elements just once to count the values; it simply keeps counters for each value it has seen so far as it iterates.
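Putting the two pieces together, the whole job reads as one short function (a sketch assembled from the code above; the name slice_sum_counts is illustrative):

```python
from collections import Counter

def slice_sum_counts(v1):
    l = len(v1)
    # one slot per slice, laid out in the triangle described above
    sums = [0] * (l * (l + 1) // 2)
    for x in range(l - 1, -1, -1):
        for y in range(x, l):
            i = (y * (y + 1) // 2) + x
            # current value plus the already computed sum one row down
            sums[i] += v1[x] + (sums[i + 1] if x < y else 0)
    return Counter(sums)
```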




Something like

[sum(v1[x:y]) for y in range(l+1) for x in range(y)]

seems problematic to me for a couple of reasons:

  1. it considers each contiguous sublist of v1, of which there are 1+2+3+...+n for a list of length n, i.e. the number of sublists grows proportionally to the square of the length.
  2. it creates a slice for each substring, and each slice is a copy.
  3. it sums each slice even though there is clearly a lot of overlap between the slices (e.g. if you know the sum of the first four numbers, you should really only need a single extra addition to get the sum of the first five numbers instead of summing the first five from scratch).

I suspect there's not much you can do about 1, but you can be more clever about tackling the other two items: it may be faster to use the fact that the sum of the slice [i:j] is just the sum of the slice [i:j-1] (i.e. the slice starting at the same position, but one element shorter) plus the element at position j-1 (i.e. the interval [j-1:j]). This means each sum can be defined as a sum of two previously computed sums.

I.e. instead of

a = [sum(v1[x:y]) for y in range(l+1) for x in range(y)]

try using

sums = {(i,i+1): x for i, x in enumerate(v1)}
for intervalLen in range(2, l + 1):
    for i in range(l - intervalLen + 1):
        j = i + intervalLen
        sums[(i,j)] = sums[(i,j-1)] + sums[(j-1,j)]
a = sums.values()

This uses a small memoisation dictionary, sums: it maps tuples denoting intervals (e.g. (0,1)) to the sum of that interval. The dictionary is initialised with the trivial sums for intervals of length 1 and then updated for longer intervals. Each step makes use of previously computed sums.

4 Comments

Yup, this is the approach I used too, but I avoided a dictionary (which can take up more memory than a matrix for this, plus hashing costs for all those tuples can make it slower). I then avoided a full N * N matrix too, by mapping into the triangle (roughly half the matrix) actually used by this calculation.
@MartijnPieters Right, a dictionary of tuples is possibly not the most efficient data structure, but it's quite convenient (and readable). For what it's worth, one further optimization in terms of memory usage would be to yield all sums as early as possible: notice how computing the sums of intervals of length n only needs the sums of length n-1 and of length 1. The sums for 2..n-2 aren't actually needed anymore, and don't need to be stored - you only need the previous length and length 1. I didn't want to obscure the algorithm too much though.
You need all the sums you calculate; note how your final sums dict is exactly 15 elements long (15 == 5 * 6 // 2, a triangle covering half the 5 * 5 matrix of point-to-point slices, the other half being the reverse slices).
@MartijnPieters Yes, you need to yield all the sums - you don't need all of them in the dictionary till the very end though.
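The memory optimisation discussed in this thread can be sketched as follows (the name interval_sum_counts is illustrative): only the sums for the previous interval length are kept, and counts are accumulated as the sums are produced:

```python
from collections import Counter

def interval_sum_counts(v1):
    # prev[i] holds the sum of the interval of the current length starting at i
    prev = list(v1)                  # intervals of length 1
    counts = Counter(prev)
    for length in range(2, len(v1) + 1):
        # extend each interval by one element; one fewer interval each round
        prev = [prev[i] + v1[i + length - 1]
                for i in range(len(v1) - length + 1)]
        counts.update(prev)
    return counts
```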

You might reduce the cost of the summing with an approach similar to this:

from collections import defaultdict

# cache[x][y] holds the previously computed sum of v[x:y]
# (the cache can't be keyed on v itself, since lists aren't hashable)
cache = defaultdict(dict)

def my_sum(v, x, y):
    if y - x == 1:
        return v[x]
    if y not in cache[x]:
        mid = (x + y) // 2  # integer division; / would produce a float index
        cache[x][y] = my_sum(v, x, mid) + my_sum(v, mid, y)
    return cache[x][y]

then:

a = [my_sum(v1, x, y) for y in range(l+1) for x in range(y)]

might be faster...
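The same divide-and-conquer caching can be sketched with the standard library's functools.lru_cache instead of a hand-rolled cache (the factory name make_my_sum is made up for illustration):

```python
from functools import lru_cache

def make_my_sum(v):
    v = tuple(v)  # freeze the data so the cache only needs the indices

    @lru_cache(maxsize=None)
    def my_sum(x, y):
        # sum of v[x:y], computed by splitting the interval in half
        if y - x == 1:
            return v[x]
        mid = (x + y) // 2
        return my_sum(x, mid) + my_sum(mid, y)

    return my_sum
```

Usage would be my_sum = make_my_sum(v1), then the same comprehension as before: a = [my_sum(x, y) for y in range(l + 1) for x in range(y)].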



You can also try this:

In [75]: v = [1,2,3,4,5,6,7,8,9,10, 9, 2, 1]

In [76]: r = {}

In [77]: for x in v: r[x] = r.setdefault(x,0) + 1

In [78]: r
Out[78]: {1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1, 9: 2, 10: 1}

In [79]: def func():
   ....:     r = {}
   ....:     for x in v: r[x] = r.setdefault(x,0) + 1
   ....:     return r
   ....: 
In [80]: def func1():
   ....:     return Counter(v)
   ....: 
In [81]: %timeit -q -o func()
Out [81]: <TimeitResult : 100000 loops, best of 3: 2.25 µs per loop>

In [82]: %timeit -q -o func1()
Out [82]: <TimeitResult : 100000 loops, best of 3: 4.17 µs per loop>

So, using this, I get the final results as:

t = len(v1)
r = {} 
for i, x in enumerate(v1):
    for y in range(i, t):
        r[(i, y)] = r.setdefault((i, y), 0) + x
        for z in range(0, i):
            r[(z, y)] = r.setdefault((z, y), 0) + x
r1 = {}
for x in r.values():
    r1[x] = r1.setdefault(x, 0) + 1
print(r1)

Result:

{1: 1, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 1, 9: 2, 10: 1, 12: 1, 14: 1, 15: 1}

4 Comments

This doesn't actually produce the right values. You are counting the keys of r (not the values) but the values of r are not the correct sub-slice sums anyway.
Why are you using a third loop inside the x, y loop here? Your solution is now O(N^3), no better than the OP's approach, and with sum() implemented in C code, your solution is going to be slower than theirs.
r[(i, y)] = r.setdefault((i, y), 0) + x is rather redundant. You are setting the value for i, y to 0 to only immediately overwrite it with 0 + x. There is no need to set a default in this case, use r.get((i, y), 0) instead.
@MartijnPieters: t is the length of v1. BTW your answers are awesome, always learns something from your answers :)
