By far the fastest approach is to avoid Python for-loops (and generators from itertools) entirely and use vectorized math from NumPy, something like this:
n = len(delta_vec)
jj, kk = np.triu_indices(n, 1)
ww = 1. / np.exp((mjd_vec[kk] - mjd_vec[jj]) / avg_time)
w_sum = np.sum(ww)
PP = delta_vec[jj] * delta_vec[kk]
j_index = np.sum(ww * np.sign(PP) * np.sqrt(np.abs(PP)))
where all variables with double letters are vectors of length n*(n-1)/2, one element per pair. Note that you have to convert your lists to NumPy arrays for this to work, since fancy indexing like mjd_vec[kk] is not supported on plain lists. You might even gain a bit more speed by replacing 1. / exp((a - b) / c) with exp((b - a) / c), which saves a division.
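To illustrate that last point, here is a quick sanity check (with made-up values, not your data) that the two expressions agree to floating-point precision:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 1.5, 2.5])
c = 123.0

# 1 / exp((a - b) / c): exponential followed by a division
w1 = 1. / np.exp((a - b) / c)
# exp((b - a) / c): mathematically identical, one operation fewer
w2 = np.exp((b - a) / c)

print(np.allclose(w1, w2))  # True
```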
Explanation: you are doing the calculation over all pairs of indices up to n, excluding pairs with equal indices (so no 1,1) and requiring the first index of each pair to be the smaller one (so no 2,1). If you lay out all possible pairs as a 2-D matrix, you see that the computation only covers the elements above the diagonal, which is exactly what np.triu_indices gives you. To show that this works, your loop generates these indices:
In [18]: n = 4
    ...: for j in range(0, n - 1, 1):
    ...:     for k in range(j + 1, n, 1):
    ...:         print(j, k)
0 1
0 2
0 3
1 2
1 3
2 3
You can generate exactly the same indices with triu_indices:
In [19]: np.triu_indices(n, 1)
Out[19]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
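Another way to see that these indices cover exactly the elements above the diagonal: build the full matrix of pairwise products and compare its strict upper triangle with what fancy indexing gives (a small demo with an example vector, not your data):

```python
import numpy as np

n = 4
v = np.arange(1, n + 1)         # example vector [1, 2, 3, 4]
jj, kk = np.triu_indices(n, 1)  # indices strictly above the diagonal

# full n-by-n matrix of all pairwise products v[j] * v[k]
M = np.outer(v, v)

# indexing the matrix with (jj, kk) picks out the upper triangle,
# which is the same as forming the products from the index vectors
print(M[jj, kk])                # [ 2  3  4  6  8 12]
print(v[jj] * v[kk])            # [ 2  3  4  6  8 12]
```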
Quick test for correctness and speed:
import numpy as np
# generate some fake data
n = 300
mjd_vec = np.random.rand(n)
delta_vec = np.random.rand(n)
mjd_list = list(mjd_vec)
delta_list = list(delta_vec)
avg_time = 123
def f1():
    w_sum = j_index = 0
    n = len(delta_list)
    for j in range(0, n - 1, 1):
        for k in range(j + 1, n, 1):
            w = 1. / np.exp((mjd_list[k] - mjd_list[j]) / avg_time)
            w_sum = w_sum + w
            P = delta_list[j] * delta_list[k]
            j_index = j_index + w * np.sign(P) * np.sqrt(np.abs(P))
    return w_sum, j_index
def f2():
    n = len(delta_vec)
    jj, kk = np.triu_indices(n, 1)
    ww = 1. / np.exp((mjd_vec[kk] - mjd_vec[jj]) / avg_time)
    w_sum = np.sum(ww)
    PP = delta_vec[jj] * delta_vec[kk]
    j_index = np.sum(ww * np.sign(PP) * np.sqrt(np.abs(PP)))
    return w_sum, j_index
Result:
In [3]: f1()
Out[3]: (44839.864280781003, 18985.189058689775)
In [4]: f2()
Out[4]: (44839.864280781003, 18985.189058689775)
In [5]: timeit f1()
1 loops, best of 3: 629 ms per loop
In [6]: timeit f2()
100 loops, best of 3: 7.88 ms per loop
So the NumPy version produces the same result and is a factor of 80 faster!
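One caveat to be aware of (my addition, not part of the timing above): np.triu_indices materializes index arrays of length n*(n-1)/2, so memory grows quadratically with n. For very large n you can vectorize per row instead, keeping only O(n) temporary memory while still avoiding the inner Python loop. A minimal sketch (the function name f2_rows is mine):

```python
import numpy as np

def f2_rows(mjd_vec, delta_vec, avg_time):
    """Same result as f2, but O(n) temporary memory per row."""
    n = len(delta_vec)
    w_sum = j_index = 0.0
    for j in range(n - 1):
        # exp((a - b) / c) == 1 / exp((b - a) / c), vectorized over all k > j
        w = np.exp((mjd_vec[j] - mjd_vec[j + 1:]) / avg_time)
        P = delta_vec[j] * delta_vec[j + 1:]
        w_sum += np.sum(w)
        j_index += np.sum(w * np.sign(P) * np.sqrt(np.abs(P)))
    return w_sum, j_index
```

This trades a little speed (one Python iteration per row) for bounded memory; for n in the hundreds the fully vectorized f2 should still be the fastest choice.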