By far the fastest approach is to avoid Python for-loops (and generators from itertools) entirely and use vectorized math from NumPy, something like this:
n = len(delta_vec)
jj, kk = np.triu_indices(n, 1)
ww = 1. / np.exp((mjd_vec[kk] - mjd_vec[jj]) / avg_time)
w_sum = np.sum(ww)
PP = delta_vec[jj] * delta_vec[kk]
j_index = np.sum(ww * np.sign(PP) * np.sqrt(np.abs(PP)))
where all variables with double letters are vectors of length n*(n-1)/2, one element per pair. Note that you have to convert your lists to NumPy arrays for this to work, since fancy indexing like mjd_vec[kk] is not supported on plain lists. You might even gain a bit more speed by replacing 1. / exp((a - b) / c) with exp((b - a) / c), which saves a division.
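To illustrate that last point, here is a quick sanity check (with made-up values, not your data) that the two expressions agree to floating-point precision:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 1.5, 2.5])
c = 123.0

# 1 / exp((a - b) / c): exponential followed by a division
w1 = 1. / np.exp((a - b) / c)
# exp((b - a) / c): mathematically identical, one operation fewer
w2 = np.exp((b - a) / c)

print(np.allclose(w1, w2))  # True
```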
Explanation: you are doing the calculation over all pairs of indices up to n, excluding pairs with equal indices (so no 1,1) and requiring the first index of each pair to be the smaller one (so no 2,1). If you lay out all possible pairs as a 2-D matrix, you see that the computation only covers the elements above the diagonal, which is exactly what np.triu_indices gives you. To show that this works, your loop generates these indices:
In [18]: n = 4
    ...: for j in range(0, n - 1, 1):
    ...:     for k in range(j + 1, n, 1):
    ...:         print(j, k)
0 1
0 2
0 3
1 2
1 3
2 3
You can generate exactly the same indices with triu_indices:
In [19]: np.triu_indices(n, 1)
Out[19]: (array([0, 0, 0, 1, 1, 2]), array([1, 2, 3, 2, 3, 3]))
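Another way to see that these indices cover exactly the elements above the diagonal: build the full matrix of pairwise products and compare its strict upper triangle with what fancy indexing gives (a small demo with an example vector, not your data):

```python
import numpy as np

n = 4
v = np.arange(1, n + 1)         # example vector [1, 2, 3, 4]
jj, kk = np.triu_indices(n, 1)  # indices strictly above the diagonal

# full n-by-n matrix of all pairwise products v[j] * v[k]
M = np.outer(v, v)

# indexing the matrix with (jj, kk) picks out the upper triangle,
# which is the same as forming the products from the index vectors
print(M[jj, kk])                # [ 2  3  4  6  8 12]
print(v[jj] * v[kk])            # [ 2  3  4  6  8 12]
```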
Quick test for correctness and speed:
import numpy as np
# generate some fake data
n = 300
mjd_vec = np.random.rand(n)
delta_vec = np.random.rand(n)
mjd_list = list(mjd_vec)
delta_list = list(delta_vec)
avg_time = 123
def f1():
    w_sum = j_index = 0
    n = len(delta_list)
    for j in range(0, n - 1, 1):
        for k in range(j + 1, n, 1):
            w = 1. / np.exp((mjd_list[k] - mjd_list[j]) / avg_time)
            w_sum = w_sum + w
            P = delta_list[j] * delta_list[k]
            j_index = j_index + w * np.sign(P) * np.sqrt(np.abs(P))
    return w_sum, j_index
def f2():
    n = len(delta_vec)
    jj, kk = np.triu_indices(n, 1)
    ww = 1. / np.exp((mjd_vec[kk] - mjd_vec[jj]) / avg_time)
    w_sum = np.sum(ww)
    PP = delta_vec[jj] * delta_vec[kk]
    j_index = np.sum(ww * np.sign(PP) * np.sqrt(np.abs(PP)))
    return w_sum, j_index
Result:
In [3]: f1()
Out[3]: (44839.864280781003, 18985.189058689775)
In [4]: f2()
Out[4]: (44839.864280781003, 18985.189058689775)
In [5]: timeit f1()
1 loops, best of 3: 629 ms per loop
In [6]: timeit f2()
100 loops, best of 3: 7.88 ms per loop
So the NumPy version produces the same result and is a factor of 80 faster!
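One caveat to be aware of (my addition, not part of the timing above): np.triu_indices materializes index arrays of length n*(n-1)/2, so memory grows quadratically with n. For very large n you can vectorize per row instead, keeping only O(n) temporary memory while still avoiding the inner Python loop. A minimal sketch (the function name f2_rows is mine):

```python
import numpy as np

def f2_rows(mjd_vec, delta_vec, avg_time):
    """Same result as f2, but O(n) temporary memory per row."""
    n = len(delta_vec)
    w_sum = j_index = 0.0
    for j in range(n - 1):
        # exp((a - b) / c) == 1 / exp((b - a) / c), vectorized over all k > j
        w = np.exp((mjd_vec[j] - mjd_vec[j + 1:]) / avg_time)
        P = delta_vec[j] * delta_vec[j + 1:]
        w_sum += np.sum(w)
        j_index += np.sum(w * np.sign(P) * np.sqrt(np.abs(P)))
    return w_sum, j_index
```

This trades a little speed (one Python iteration per row) for bounded memory; for n in the hundreds the fully vectorized f2 should still be the fastest choice.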