I am working on code for a model based on some Fourier transforms. Currently I am trying to optimize a part of it so that it is usable with large amounts of data. While doing that, I found some strange behavior: a loop version of my code turns out to be faster than the same code written with numpy. The testing code is as follows:
# -*- coding: utf-8 -*-
import numpy as np
import timeit
def fourier_coef_loop(ts, times, k_max):
    coefs = np.zeros(k_max, dtype=float)
    t = 2.0 * np.pi * (times - times[0]) / times[-1]
    x = np.dot(np.arange(1, k_max + 1)[np.newaxis].T, t[np.newaxis])
    for k in xrange(1, k_max + 1):
        cos_k = np.cos(x[k - 1])
        coefs[k - 1] = (ts[-1] - ts[0]) + (ts[:-1] * (cos_k[:-1] - cos_k[1:])).sum()
    return coefs

def fourier_coef_np(ts, times, k_max):
    coefs = np.zeros(k_max, dtype=float)
    t = 2.0 * np.pi * (times - times[0]) / times[-1]
    x = np.dot(np.arange(1, k_max + 1)[np.newaxis].T, t[np.newaxis])
    coefs = np.add(np.einsum('ij,j->i', np.diff(np.cos(x)), -ts[:-1]), (ts[-1] - ts[0]))
    return coefs

if __name__ == '__main__':
    iterations = 10
    size = 20000
    setup = "from __main__ import fourier_coef_loop, fourier_coef_np, size\n" \
            "import numpy as np"
    # arg = np.random.normal(size=size)
    # print(np.all(fourier_coef_np(arg, np.arange(size,dtype=float), size / 2) == fourier_coef_loop(arg, np.arange(size,dtype=float), size / 2)))
    time_loop = timeit.timeit("fourier_coef_loop(np.random.normal(size=size), np.arange(size,dtype=float), size / 2)",
                              setup=setup, number=iterations)
    print("With loop: {} s".format(time_loop))
    time_np = timeit.timeit("fourier_coef_np(np.random.normal(size=size), np.arange(size,dtype=float), size / 2)",
                            setup=setup, number=iterations)
    print("With numpy: {} s".format(time_np))
It gives the following results:
With loop: 60.8385488987 s
With numpy: 64.9192998409 s
Can someone please tell me why the loop version is faster than the purely numpy version? I have totally run out of ideas. I would also appreciate any suggestions on how to make this particular function faster.
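For completeness, here is a rough sketch of how the commented-out correctness check could be done with a tolerant float comparison instead of exact equality, reusing the functions and size defined above (the two versions sum in a different order, so == will generally not match exactly):

arg = np.random.normal(size=size)
ref = fourier_coef_loop(arg, np.arange(size, dtype=float), size // 2)
vec = fourier_coef_np(arg, np.arange(size, dtype=float), size // 2)
# np.allclose tolerates the rounding differences caused by the different summation order
print(np.allclose(ref, vec))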
einsum('ij,j->i', ...) sounds a lot like a matrix-vector product (-> np.dot again), and whatever x = np.dot(np.arange(1,k_max+1)[np.newaxis].T, t[np.newaxis]) is supposed to do, I'm sure there's a cleaner way to do it.

With a larger size I get a memory error. For size=2000, timings are also similar. For 200 the np version has a substantial edge. So my guess is that for larger sizes memory management issues are chewing into the numpy times.
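For reference, a rough sketch of the equivalences the comments hint at, reusing the variable names from the question (this only illustrates the identities, it is not a benchmarked replacement):

import numpy as np

size = 2000                              # assumed test size
ts = np.random.normal(size=size)
times = np.arange(size, dtype=float)
k_max = size // 2

t = 2.0 * np.pi * (times - times[0]) / times[-1]
k = np.arange(1, k_max + 1, dtype=float)

# the column-vector/row-vector np.dot is just an outer product, i.e. broadcasting
x = k[:, np.newaxis] * t[np.newaxis, :]  # shape (k_max, size)

# einsum('ij,j->i', A, v) is a plain matrix-vector product, i.e. A.dot(v)
coefs = np.diff(np.cos(x)).dot(-ts[:-1]) + (ts[-1] - ts[0])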