I am new to julia and to get started I wanted to port some numpy code to julia and hoped to get some nice performance increase. So far not to my satisfaction.
This is the function I want to compute
function s(x_list, r_list)
result_list = zeros(size(x_list,1))
for i = 1:size(x_list,1)
dotprods = r_list * x_list[i,:]'
expcall = exp(im * dotprods)
sumprod = sum(expcall) * sum(conj(expcall))
result_list[i] = sumprod
end
return result_list
end
with data input that looks like
v = rand(3)
r = rand(6000,3)
x = linspace(1.0, 2.0, 300) * (v./sqrt(sumabs2(v)))'
for this function and the given input, @time s(x,r) gives me
0.110619 seconds (3.60 k allocations: 96.256 MB, 8.47% gc time)
For this case, numpy does the same job in ~70ms, so I'm not very happy! Now if I do a @parallel for loop with julia -p 2:
function s(x_list, r_list)
result_list = SharedArray(Float64, size(x_list,1))
@parallel for i = 1:size(x_list,1)
dotprods = r_list * x_list[i,:]'
expcall = exp(im * dotprods)
sumprod = sum(expcall) * sum(conj(expcall))
result_list[i] = sumprod
end
return result_list
end
the problem is that
result_list[i] = sumprod
doesn't get updated and I get the list of zeros returned from the array initialization. What am I doing wrong here? Further attempts to increase speed also did not show any benefit, e.g.
@vectorize_2arg Array{Float64,2} s
and declaring types
function s{T<:Float64}(x_list::Array{T,2}, r_list::Array{T,2})
But now, starting the same @parallel for loop in a session with just one thread (no -p2, just julia) the array does get updated and @time s(x,r) tells me
0.000040 seconds (36 allocations: 4.047 KB)
which is actually impossible for the function and input given! Is this a bug?
Any help is very appreciated!