
I am new to Julia, and to get started I wanted to port some NumPy code to Julia, hoping for a nice performance increase. So far the results have not been to my satisfaction.

This is the function I want to compute:

function s(x_list, r_list)
    result_list = zeros(size(x_list, 1))
    for i = 1:size(x_list, 1)
        dotprods = r_list * x_list[i,:]'              # one dot product per row of r_list
        expcall  = exp(im * dotprods)                 # elementwise complex exponential
        sumprod  = sum(expcall) * sum(conj(expcall))  # == |sum(expcall)|^2, imaginary part exactly zero
        result_list[i] = sumprod
    end
    return result_list
end

with data input that looks like

v = rand(3)
r = rand(6000, 3)                                     # 6000 random 3-vectors
x = linspace(1.0, 2.0, 300) * (v./sqrt(sumabs2(v)))'  # 300 scaled copies of the unit vector v/|v|

For this function and the given input, @time s(x, r) gives me

0.110619 seconds (3.60 k allocations: 96.256 MB, 8.47% gc time)

For this case, NumPy does the same job in ~70 ms, so I'm not very happy! Now if I do a @parallel for loop with julia -p 2:

function s(x_list, r_list)
    result_list = SharedArray(Float64, size(x_list,1))
    @parallel for i = 1:size(x_list,1)
        dotprods = r_list * x_list[i,:]'
        expcall  = exp(im * dotprods)
        sumprod  = sum(expcall) * sum(conj(expcall))
        result_list[i] = sumprod
    end
    return result_list
end

the problem is that

result_list[i] = sumprod

doesn't get updated, and I get back the array of zeros from the initialization. What am I doing wrong here? Further attempts to increase speed also did not show any benefit, e.g.

@vectorize_2arg Array{Float64,2} s

and declaring types

function s{T<:Float64}(x_list::Array{T,2}, r_list::Array{T,2}) 

But now, starting the same @parallel for loop in a session with just one process (no -p 2, just julia), the array does get updated, and @time s(x,r) tells me

0.000040 seconds (36 allocations: 4.047 KB)

which is actually impossible for the function and input given! Is this a bug?

Any help is much appreciated!

2 Answers


Julia's @parallel macro does a distributed for loop: it copies all the data to the other processes, does the computations on each of them, and reduces over the results. The processes do not share memory – they may even be on other machines altogether. Your original data is never touched, because each worker modifies its own copy of it. You may be thinking of threads, which are an experimental feature that Julia is still in the process of adding.
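For illustration, a minimal sketch of both behaviors (my own example, in the same pre-1.0 Julia syntax as the question, in a session started with julia -p 2):

a = zeros(4)
@parallel for i = 1:4
    a[i] = i          # each worker writes to its own copy of a
end
# a on the master process is still all zeros

# With a reduction operator, @parallel does combine the per-worker results:
total = @parallel (+) for i = 1:4
    i
end                   # total == 10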


1 Comment

Oh, sorry, I forgot to mention that I also used a SharedArray for the @parallel for loop, which should take care of the copying. I will modify my question to make this obvious!

One problem is that you're not waiting for the @parallel call to complete. From the docs:

...the reduction operator can be omitted if it is not needed. In that case, the loop executes asynchronously, i.e. it spawns independent tasks on all available workers and returns an array of Future immediately without waiting for completion. The caller can wait for the Future completions at a later point by calling fetch() on them, or wait for completion at the end of the loop by prefixing it with @sync, like @sync @parallel for.

Try prefixing the for loop with @sync.
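Applied to the function in the question, that is a one-line change (a sketch, same pre-1.0 syntax and SharedArray as above):

function s(x_list, r_list)
    result_list = SharedArray(Float64, size(x_list, 1))
    @sync @parallel for i = 1:size(x_list, 1)   # @sync waits until all workers are done
        dotprods = r_list * x_list[i,:]'
        expcall  = exp(im * dotprods)
        result_list[i] = sum(expcall) * sum(conj(expcall))
    end
    return result_list
end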

1 Comment

Thanks, that @sync actually did it! As for the other part, are there other things one can do to speed up Julia to achieve similar or better performance than Python/NumPy in single-process mode?
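One thing that often helped in that era (a sketch of my own, not from the answers above; identifiers are hypothetical): devectorize the loop and skip the complex temporaries entirely, using the identity sum(exp(im*d)) * sum(conj(exp(im*d))) == (Σ cos d)^2 + (Σ sin d)^2.

# Devectorized variant: no temporary arrays are allocated inside the loop.
function s_devec(x_list, r_list)
    result_list = zeros(size(x_list, 1))
    for i = 1:size(x_list, 1)
        re = 0.0
        imag_ = 0.0
        @inbounds for j = 1:size(r_list, 1)
            d = 0.0
            for k = 1:size(r_list, 2)
                d += r_list[j, k] * x_list[i, k]   # dot product without allocating
            end
            re += cos(d)
            imag_ += sin(d)
        end
        result_list[i] = re^2 + imag_^2            # == |sum(exp(im*d))|^2
    end
    return result_list
end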
