Coming from an R background, I am exploring the parallel-computing options in Julia. My objective is to replicate the performance of mclapply (parallel apply).

**The problem:**

I apply a function to each row of a DataFrame, like this:

    for i in 1:_nrow  # _nrow = number of rows in my DataFrame
        lat1 = Raw_Data[i,"lat1"]
        lat2 = Raw_Data[i,"lat2"]
        lon1 = Raw_Data[i,"long1"]
        lon2 = Raw_Data[i,"long2"]
        iata1 = Raw_Data[i,"iata1"]
        iata2 = Raw_Data[i,"iata2"]

        a[i] = [(iata1::String, iata2::String, trunc(i,2), get_intermediary_points(lat1,lon1,lat2,lon2,j)) for j in 0:.1:1]
    end

As a step toward parallelization, I can also write a function that does much the same work and run it on each chunk of my DataFrame:

    Raw_Data["selector"] = rand(1:nproc,_nrow) # Defines how I split my DataFrame: one chunk per process
    B = by(Raw_Data,:selector,intermediary_points)

Is there a way to speed up the calculation with a parallelized "by"? If not, please suggest a good alternative.

Thanks!

Note: this is what my DataFrame Raw_Data looks like:

6x7 DataFrame:
          iata1     lat1     long1 iata2     lat2       long2
[1,]    1 "ELH" 0.444616   -1.3384 "FLL" 0.455079    -1.39891
[2,]    2 "BCN" 0.720765 0.0362729 "UFA" 0.955274    0.976218
[3,]    3 "ACE" 0.505053 -0.237426 "VCE" 0.794214    0.215582
[4,]    4 "PVG" 0.543669   2.12552 "LZH" 0.425277     1.91171
[5,]    5 "CDG" 0.855379 0.0444809 "VLC" 0.689233 -0.00835298
[6,]    6 "HLD" 0.858699   2.08915 "CGQ" 0.765906     2.18718
  • Have you considered writing a modified get_intermediary_points, say get_intermediary_points_pmap, and then using a = pmap(get_intermediary_points_pmap, eachrow(Raw_Data))? Commented Dec 11, 2014 at 22:52

1 Answer

I figured out what happened: I hadn't made all the inputs available to all the worker processes.

Basically, if you are running into the same problem:

  1. Every function must be defined with @everywhere in front of it.

  2. Every package must also be loaded on all processes, e.g. @everywhere using DataFrames.

  3. Every parameter the functions rely on must also be defined with @everywhere in front of it.
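
For anyone who wants to see the three points together, here is a minimal sketch. It assumes a recent Julia (1.x), where the parallel tools live in the Distributed standard library. The function `interpolate_points` and the parameter `STEP` are hypothetical stand-ins: the real get_intermediary_points is not shown in the question, so this one simply interpolates linearly between the two coordinates. The rows are plain named tuples built from the sample data, so the sketch runs without DataFrames installed.

```julia
using Distributed

# Start two local worker processes; the count is arbitrary for this sketch.
addprocs(2)

# Point 2: packages are loaded on every process.
# Uncomment if your function needs DataFrames (the package must be installed).
# @everywhere using DataFrames

# Point 3: parameters are defined on every process.
@everywhere const STEP = 0.1  # hypothetical parameter: spacing of the fractions

# Point 1: functions are defined on every process.
# Hypothetical stand-in for get_intermediary_points: linear interpolation,
# not a great-circle calculation.
@everywhere function interpolate_points(row)
    [(row.iata1, row.iata2,
      row.lat1 + f * (row.lat2 - row.lat1),
      row.long1 + f * (row.long2 - row.long1)) for f in 0:STEP:1]
end

# Two rows taken from the sample data in the question.
rows = [
    (iata1 = "ELH", lat1 = 0.444616, long1 = -1.3384,
     iata2 = "FLL", lat2 = 0.455079, long2 = -1.39891),
    (iata1 = "BCN", lat1 = 0.720765, long1 = 0.0362729,
     iata2 = "UFA", lat2 = 0.955274, long2 = 0.976218),
]

# pmap hands each row to an available worker and returns results in input order.
a = pmap(interpolate_points, rows)
```

If `interpolate_points` or `STEP` were defined without @everywhere, the workers would not know about them and the pmap call would fail, which is the problem described above.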

Now, that's a lot of work. You can follow http://julia.readthedocs.org/en/latest/manual/parallel-computing/ to use stand-alone packages, which simplifies the process a bit.

Cheers.

1 Comment

If you have a file with a bunch of functions that should be known to all processes, you can simply use @everywhere include("/path/to/myfun.jl"). This way you don't have to modify the declarations of the functions inside the file.
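
A small sketch of that approach, again assuming Julia 1.x with the Distributed standard library. The file is written to a temporary directory so the example is self-contained, and the function `midpoint` is hypothetical. In practice you would point at your own file instead.

```julia
using Distributed

# Start two local worker processes; the count is arbitrary for this sketch.
addprocs(2)

# Write a small file of helper functions to a temporary location.
# Both the file contents and the function name `midpoint` are hypothetical.
path = joinpath(mktempdir(), "myfun.jl")
write(path, "midpoint(a, b) = (a + b) / 2\n")

# Load the file on the master process and on every worker.
# `$path` interpolates the local variable into the expression sent to the workers.
@everywhere include($path)

# Call the function on a worker to confirm that it is defined there.
result = remotecall_fetch(midpoint, first(workers()), 0.444616, 0.455079)
```

This works here because the workers are local and can read the same path. Workers on other machines would need the file at that path on their own file system.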
