Coming from an R background, I have been exploring the parallel capabilities of Julia. My objective is to replicate the performance of R's `mclapply` (parallel apply).
**The problem:**
I apply a function to each row of a DataFrame, which looks like this:
```julia
for i in 1:nrow(Raw_Data)    # one iteration per row of my DataFrame
    lat1  = Raw_Data[i, "lat1"]
    lat2  = Raw_Data[i, "lat2"]
    lon1  = Raw_Data[i, "long1"]
    lon2  = Raw_Data[i, "long2"]
    iata1 = Raw_Data[i, "iata1"]
    iata2 = Raw_Data[i, "iata2"]
    # a is preallocated with one slot per row
    a[i] = [(iata1::String, iata2::String, trunc(i, 2),
             get_intermediary_points(lat1, lon1, lat2, lon2, j)) for j in 0:0.1:1]
end
```
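For reference, the loop body can also be factored into a standalone per-row function, which is the shape a later `map`/`pmap` call wants (a minimal sketch; `intermediary_points_row` is a hypothetical name, and the row index is passed explicitly because of the `trunc(i, 2)` term):

```julia
# Sketch: the loop body as a standalone function of (row index, row).
# intermediary_points_row is a hypothetical name.
function intermediary_points_row(i, row)
    [(row[:iata1]::String, row[:iata2]::String, trunc(i, 2),
      get_intermediary_points(row[:lat1], row[:long1], row[:lat2], row[:long2], j))
     for j in 0:0.1:1]
end

# Serial usage, equivalent to the loop above:
a = [intermediary_points_row(i, row) for (i, row) in enumerate(eachrow(Raw_Data))]
```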
Now, as a step toward parallelization, I can also write a function that does essentially the same work on each chunk of my DataFrame:
Raw_Data["selector"] = rand(1:nproc,_nrow) # Define how I split my dataframe. 1 chunck per proc
B = by(Raw_Data,:selector,intermediary_points)
Is there a way to speed up the calculation with a parallelized `by`? If not, please suggest a good alternative.
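For context, the hand-rolled pattern I have in mind looks roughly like this (a minimal sketch; the worker count and the file `intermediary_points.jl` are illustrative assumptions, and `intermediary_points` is assumed to take a sub-DataFrame):

```julia
# A minimal sketch of a hand-rolled parallel "by": split the DataFrame
# into one chunk per worker, then pmap the chunk function over them.
addprocs(3)                  # illustrative worker count
@everywhere using DataFrames
@everywhere include("intermediary_points.jl")  # hypothetical file defining intermediary_points

Raw_Data["selector"] = rand(1:nprocs(), nrow(Raw_Data))   # one chunk label per process
chunks = [sub for sub in groupby(Raw_Data, :selector)]    # materialize the sub-DataFrames
B = pmap(intermediary_points, chunks)                     # process the chunks in parallel
```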
Thanks!
Note: this is what my DataFrame `Raw_Data` looks like:
```
6x7 DataFrame:
       iata1  lat1      long1        iata2  lat2      long2
[1,] 1 "ELH"  0.444616  -1.3384      "FLL"  0.455079  -1.39891
[2,] 2 "BCN"  0.720765   0.0362729   "UFA"  0.955274   0.976218
[3,] 3 "ACE"  0.505053  -0.237426    "VCE"  0.794214   0.215582
[4,] 4 "PVG"  0.543669   2.12552     "LZH"  0.425277   1.91171
[5,] 5 "CDG"  0.855379   0.0444809   "VLC"  0.689233  -0.00835298
[6,] 6 "HLD"  0.858699   2.08915     "CGQ"  0.765906   2.18718
```
Could you write a one-argument version of `get_intermediary_points`, say `get_intermediary_points_pmap`, and then use `a = pmap(get_intermediary_points_pmap, eachrow(Raw_Data))`?
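Following that suggestion, the wrapper would look roughly like this (a sketch; it assumes `get_intermediary_points` is defined on all workers, and it drops the index-dependent `trunc(i, 2)` term, since `pmap` over `eachrow` does not carry the row index):

```julia
# Sketch of the suggested one-argument wrapper, so its signature
# matches what pmap expects when iterating over eachrow.
@everywhere function get_intermediary_points_pmap(row)
    [(row[:iata1]::String, row[:iata2]::String,
      get_intermediary_points(row[:lat1], row[:long1], row[:lat2], row[:long2], j))
     for j in 0:0.1:1]
end

a = pmap(get_intermediary_points_pmap, eachrow(Raw_Data))
```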