Simplest way to do parallel replicate

Question

I am fond of the parallel package in R and how easy and intuitive it is to do parallel versions of apply, sapply, etc.

Is there a similar parallel function for replicate?

Greg Snow · Accepted Answer · 2019-02-13 16:34:35Z

36

You can use the parallel versions of lapply or sapply. Instead of telling replicate to evaluate an expression n times, you apply over 1:n, and instead of passing an expression, you wrap that expression in a function that ignores the argument passed to it.

Possibly something like:

# create the cluster, leaving one core free
library(parallel)
cl <- makeCluster(detectCores() - 1)
# load any packages the code needs on each worker
clusterEvalQ(cl, library(MASS))
# export objects that the code might need
myData <- data.frame(x = 1:10, y = rnorm(10))
clusterExport(cl, c("myData"))
# set a different RNG stream on each member of the cluster (just in case)
clusterSetRNGStream(cl)
# ... then the parallel replicate ...
parSapply(cl, 1:10000, function(i, ...) { x <- rnorm(10); mean(x) / sd(x) })
# stop the cluster
stopCluster(cl)

This is the parallel equivalent of:

replicate(10000, {x <- rnorm(10); mean(x)/sd(x) } )
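If reproducible results matter, clusterSetRNGStream also takes an iseed argument. The following is a minimal sketch, not part of the original answer: the helper names stat_fun and run_once, the seed 123, and the two-worker cluster are all assumptions chosen for illustration. Results should repeat as long as the number of workers stays the same.

```r
library(parallel)

# Hypothetical statistic to replicate; defined at top level so that only
# the function itself is shipped to the workers.
stat_fun <- function(i) { x <- rnorm(10); mean(x) / sd(x) }

# Hypothetical helper: reset the worker RNG streams, then run the replicates.
run_once <- function(cl, seed, n = 100) {
  clusterSetRNGStream(cl, iseed = seed)
  parSapply(cl, seq_len(n), stat_fun)
}

cl <- makeCluster(2)      # assumed worker count, kept small for the sketch
a <- run_once(cl, 123)
b <- run_once(cl, 123)
stopCluster(cl)

identical(a, b)           # same seed and same number of workers
```

Because parSapply splits the indices among workers in a fixed way, changing the number of workers changes which stream produces which replicate, so the values will differ across cluster sizes.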

edited Feb 13, 2019 at 16:34

answered Oct 9, 2013 at 20:09

Greg Snow


3 Comments

bdeonovic Over a year ago

Thanks, this is what I ended up doing when I read that replicate was just a wrapper for sapply.

Greg Snow Over a year ago

@D1X, did you try the code? When I ran it as it was, I saw 10,000 unique values, no duplicates. But I tested on Windows, which only spawns and does not fork. If you use forking instead of spawning to create the cluster and a seed is set in the parent, then you might get duplicates. I added a call above to explicitly set the random seeds on each child process, just in case (and made some other minor changes to make the code runnable as is).

D1X Over a year ago

Yes, this is fixed by setting the seed (and yes, I was running this on Linux). Thank you.

HenrikB · Accepted Answer · 2021-04-04 18:36:02Z

10

The future.apply package provides a drop-in replacement for replicate() that runs in parallel and uses statistically sound parallel random number generation out of the box:

library(future.apply)
plan(multisession, workers = 4)

y <- future_replicate(100, mean(rexp(10)))
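As a hedged sketch of how the random number handling can be pinned down, future_replicate() accepts a future.seed argument. The seed value 42L and the two-worker plan below are assumptions for illustration only, and the example needs the future.apply package installed.

```r
library(future.apply)

plan(multisession, workers = 2)   # assumed worker count for the sketch

# Passing an integer as future.seed fixes the parallel RNG streams,
# so two runs with the same seed should give the same values.
y1 <- future_replicate(100, mean(rexp(10)), future.seed = 42L)
y2 <- future_replicate(100, mean(rexp(10)), future.seed = 42L)

identical(y1, y2)

plan(sequential)                  # release the background workers
```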

answered Apr 4, 2021 at 18:36

HenrikB


Steve Weston · Accepted Answer · 2013-10-10 02:35:39Z

3

Using clusterEvalQ as a model, I think I would implement a parallel replicate as:

parReplicate <- function(cl, n, expr, simplify=TRUE, USE.NAMES=TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir=.GlobalEnv),
            substitute(expr), simplify=simplify, USE.NAMES=USE.NAMES)

The arguments simplify and USE.NAMES are compatible with sapply rather than replicate, but they make it a better wrapper around parSapply in my opinion.

Here's an example derived from the replicate man page:

library(parallel)
cl <- makePSOCKcluster(3)
hist(parReplicate(cl, 100, mean(rexp(10))))
# shut the workers down when finished
stopCluster(cl)
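Because this parReplicate evaluates the expression in each worker's global environment, any variable the expression refers to has to exist on the workers. A minimal sketch of that, where rate_par, the seed, and the two-worker cluster are hypothetical choices made for illustration:

```r
library(parallel)

# Same definition as in the answer, repeated so the sketch is self-contained.
parReplicate <- function(cl, n, expr, simplify = TRUE, USE.NAMES = TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir = .GlobalEnv),
            substitute(expr), simplify = simplify, USE.NAMES = USE.NAMES)

cl <- makePSOCKcluster(2)            # assumed worker count
clusterSetRNGStream(cl, iseed = 1)   # independent RNG streams per worker

rate_par <- 2                        # hypothetical parameter used by the expression
clusterExport(cl, "rate_par")        # copy it into each worker's global environment

res <- parReplicate(cl, 50, mean(rexp(10, rate = rate_par)))
stopCluster(cl)

length(res)
```

Without the clusterExport call, the workers would not find rate_par and the evaluation would fail on the worker side.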

edited Oct 10, 2013 at 2:35

answered Oct 10, 2013 at 2:09

Steve Weston
