30

I am fond of the parallel package in R and how easy and intuitive it is to do parallel versions of apply, sapply, etc.

Is there a similar parallel function for replicate?

3 Answers

36

You can just use the parallel versions of lapply or sapply. Instead of telling replicate to evaluate an expression n times, you apply over 1:n, and instead of passing an expression, you wrap that expression in a function that ignores the argument passed to it.

Possibly something like:

#create cluster
library(parallel)
cl <- makeCluster(detectCores()-1)  
# get library support needed to run the code
clusterEvalQ(cl,library(MASS))
# put objects in place that might be needed for the code
myData <- data.frame(x=1:10, y=rnorm(10))
clusterExport(cl,c("myData"))
# Set a different seed on each member of the cluster (just in case)
clusterSetRNGStream(cl)
#... then parallel replicate...
parSapply(cl, 1:10000, function(i,...) { x <- rnorm(10); mean(x)/sd(x) } )
#stop the cluster
stopCluster(cl)

as the parallel equivalent of:

replicate(10000, {x <- rnorm(10); mean(x)/sd(x) } )
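If the results also need to be reproducible between runs, one option is to pass a fixed iseed to clusterSetRNGStream. The following is only a sketch: the helper names ratio and run_once, the seed 123, and the choice of 2 workers are illustrative assumptions, and the two runs are expected to match only when the number of workers and the call are the same.

```r
library(parallel)

# Hypothetical statistic to replicate; the argument i is ignored on purpose.
ratio <- function(i) { x <- rnorm(10); mean(x) / sd(x) }

# Hypothetical helper: start a fresh cluster, seed it, run, and clean up.
run_once <- function(seed, n = 1000, workers = 2) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl))
  # Give each worker its own L'Ecuyer-CMRG stream derived from `seed`
  clusterSetRNGStream(cl, iseed = seed)
  parSapply(cl, seq_len(n), ratio)
}

r1 <- run_once(123)
r2 <- run_once(123)
identical(r1, r2)  # compare the two seeded runs
```

Defining ratio at top level keeps the function sent to the workers small, since it does not drag along the helper's local environment.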

3 Comments

Thanks, this is what I ended up doing when I read that replicate was just a wrapper for sapply.
@D1X, did you try the code? When I ran it as originally posted, I got 10,000 unique values with no duplicates. But I tested on Windows, where cluster workers are always spawned as new processes, never forked. If the workers are forked rather than spawned and a seed has been set in the parent, you might get duplicates. I added a call above to explicitly set the random seed on each child process, just in case (and made some other minor changes to make the code runnable as is).
Yes, this is fixed by setting the seed (and yes, I was running this on Linux). Thank you.
10

The future.apply package provides a drop-in replacement for replicate() that runs in parallel and uses statistically sound parallel random number generation out of the box:

library(future.apply)
plan(multisession, workers = 4)

y <- future_replicate(100, mean(rexp(10)))
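As a hedged illustration of the parallel RNG handling mentioned above: with the default future.seed = TRUE, calling set.seed() before future_replicate() should give the same values on each run. The seed 42 and the use of 2 workers here are arbitrary assumptions.

```r
library(future.apply)

# Two background R sessions; the worker count is an arbitrary choice
plan(multisession, workers = 2)

set.seed(42)
y1 <- future_replicate(100, mean(rexp(10)))

set.seed(42)
y2 <- future_replicate(100, mean(rexp(10)))

identical(y1, y2)  # compare the two seeded runs

# Switch back to sequential processing, which shuts down the workers
plan(sequential)
```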


3

Using clusterEvalQ as a model, I think I would implement a parallel replicate as:

parReplicate <- function(cl, n, expr, simplify=TRUE, USE.NAMES=TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir=.GlobalEnv),
            substitute(expr), simplify=simplify, USE.NAMES=USE.NAMES)

The arguments simplify and USE.NAMES are compatible with sapply rather than replicate, but they make it a better wrapper around parSapply in my opinion.

Here's an example derived from the replicate man page:

library(parallel)
cl <- makePSOCKcluster(3)
hist(parReplicate(cl, 100, mean(rexp(10))))
# stop the cluster when done
stopCluster(cl)
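Because parReplicate evaluates the expression in each worker's global environment, any objects the expression refers to have to exist on the workers. A minimal sketch, assuming a hypothetical variable named rate that is exported with clusterExport before the call:

```r
library(parallel)

# Same definition as above, repeated so this example is self-contained
parReplicate <- function(cl, n, expr, simplify=TRUE, USE.NAMES=TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir=.GlobalEnv),
            substitute(expr), simplify=simplify, USE.NAMES=USE.NAMES)

cl <- makePSOCKcluster(2)

# `rate` is a hypothetical variable used inside the replicated expression
rate <- 2
clusterExport(cl, "rate")   # copy it into each worker's global environment
clusterSetRNGStream(cl)     # independent random number streams per worker

res <- parReplicate(cl, 100, mean(rexp(10, rate)))
length(res)

stopCluster(cl)
```

Without the clusterExport call, the workers would have no object named rate to evaluate the expression against.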

