
I wrote a function in which I define variables and load objects. Here's a simplified version:

fn1 <- function(x) {
  load("data.RData") # a vector named "data"
  source("myFunctions.R")
  library(raster)
  library(rgdal)
  library(snow) # provides makeSOCKcluster(), used in fn2

  a <- 1
  b <- 2
  r1 <- raster(ncol = 10, nrow = 10)
  r1 <- init(r1, fun = runif)
  r2 <- r1 * 100
  names(r1) <- "raster1"
  names(r2) <- "raster2"
  m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]

  c <- fn2(m)
}

Function "fn2" can be found in "myFunctions.R" and is defined as:

fn2 <- function(x) {
  fn3 <- function(y) {
   x[[y]] * 100 * data
  }

  cl <- makeSOCKcluster(8)   
  clusterExport(cl, list("x"), envir = environment()) 
  clusterExport(cl, list("a", "b", "data")) 
  clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b))) 
  f <- parLapply(cl, names(x), fn3)  
  stopCluster(cl)
  f # return the results; otherwise fn2 returns whatever stopCluster() returns
}

Now, when I run fn1, I get an error like this:

Error in get(name, envir = envir) : object 'a' not found

From what I understand from ?clusterExport, the default value for envir is .GlobalEnv, so I would assume that "a" and "b" would be accessible to fn2. However, it doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?

So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?

Thanks a lot for your help.

Answer


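You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport looks up the variables in .GlobalEnv by default, but fn1 creates them in its own local environment, not in .GlobalEnv. The same applies to "data": load() puts the objects it reads into the calling function's environment by default, so "data" also lives in fn1's local environment.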
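A minimal reproduction of that failure, as a sketch: it uses the base parallel package's makeCluster() in place of snow's makeSOCKcluster(), and export_local() is a hypothetical stand-in for the fn1/fn2 pair. Because clusterExport() is called without envir, it searches .GlobalEnv and cannot see the local "a":

```r
library(parallel)

# Hypothetical function: 'a' exists only in this function's local
# environment, never in .GlobalEnv.
export_local <- function(cl) {
  a <- 1
  tryCatch({
    clusterExport(cl, "a")  # envir defaults to .GlobalEnv, so 'a' is not found
    "exported"
  }, error = function(e) conditionMessage(e))
}

cl <- makeCluster(1)
msg <- export_local(cl)
stopCluster(cl)
print(msg)
```

Run in a fresh session (with no "a" in the global workspace), msg should be "object 'a' not found", the same message as in the question.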

An alternative is to pass the local environment of fn1 to fn2, and hand that environment to clusterExport through its envir argument. The call to fn2 would be:

c <- fn2(m, environment())

If fn2 is redefined as function(x, env), the call to clusterExport becomes:

clusterExport(cl, list("a", "b", "data"), envir = env)

Since environments are passed by reference, there should be no performance problem doing this.
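Putting the two changes together gives something like the sketch below. To keep it self-contained it relies on assumptions that are not in the question: a named list of numeric vectors stands in for the RasterStack, a literal vector stands in for load("data.RData"), parallel::makeCluster() replaces makeSOCKcluster(), and the made-up options fn.a/fn.b stand in for rasterOptions(maxmemory = a, chunksize = b).

```r
library(parallel)

# fn2 now takes the caller's environment as a second argument (env).
fn2 <- function(x, env) {
  fn3 <- function(y) {
    x[[y]] * 100 * data  # 'data' is found in the worker's global env once exported
  }
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))  # shut the workers down even if something fails
  clusterExport(cl, "x", envir = environment())        # as in the question's fn2
  clusterExport(cl, c("a", "b", "data"), envir = env)  # look them up in fn1's frame
  # Stand-in for rasterOptions(maxmemory = a, chunksize = b): uses a and b on the workers
  clusterEvalQ(cl, options(fn.a = a, fn.b = b))
  parLapply(cl, names(x), fn3)  # returned as fn2's value
}

fn1 <- function() {
  data <- c(1, 2, 3)  # stand-in for load("data.RData")
  a <- 1
  b <- 2
  m <- list(raster1 = c(0.1, 0.2, 0.3), raster2 = c(10, 20, 30))  # stand-in for the stack
  fn2(m, environment())  # pass fn1's own (local) environment
}

result <- fn1()
```

Here result should be a two-element list holding c(10, 40, 90) and c(1000, 4000, 9000).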


Comment

Thanks for making this clearer, @Steve Weston. I agree with you, it's a bit messy. I gave it more thought and I can load the data within the function where it is used. As for the cluster calls, I will create the cluster object (cl) and call clusterEvalQ from my main function, fn1, and then pass cl as an argument to the functions that need to call clusterExport() before running parallel functions like parLapply.
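The restructuring described in this comment could look roughly like the sketch below, under the same kind of stand-in assumptions (plain numeric vectors instead of rasters, parallel::makeCluster() instead of makeSOCKcluster(), made-up fn.a/fn.b options instead of rasterOptions()). scale_one() and scale_layers() are hypothetical names. The cluster is created and configured once in fn1, and the helper exports only the objects its own parLapply() call needs, taken from its own environment:

```r
library(parallel)

# Hypothetical worker function defined at top level: its enclosing environment
# is .GlobalEnv, so on each worker it finds 'x' and 'data' only if they were exported.
scale_one <- function(y) x[[y]] * 100 * data

# Hypothetical helper: receives an existing cluster and exports what it needs.
scale_layers <- function(cl, x, data) {
  clusterExport(cl, c("x", "data"), envir = environment())
  parLapply(cl, names(x), scale_one)
}

fn1 <- function() {
  a <- 1
  b <- 2
  data <- c(1, 2, 3)  # stand-in for loading the data
  m <- list(raster1 = c(0.1, 0.2, 0.3), raster2 = c(10, 20, 30))  # stand-in for the stack
  cl <- makeCluster(2)  # created once, in the main function
  on.exit(stopCluster(cl))
  clusterExport(cl, c("a", "b"), envir = environment())
  clusterEvalQ(cl, options(fn.a = a, fn.b = b))  # stand-in for rasterOptions()
  scale_layers(cl, m, data)
}

result <- fn1()
```

Defining the worker function at top level means it carries no local variables with it, so the explicit clusterExport() is what makes x and data available on the workers. A closure defined inside scale_layers() would instead be serialized together with that function's environment.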
