
I have a function sum_var that takes an integer as input and returns a real number. I checked this function on a few inputs and it works as expected.

I would like to use clusterApply to make use of my CPU (6 cores, 12 logical processors). I've tried to adapt the code given in class:

library("parallel")
cl <- makeCluster(6)
res_par <- clusterApply(cl, 1:10000, fun = sum_var)

But it returns an error: Error in checkForRemoteErrors(val) : 10000 nodes produced errors; first error: object 'df_simulate' not found.

Could you please elaborate on how to achieve my goal? Below is the full code.

### Generate dataframe
n_simu <- 1000
set.seed(1)
df_simulate <- data.frame(x_1 = rnorm(n_simu))
for (k in 2:10000) {
  set.seed(k)
  df_simulate[, paste0("x_", k)] <- rnorm(n_simu)
}
df_simulate[, "y"] <- runif(n_simu, 0, 0.5)
df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 > 0.8, "y"] <-
  df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 > 0.8, "y"] + 5.75
df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 <= 0.8 & df_simulate$x_30 > 0.5, "y"] <-
  df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 <= 0.8 & df_simulate$x_30 > 0.5, "y"] + 18.95
df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 <= 0.8 & df_simulate$x_30 <= 0.5, "y"] <-
  df_simulate[df_simulate$x_40 > 0 & df_simulate$x_99 <= 0.8 & df_simulate$x_30 <= 0.5, "y"] + 20.55
df_simulate[df_simulate$x_40 <= 0 & df_simulate$x_150 < 0.5, "y"] <-
  df_simulate[df_simulate$x_40 <= 0 & df_simulate$x_150 < 0.5, "y"] - 5
df_simulate[df_simulate$x_40 <= 0 & df_simulate$x_150 >= 0.5, "y"] <-
  df_simulate[df_simulate$x_40 <= 0 & df_simulate$x_150 >= 0.5, "y"] - 10

### Function to calculate the sum of variances
n_min <- 5
index <- n_min:(1000 - n_min)

sum_var <- function(m){
  # Sort the m-th simulated variable
  df1 <- df_simulate[, m]
  df2 <- as.data.frame(sort(df1))
  # For every candidate split point, store the sum of the standard
  # deviations of the two resulting groups in the second column
  for (i in index){
    df3 <- df2[1:i, 1]
    df4 <- df2[(i+1):1000, 1]
    df2[i, 2] <- sd(df3) + sd(df4)
  }
  # Return the value at which that sum is smallest
  position <- which.min(df2[, 2])
  return(df2[position, 1])
}
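
For example, a quick serial check on a handful of columns (any small subset works, the one below is arbitrary) runs on a single core:

# Apply sum_var to the first five simulated variables, one after another
res_serial <- sapply(1:5, sum_var)
res_serial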

### Parallel Computing    
library("parallel")
cl <- makeCluster(6)
res_par <- clusterApply(cl, 1:10000, fun = sum_var)

1 Answer

When you use makeCluster on Windows, a new R process is started for every cluster node. These worker processes only have the base packages loaded and do not contain the variables you defined in your global environment. Therefore, you need to export all the variables your function uses to the workers. For this, you can use clusterExport:

library("parallel")
cl <- makeCluster(6)
clusterExport(cl, "df_simulate")
res_par <- clusterApply(cl, 1:10000, fun = sum_var)
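
Note that sum_var also reads the global object index, so depending on your session you may need to export that as well; a fuller sketch (adding stopCluster, which is simply good practice once the results are collected) could look like this:

library("parallel")
cl <- makeCluster(6)
# Export every global object used inside sum_var, not only df_simulate
clusterExport(cl, c("df_simulate", "index"))
res_par <- clusterApply(cl, 1:10000, fun = sum_var)
stopCluster(cl)  # shut down the worker processes when finished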

Here is a small overview and introduction to different parallelisation techniques in R.
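
An alternative with the same cluster object is parLapply, which splits the input into one chunk per worker instead of dispatching the elements one at a time, and therefore usually has less communication overhead (this is a variant, not what is shown above):

# Same exports as before; only the apply function changes
res_par <- parLapply(cl, 1:10000, sum_var)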


1 Comment

Thank you for your fix. It works fine. My only concern is that my CPU seems to be running on only 1 logical processor, so I'm not sure whether the parallel computing is actually being performed. imgur.com/a/DwNuxNV
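
A quick way to verify that the work is really being distributed (a diagnostic sketch, not part of the answer above) is to ask each worker for its process ID; six distinct PIDs mean six separate R processes are doing the work:

library("parallel")
cl <- makeCluster(6)
# Each element of the returned list comes from a different worker process
clusterEvalQ(cl, Sys.getpid())
stopCluster(cl)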
