I want to parallelize the code below in R. It is a nested for loop that caps every element of the dataset at 0.1.

for (i in 1:nrow(my_dataset_preprocessed)){
    for (j in 1:ncol(my_dataset_preprocessed)){
      my_dataset_preprocessed[i,j] = min( my_dataset_preprocessed[i,j], 0.1 ) 
    }
}

I am trying the code below, using doParallel:

library(foreach)
library(doParallel)
registerDoParallel(detectCores())
clusterExport(cl, "my_dataset")

threshold_par <- function (X) { 
  co <- foreach(i=1:nrow(X)) %:%
                foreach (j=1:ncol(X)) %dopar% {   
                  co = min( X[i,j], 0.1 )
                }
  matrix(unlist(co), ncol=ncol(X))
}

system.time(threshold_par(my_dataset))

But I am getting the following error:

Error in { : task 1 failed - "invalid 'type' (list) of argument"

Is there a better way to parallelize this code (maybe using parLapply)? If not, how do I fix the code above?

  • I think lapply(my_dataset_preprocessed, function(x) pmin(x, 0.1)) would be simpler to do. Commented Sep 12, 2017 at 19:17
  • If your data is a matrix, this should work: my_dataset[my_dataset > 0.1] <- 0.1 Commented Sep 12, 2017 at 19:25
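
To make the two comments above concrete, here is a minimal vectorized sketch with no parallel backend at all. The helper name cap_values and its cap argument are made up for illustration, and the sketch assumes the data is either a numeric matrix or a data frame with only numeric columns.

```r
# Hypothetical helper: cap every value at `cap` without explicit loops.
cap_values <- function(x, cap = 0.1) {
  if (is.data.frame(x)) {
    # Assign into x[] so the result stays a data frame with the same names.
    x[] <- lapply(x, function(col) pmin(col, cap))
    return(x)
  }
  # For a numeric matrix, pmin() works element-wise and keeps the dimensions.
  pmin(x, cap)
}

m <- matrix(c(0.05, 0.2, 0.3, 0.01), ncol = 2)
df <- data.frame(a = c(0.05, 0.2), b = c(0.3, 0.01))

capped_m <- cap_values(m)
capped_df <- cap_values(df)
```

Because pmin is already vectorized, this is likely to be much faster than one parallel task per cell.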

1 Answer

You didn't declare cl. The following works once you remove the line clusterExport(cl, "my_dataset"):

library(foreach)
library(doParallel)    
registerDoParallel(detectCores())
getDoParWorkers()
# [1] 8

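threshold_par <- function (X) { 
  # Nested foreach: the outer loop runs over rows, the inner loop over columns
  co <- foreach(i=1:nrow(X)) %:%
                foreach (j=1:ncol(X)) %dopar% {   
                  min( X[i,j], 0.1 )
                }
  # unlist(co) is in row-by-row order, so fill the matrix by row
  matrix(unlist(co), ncol=ncol(X), byrow=TRUE)
}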

test <- matrix(1:4, ncol=2)
system.time(threshold_par(test))
#      user  system elapsed 
#      0.01    0.00    0.02
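
Since the question also asks about parLapply, here is a rough sketch using the base parallel package, with one task per column instead of one per cell. It assumes X is a numeric matrix. The names cap_parallel, cap and n_workers are invented for this example, and the worker count of 2 is arbitrary.

```r
library(parallel)

# Hypothetical helper: cap each column of a numeric matrix on a worker.
cap_parallel <- function(X, cap = 0.1, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))  # always shut the workers down again

  # The matrix is passed as `m` so it does not clash with parLapply's own X argument.
  cols <- parLapply(
    cl, seq_len(ncol(X)),
    function(j, m, cap) pmin(m[, j], cap),
    m = X, cap = cap
  )

  # Each list element is one column, so the default column-wise fill is correct.
  matrix(unlist(cols), ncol = ncol(X))
}

m <- matrix(c(0.05, 0.2, 0.3, 0.01), ncol = 2)
capped_par <- cap_parallel(m)
```

For an operation this cheap, the cost of starting workers and shipping the data will probably outweigh any gain, so a plain pmin(X, 0.1) is worth timing first.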

6 Comments

I did declare cl before, so that's not the issue. Even if I remove that line, it throws the same error.
Ok. btw, I started a new R session, and it worked again without error...so not sure why there's a difference. Could you try your same code with %do% instead of %dopar%?
%do% gives me the same error. I tried your matrix and it ran properly. So, the issue is with the structure then. Below is the traceback of the error:
    5 stop(simpleError(msg, call = expr))
    4 e$fun(obj, substitute(ex), parent.frame(), e$data)
    3 foreach(i = 1:nrow(X)) %:% foreach(j = 1:ncol(X)) %dopar% { co = min(X[i, j], 0.1) }
    2 threshold_par(test)
    1 system.time(threshold_par(test))
This is why you should try to provide data that reproduces your error. But sounds like you've pinpointed the trouble.
I have solved it. I unlisted the matrix and built the matrix again to reset the column indexes. Now it's working fine.
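
The thread never shows the actual data, so the following is only a guess at what happened. One situation that produces this kind of error and is cured by the unlist-and-rebuild step described in the last comment is a matrix stored in list mode. The values below are invented for illustration.

```r
# Hypothetical data: a 2 x 2 matrix whose cells are list elements, not numbers.
m_list <- matrix(list(0.05, 0.2, 0.3, 0.01), ncol = 2)

# Indexing a list-mode matrix returns a list, which min() rejects.
err_msg <- tryCatch(
  min(m_list[1, 1], 0.1),
  error = function(e) conditionMessage(e)
)

# Flatten to a plain numeric vector and rebuild with the same dimensions.
m_num <- matrix(unlist(m_list), nrow = nrow(m_list), ncol = ncol(m_list))

# On the numeric matrix the vectorized cap works directly.
capped <- pmin(m_num, 0.1)
```

Checking is.list(my_dataset) or str(my_dataset) before the loop would show whether the data is in this state.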