
I am trying to optimize this nested for loop, which takes the min of two numbers and then adds the result to a data frame. I was able to cut the run time down significantly elsewhere by vectorizing and preallocating, but I'm not sure how to apply that logic to a nested for loop. Is there a quick way to make this run faster? I'm currently sitting on over 5 hours of run time.

"simulation" has 100k values, and "limits" has 5427 values.

output <- data.frame(matrix(nrow = nrow(simulation), ncol = nrow(limits)))
res <- numeric(nrow(simulation))  # preallocate; min() returns a numeric value

for(i in 1:nrow(limits)){
    for(j in 1:nrow(simulation)){
        res[j] <- min(limits[i,1],simulation[j,1])
    }
    output[,i] <- res
}

Edit:

dput(head(simulation))
    structure(list(simulation = c(124786.7479,269057.2118,80432.47896,119513.0161,660840.5843,190983.7893)), .Names = "simulation", row.names = c(NA,6L), class = "data.frame")

dput(head(limits))
    structure(list(limits = c(5000L,10000L,15000L,20000L,25000L,30000L)), .Names = "limits", row.names = c(NA, 6L), class = "data.frame")
  • Take a look at the apply family; I think lapply would work in your situation. It can effectively replace a for loop and tends to run faster (or so I've found, and read of others finding). Also, can we get a dput(head(simulation)) and dput(head(limits)) so we can see the structure of the data? If you're fully vectorized, sapply may get the job done (I'm not great with it, though). Commented Sep 27, 2017 at 22:20
  • You're doing 542 million calculations. What on earth are you going to do with the resulting output matrix? Commented Sep 27, 2017 at 22:39
  • @thelatemail calculating limited variance/std. dev. for a complicated distribution; there's no good formula to just calculate theoretical values, so we are using a simulation Commented Sep 27, 2017 at 22:43

2 Answers


If you have >15GB of RAM (~100K * 5500 * 8 bytes per number * 3: the result plus the two full-size input matrices that outer expands) you can try:

outer(simulation[[1]], limits[[1]], pmin)

Although in reality you'll probably need more than 15GB, because I think pmin will duplicate things even further. If you don't have the RAM, you'll have to break the problem up (e.g. rely on code that does a column at a time or some such).
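The column-at-a-time idea can be sketched like this, using the sample data from the question's dput() output. With vapply, only one pmin column is alive at a time (about 800 KB for 100k rows), instead of the three full-size copies that outer() materializes at once:

```r
# Sample data from the question's dput() output
simulation <- data.frame(simulation = c(124786.7479, 269057.2118, 80432.47896,
                                        119513.0161, 660840.5843, 190983.7893))
limits <- data.frame(limits = c(5000L, 10000L, 15000L, 20000L, 25000L, 30000L))

# One column per limit: column i is pmin(simulation, limits[i]).
# Only a single column-sized temporary exists at any moment.
out <- vapply(limits[[1]],
              function(l) pmin(simulation[[1]], l),
              numeric(nrow(simulation)))
dim(out)  # 6 x 6 here; 100000 x 5427 for the real data
```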




When you have a double loop like this, it is often useful to rewrite it in Rcpp.

Moreover, I will use the bigstatsr package to save you some RAM: it lets you create and access matrices that are stored on your disk.

So, you can do:

simulation <- structure(list(simulation = c(124786.7479,269057.2118,80432.47896,119513.0161,660840.5843,190983.7893)), .Names = "simulation", row.names = c(NA,6L), class = "data.frame")
limits <- structure(list(limits = c(5000L,10000L,15000L, 20000L,25000L,30000L)), .Names = "limits", row.names = c(NA, 6L), class = "data.frame")

library(bigstatsr)
# Create the filebacked matrix on disk (in `/tmp/` by default)
mat <- FBM(nrow(simulation), nrow(limits))
# Fill this matrix in Rcpp
Rcpp::sourceCpp('fill-FBM.cpp')
fillMat(mat, limits[[1]], simulation[[1]])  
# Access the whole matrix in RAM to verify
# or you could access only block of columns
mat[]
mat[, 1:3]

where 'fill-FBM.cpp' is

// [[Rcpp::depends(bigstatsr, BH)]]
#include <bigstatsr/BMAcc.h>
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void fillMat(Environment BM,
             const NumericVector& limits,
             const NumericVector& simulation) {

  // Get an accessor to the file-backed matrix passed from R
  XPtr<FBM> xpBM = BM["address"];
  BMAcc<double> macc(xpBM);

  int n = macc.nrow();  // length(simulation)
  int m = macc.ncol();  // length(limits)

  // Fill column by column: entry (j, i) = min(limits[i], simulation[j])
  for (int i = 0; i < m; i++)
    for (int j = 0; j < n; j++)
      macc(j, i) = std::min(limits[i], simulation[j]);
}
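Entry (j, i) of the filled matrix is min(simulation[j], limits[i]), which is exactly what base R's outer(simulation, limits, pmin) from the other answer computes. On a small sample you can build that reference result in plain R (a sketch; no bigstatsr or compilation needed) and compare it against a block of the FBM:

```r
# First three values of each input from the question's data
simulation <- c(124786.7479, 269057.2118, 80432.47896)
limits <- c(5000, 10000, 15000)

# Reference result: ref[j, i] = min(simulation[j], limits[i]),
# the same quantity fillMat() writes into the FBM.
ref <- outer(simulation, limits, pmin)
ref[1, 1]  # 5000: the limit caps the simulated value
# mat[1:3, 1:3] from the FBM above should match `ref`
```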

