
In R I have the following example function, nFun, which runs a for loop n times:

nFun <- function(n){
    #inputs: n - number of results required

    #reserve n spaces for results
    r_num_successes <- 1:n

    #start looping n times
    for(i in 1:n){

        #set the first uniform "random" deviate to 0.05 and the number of successes to 0
        current_unif <- 0.05
        num_successes <- 0

        #while loop that updates current_unif - it runs as long as
        #current_unif is less than 0.95, incrementing num_successes each pass
        while(current_unif < 0.95){

            #draw a uniform random deviate between the existing
            #current_unif and 1 (runif's max defaults to 1)
            current_unif <- runif(1, current_unif)
            num_successes <- num_successes + 1
        }

        #set the i-th element of the results vector to the final
        #num_successes generated by the while loop
        r_num_successes[i] <- num_successes
    }

    #output the mean of all the successes
    return(mean(r_num_successes))
}
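For example, a quick way to see the slowdown (a sketch; timings vary by machine):

> nFun(100)                  # returns quickly
> system.time(nFun(1e5))     # noticeably slow - see the timings in the answers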

When n gets big, this starts to grind pretty slowly. Is there a good way to optimise it?

  • Please describe in words what this function is supposed to do. It seems like you are simply taking n draws from a uniform sample with 5% success rate. Commented Apr 27, 2012 at 16:45
  • @Andrie Clearly he's trying to come up with a method of measuring the distribution of runif that's worthy of submission to thedailywtf :-) Commented Apr 27, 2012 at 16:49
  • @CarlWitthoft Ahah. And there I was, thinking this is a new entry for the most obfuscated random sampler competition... Commented Apr 27, 2012 at 16:50
  • @dplanet I see you have added comments to your code. That simply adds comments to pretty pointless code. Please describe in words what you are trying to do. Commented Apr 27, 2012 at 16:54
  • @CarlWitthoft - I lolled so +1. However, this is just a simpler example of a function I'm making which has a lot more lines, hence the need to optimise. I've commented the script - I find it hard to explain because it doesn't have any use except demonstrating my point. Commented Apr 27, 2012 at 16:55

2 Answers


There's nothing you can do to significantly improve the speed of this using pure R. Byte-compiling will give you a small improvement, but you will need to move to compiled code for any significant speed gains.

UPDATE: Here's an Rcpp solution, just for Dirk :)

> nCode <- '
+   int N = as<int>(n);
+   std::vector<double> rns;
+ 
+   RNGScope scope;  // Initialize Random number generator
+ 
+   for(int i=0; i<N; i++) {
+     double current_unif = 0.05;
+     double num_successes = 0;
+     while(current_unif < 0.95) {
+       current_unif = ::Rf_runif(current_unif, 1.0);
+       num_successes++;
+     }
+     rns.push_back(num_successes);
+   }
+ 
+   double mean = std::accumulate(rns.begin(), rns.end(), 0.0) / rns.size();
+   return wrap(mean);  // Return to R
+ '
>
> library(inline)
> nFunRcpp <- cxxfunction(signature(n="int"), nCode, plugin="Rcpp")
> library(compiler)
> nFunCmp <- cmpfun(nFun)
> system.time(nFun(1e5))
   user  system elapsed 
  3.100   0.000   3.098 
> system.time(nFunCmp(1e5))
   user  system elapsed 
  2.120   0.000   2.114 
> system.time(nFunRcpp(1e5))
   user  system elapsed 
  0.010   0.000   0.016 

2 Comments

Holy schmoly, Josh posting an Rcpp solution? Beyond awesome. Only meek suggestion I have: use rbenchmark::benchmark() for the timing. Oh, and maybe preallocate the vector. But very nice use of STL to compute mean. Colour me impressed :)
@DirkEddelbuettel: Thanks. My "very nice use of STL to compute mean" was the result of searching SO. ;) I didn't see reason to use rbenchmark because there's no contest in this case. I also swiped a bit of code from your blog to get started. :)

Just for completeness, here is what I suggested to @JoshuaUlrich, using benchmark() from the rbenchmark package:

R> res <- benchmark(nFun(1e5L), nFunCmp(1e5L), nFunRcpp(1e5L), nFun2Rcpp(1e5L),
+                  columns = c("test", "replications", "elapsed", "relative"),
+                  replications=10,
+                  order="relative")
R> print(res)
               test replications elapsed  relative
4 nFun2Rcpp(100000)           10   0.117   1.00000
3  nFunRcpp(100000)           10   0.122   1.04274
2   nFunCmp(100000)           10  13.845 118.33333
1      nFun(100000)           10  23.212 198.39316
R> 

nFun2Rcpp preallocates the result vector at its full size, declaring it as

std::vector<double> rns(N);

and changes the assignment to

rns[i] = num_successes;

rather than using .push_back(), which makes the memory allocation a tiny bit more efficient. (Note that reserve(N) alone would not do here: it sets capacity but not size, so indexing with rns[i] would be undefined behaviour and rns.size() would still be 0 in the mean calculation.)
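Since nFun2Rcpp is not shown in full, here is roughly what its body would look like (a sketch reconstructed from the changes just described):

int N = as<int>(n);
std::vector<double> rns(N);   // preallocated at full size

RNGScope scope;               // initialize R's RNG

for(int i=0; i<N; i++) {
  double current_unif = 0.05;
  double num_successes = 0;
  while(current_unif < 0.95) {
    current_unif = ::Rf_runif(current_unif, 1.0);
    num_successes++;
  }
  rns[i] = num_successes;     // direct assignment instead of push_back
}

double mean = std::accumulate(rns.begin(), rns.end(), 0.0) / rns.size();
return wrap(mean);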

Edit: Turns out that was inaccurate, a reflection of the randomness in the algorithm. If I add a set.seed() call to each, the times are identical between the two C++ versions. No measurable gain here.
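That is, seeding before each timing run, along these lines (a sketch; the seed value is arbitrary):

R> set.seed(42); system.time(nFunRcpp(1e5L))
R> set.seed(42); system.time(nFun2Rcpp(1e5L))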

1 Comment

You're right, it is a lot better with ratios from rbenchmark. :)
