
I am trying to run simulation scenarios that should give me the best scenario for a given date, backtested over a couple of months. Each scenario has 4 input variables, and each variable can be in one of 5 states, giving 5^4 = 625 combinations. The flow of the model is as follows:

  1. Simulate 625 scenarios to get each of their profit
  2. Rank each of the scenarios according to their profit
  3. Repeat the process over a 1-day expanding window for the last 2 months, starting on 1 Dec 2015, creating a time series of ranks for each of the 625 scenarios
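One detail worth checking in step 2: R's rank() gives rank 1 to the smallest value by default, so the most profitable scenario ends up with rank 625. A minimal sketch, using made-up profit numbers, of ranking so that rank 1 is the best scenario (the tie handling via ties.method = "min" is an illustrative choice):

```r
# Made-up profits for five hypothetical scenarios (illustration only)
profit <- c(120.5, -30.0, 410.2, 120.5, 55.0)

# Default: ascending, so the largest profit gets the highest rank
rank(profit)
# [1] 3.5 1.0 5.0 3.5 2.0

# Negate to rank in descending order: rank 1 = highest profit.
# ties.method = "min" gives tied scenarios the same (better) rank.
best_first <- rank(-profit, ties.method = "min")
best_first
# [1] 2 5 1 2 4
```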

The unfortunate result is 5 nested for loops, which take an extremely long time to run. I had a look at the foreach package, but I am concerned about how combining the outputs would work in my scenario.

The current code works as follows. First, I create the possible states of each input, along with the window of estimation dates:

a<-seq(as.Date("2015-12-01", "%Y-%m-%d"),as.Date(Sys.Date()-1, "%Y-%m-%d"),by="day")
#input variables
b<-seq(1,5,1)
c<-seq(1,5,1)
d<-seq(1,5,1)
e<-seq(1,5,1)

set.seed(3142)

tot_results<-NULL

Next, the nested for loops run through the simulations:

for(i in 1:length(a))
{
cat(paste0("\n", "Current estimation date: ", a[i], "; iteration: ", i, "\n"))
#subset data for backtesting
dataset_calc<-dataset[which(dataset$Date<=a[i]),]
p=1
#one row per scenario: its ID and its final profit
results<-data.frame(ID=rep(NA,625),profit=rep(NA,625))
    for(j in 1:length(b))
    {
        for(k in 1:length(c))
        {
            for(l in 1:length(d))
            {
                for(m in 1:length(e))
                {
                if(i==1)
                {
                    #create a unique ID to merge onto later
                    unique_ID<-paste0(replicate(1, paste(sample(LETTERS, 5, replace=TRUE), collapse="")),round(runif(n=1,min=1,max=1000000)))
                }
                #Run profit calculation
                post_sim_results<-profit_calc(dataset_calc, param1=e[m],param2=d[l],param3=c[k],param4=b[j])
                #Extract the final profit amount
                profit<-round(post_sim_results[nrow(post_sim_results),],2)

                results[p,]<-data.frame(unique_ID,profit)
                p=p+1
                }
            }
        }
    }
    #extract the ranks for all scenarios
    rank<-rank(results$profit)

    #bind the ranks for the expanding window
    if(i==1)
        {
            tot_results<-data.frame(ID=results[,1],rank)
        }else{
            tot_results<-cbind(tot_results,rank)
        }
    suppressMessages(gc())
}

My biggest concern is how to bind the results, given that the outer loop's actions depend on the output of the inner loops.

Any advice on how to proceed would be greatly appreciated.
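One way to sidestep the binding problem is to have each estimation date return a complete vector of 625 ranks and let vapply() assemble those vectors as the columns of a matrix (foreach's .combine = cbind argument plays a similar role). A self-contained sketch under stated assumptions: dataset is a made-up daily price series, and profit_calc_toy() is a hypothetical stand-in for the unspecified profit_calc(), assumed to return one final profit per call.

```r
set.seed(3142)

# Made-up daily price series standing in for the real `dataset` (assumption)
all_days <- seq(as.Date("2015-10-01"), as.Date("2016-01-31"), by = "day")
dataset  <- data.frame(Date  = all_days,
                       Price = 100 + cumsum(rnorm(length(all_days))))

# Hypothetical stand-in for profit_calc(): returns ONE final profit per call
profit_calc_toy <- function(data, param1, param2, param3, param4) {
  sum(diff(data$Price) * (param1 - param2) / (param3 + param4))
}

# All 625 parameter combinations, one row per scenario
states      <- 1:5
scenarios   <- expand.grid(param1 = states, param2 = states,
                           param3 = states, param4 = states)
scenario_id <- do.call(paste, c(scenarios, sep = "_"))

est_dates <- seq(as.Date("2015-12-01"), as.Date("2016-01-31"), by = "day")

# Ranks of all scenarios for one estimation date (expanding window)
rank_one_date <- function(d) {
  win_data <- dataset[dataset$Date <= d, ]
  profit <- vapply(seq_len(nrow(scenarios)), function(s) {
    profit_calc_toy(win_data,
                    param1 = scenarios$param1[s], param2 = scenarios$param2[s],
                    param3 = scenarios$param3[s], param4 = scenarios$param4[s])
  }, numeric(1))
  rank(-round(profit, 2), ties.method = "min")  # 1 = most profitable
}

# vapply() stacks each date's 625 ranks as one column of a matrix,
# so there is no cbind() inside a loop and no i == 1 special case.
tot_results <- vapply(seq_along(est_dates),
                      function(i) rank_one_date(est_dates[i]),
                      numeric(nrow(scenarios)))
dimnames(tot_results) <- list(scenario_id, format(est_dates))

dim(tot_results)  # 625 scenarios x 62 dates
```

Each row of tot_results is a scenario and each column an estimation date, so tot_results["1_1_1_1", ] is that scenario's rank series. If profit_calc() can be vectorized, the inner vapply() could be replaced by a single call over the whole grid.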

  • From your code, it seems as though you can just vectorize the whole thing? With expand.grid(a,b,c,d,e) as your input. Commented Feb 3, 2016 at 15:07
  • Thank you so much for your helpful comment. My background is unfortunately not programming based, so although I understand the concept of 'vectorizing' the problem, I have not implemented these structures before. Do you know of a good source where I could look at some examples? Or perhaps you could be so kind as to provide a sample of, say, a nested loop with 3 for loops which I could just build up from? Commented Feb 3, 2016 at 16:21

1 Answer

So I think that you can vectorize most of this, which should give a big reduction in run time.

Currently, you use for-loops (5, to be exact) to create every combination of values, and then run the values one by one through profit_calc (a function that is not shown). Ideally, you'd take all possible combinations in one go and push them through profit_calc in a single operation.

-- Rationale --

a <- 1:10
b <- 1:10
d <- rep(NA,10)
for (i in seq(a)) d[i] <- a[i] * b[i]
d 

# [1]   1   4   9  16  25  36  49  64  81 100

Since * also works on vectors, we can rewrite this to:

a <- 1:10
b <- 1:10
d <- a*b
d

# [1]   1   4   9  16  25  36  49  64  81 100

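While it may save us only one line of code, it replaces 10 iterations of an interpreted R loop with a single vectorized call, whose element-by-element loop runs in compiled code. That is where the speed-up comes from.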
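To see the difference rather than take it on faith, here is a quick sketch comparing the loop and the vectorized version with base R's system.time() on a larger vector. Timings vary by machine, so no numbers are shown; the vector length of 1e6 is an arbitrary choice.

```r
n <- 1e6
a <- runif(n)
b <- runif(n)

# Loop version: one interpreted R iteration per element
loop_mult <- function(a, b) {
  d <- numeric(length(a))  # preallocate instead of growing the vector
  for (i in seq_along(a)) d[i] <- a[i] * b[i]
  d
}

t_loop <- system.time(d_loop <- loop_mult(a, b))
t_vec  <- system.time(d_vec <- a * b)

stopifnot(identical(d_loop, d_vec))  # same result either way
print(t_loop)
print(t_vec)
```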

-- Application --

So how does that apply to your code? Provided that profit_calc can be vectorized, you can generate a data frame in which each row is one possible combination of your parameters. We can do this with expand.grid:

foo <- expand.grid(b,c,d,e)
head(foo)

#   Var1 Var2 Var3 Var4
# 1    1    1    1    1
# 2    2    1    1    1
# 3    3    1    1    1
# 4    4    1    1    1
# 5    5    1    1    1
# 6    1    2    1    1
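Building the grid this way also gives each scenario a natural, deterministic ID, which could replace the randomly generated unique_ID in the question (that one is only created when i == 1 and could, in principle, collide). A sketch using named columns; param1 to param4 mirror the profit_calc() arguments, and the "1_1_1_1" ID format is just an illustrative choice:

```r
states <- 1:5  # each input variable can take one of 5 states

# One row per scenario: 5^4 = 625 combinations
scenarios <- expand.grid(param1 = states, param2 = states,
                         param3 = states, param4 = states)

# ID built from the parameter values, unique by construction
scenarios$ID <- do.call(paste, c(scenarios, sep = "_"))

nrow(scenarios)        # 625
head(scenarios$ID, 3)  # "1_1_1_1" "2_1_1_1" "3_1_1_1"
```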

Let's say we have a formula such as (a - b) * (c + d), where a to d are the four columns. Then it would work like this:

bar <- (foo[,1] - foo[,2]) * (foo[,3] + foo[,4])
head(bar)

# [1]  0  2  4  6  8 -2
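If expand.grid() is given named arguments, the same formula can refer to the columns by name via with(), which reads more clearly than positional indexing. A sketch with illustrative column names x1 to x4:

```r
vals <- 1:5
foo <- expand.grid(x1 = vals, x2 = vals, x3 = vals, x4 = vals)

# Same formula as above, evaluated once over all 625 rows
bar <- with(foo, (x1 - x2) * (x3 + x4))
head(bar)
# [1]  0  2  4  6  8 -2

# Store the result next to the parameters that produced it
foo$result <- bar
```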

So basically, try to find a way to replace for-loops with vectorized options. If you cannot vectorize something, look into the apply family (apply, sapply, vapply, mapply) instead. It keeps the code flat and can save time, although it is not guaranteed to be faster than a well-written for-loop. If your code is still running too slowly, first see whether you can write a more efficient script. To measure the difference, you may be interested in the microbenchmark package, or ?system.time.
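If profit_calc() cannot be vectorized, the parameter grid still removes the nesting: one mapply() call walks the four columns in parallel, replacing the four inner loops. This does not make each individual call faster, but it flattens the code. A sketch with a hypothetical scalar-only function, scalar_profit(), which uses if and therefore cannot take whole vectors:

```r
vals <- 1:5
grid <- expand.grid(p1 = vals, p2 = vals, p3 = vals, p4 = vals)

# Hypothetical function that only works on single values (because of `if`)
scalar_profit <- function(p1, p2, p3, p4) {
  if (p1 > p2) (p1 - p2) * (p3 + p4) else 0
}

# One call per row; mapply pairs up the i-th element of each column
grid$profit <- mapply(scalar_profit, grid$p1, grid$p2, grid$p3, grid$p4)
head(grid$profit)
```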


thanks for this concise and intuitive explanation! Really appreciate it
