Best way to name objects programmatically using R?

Question

I'm running various modeling algorithms on a data set. I've had best results by modeling my input variables to my responses one at a time, e.g.:

model <- train(y ~ x1 + x2 + ... + xn, ...)

Once I train my models, I'd like to not re-run them each time, so I've been trying to save them as .rda files. Here's an example loop for a random forest model (feel free to suggest a better way than a loop!):

# data_resp contains my measured responses, one per column
# data_pred contains my predictors, one per column

for (i in 1:ncol(data_resp)) {

  model <- train(data_pred_scale[!is.na(data_resp[, i]), ],
                 data_resp[!is.na(data_resp[, i]), i],
                 method = "rf",
                 tuneGrid = data.frame(.mtry = c(3:6)),
                 nodesize = 3,
                 ntrees = 500)

  save(model, file = paste("./models/model_rf_", names(data_resp)[i], ".rda", sep = ""))

When I load the model, however, it's going to be called model.

I haven't found a good way to save the model with it's corresponding name to try and refer back to it later. I found that one can assign an object to a string like so:

assign(paste("./models/model_rf_", names(data_resp)[i], ".rda", sep = ""), train(...))

But I'm still left with how to refer to the object when I save it:

save(???, file = ...)

I don't know how to call the object by it's custom name.

Lastly, even loading has presented a problem. I've tried assign("model_name", load("./model.rda")), but the resultant object, called string ends up just holding a string of the object name, "model".

In looking around, I found THIS question, which seems relevant, but I'm trying to figure out how to apply it to my situation.

I could create a list with the names of each column name in data_resp (my measured responses) and then use lapply to use train(), but I'm still a bit stuck on how to dynamically refer to the new object name to keep the resultant model in.

Your specific question about save() could have been answered by simply reading the documentation and noting the second argument. — joran
– joran, Commented Jul 21, 2013 at 5:20
Try to use saveRDS and readRDS in that case, it'll simplify a lot of things. — dickoa
– dickoa, Commented Jul 21, 2013 at 6:02

Spacedman · Accepted Answer · 2013-07-21 07:55:12Z

When you save the model, save another object called 'name' which is a character string of the thing you want to name it as:

> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m1"
> save(model,name,file="save1.rda")
> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m2"
> save(model,name,file="save2.rda")

Now each file knows what it wants its resulting object to be called. How do you get that back on load? Load into a new environment, and assign:

> e=new.env()
> load("save1.rda",env=e)
> assign(e$name,e$model)
> summary(m1)

Call:
lm(formula = y ~ x, data = d)

You can now safely rm or re-use the 'e' object. You can of course wrap this in a function:

> blargh=function(f){e=new.env();load(f,env=e);assign(e$name,e$model,.GlobalEnv)}
> blargh("save2.rda")
> m2

Call:
lm(formula = y ~ x, data = d)

Note this is a double bad thing to do - firstly, you should probably store all the models in one file as a list with names. Secondly, this function has side effects, and if you had an object called m2 already it would get stomped on.

Using assign like this is nearly always a sign (dyswidt?) that you should use a list instead.

B

vaettchen · Accepted Answer · 2013-07-21 06:18:00Z

There is a fair amount of guesswork involved in this answer but I think this could help:

# get a vector with the column names in data_resp
modNames <- colnames( data_resp )

# create empty list
models <- as.list( NULL )

# iterate through your columns and assign the result as list members
for( n in modNames )
{
  models[[n]] <- train(data_pred_scale[!is.na(data_resp[, n]), ],  ### this may need modification, can't test without data
                 data_resp[!is.na(data_resp[, n]), n],
                 method = "rf",
                 tuneGrid = data.frame(.mtry = c(3:6)),
                 nodesize = 3,
                 ntrees = 500)
}

# save the whole bunch
save( models, file = "models.rda" )

You can now retrieve, just with load( "models.rda ), this one object, the list with all your models, and address them with list notation, either as models[[1]] or with the column name, eg. models[["first"]].

Hendy · Accepted Answer · 2013-07-21 15:21:35Z

I think the other answers about doing this with a loop are great. I used this as a chance to finally try and understand lapply better, as many of the StackOverflow questions about how to do this ended up suggesting the use of lists and lapply instead of loops.

I really like the idea of combining all results of train() into a list (which @vaettchen did in his loop), and in thinking about how to do this with a list, this is what I came up with. First, I needed my data.frame in list form, one entry per column. Since I don't really work with lists, I hunted around until just trying as.list(df), which worked like a charm.

Next, I want to apply my train function to each element of my list of measured response variables, so I defined the function like this:

# predictors are stored in data_pred
# responses are in data_resp (one per column)
# rows in data_pred/data_resp (perhaps obviously) match, one per observation

train_func <- function(y) { train(x = data_pred, y = y,
   method = "rf", tuneGrid = data.frame(.mtry = 3:6),
   ntrees = 500) }

Now I just need to use lapply to apply the train() call on each element of data_resp. I didn't know how to create an empty, placeholder list, so thanks to @vaettchen for that (I was trying list_name <- list() without success):

models <- lapply(as.list(data_resp), train_func)

Awesomely, I found that models has it's elements automatically named to my column names in data_resp, which is just fantastic. I'm using this in conjunction with the shiny package, so this will make it incredibly easy for the user to select a response variable from a drop down (which can store the response variable name) and do:

predict(models[["resp_name"]], new_data)

I think this is much better than the loop based approach and everything just happened to fall in place nicely. I realize the question explicitly asked for naming variables programmatically, so apologies if that pushed others to answer in that fashion vs. a "bigger picture" answer. The ease of lapply suggests I was trying to force a particular solution when a (at least to my eyes) much better one existed.

Bonus: I didn't realize lists could be multi-dimensional, but in trying it, it appears they can be! This is even better, as I'm using numerous algorithms and I can store everything in one big list object.

 func_rf <- function(y) { train(x = data_pred, y = y,
     method = "rf", tuneGrid = data.frame(.mtry = 3),
     ntrees = 100) }

 # svmRadial method requires formula syntax to work with factors,
 # so the train function has to be a bit different
 # add `scale = F` since I had to preProcess the numeric vars ahead of time
 # and cbind to the factors. Without it, caret will try to scale the data
 # for you, which fails for factors

 func_svm <- function(y) { train(y ~ ., cbind(data_pred, y),
     method = "svmRadial", tuneGrid = data.frame(.C = 1, .sigma = .2),
     scale = F) }

 model_list <- list(NULL)
 model_list$rf <- lapply(as.list(data_resp), func_rf)
 model_list$svm <- lapply(as.list(data_resp), func_svm)

Now I can refer the desired model and response variable with list syntax!

 predict(model_list[["svm"]][["response_variable"]], new_data)

Super happy with this and hopefully it makes the code more efficient, faster, and I really love the "meta-object" I end up with vs. a ton of files, one per model/response variable combination, that I have to load in one at a time later on.

does the function above work when using lapply? I'm trying to do the same thing with trainControl, but the parameters don't appear to pass through when running lapply.
@Prophet60091 Can you be a bit more specific? It's been a while since I used this, but I definitely did use it... so yes, it works as shown. Are you trying to create a list of trainControl objects and then iterate through them to see the impact on resultant models? Or just get lapply() to iterate though models/data and bring in a trainControl object as part of the arguments to train()?

Duccio A · Accepted Answer · 2018-04-16 15:28:22Z

0

A bit of an old question but still without an accepted answer.
As I understand, you need to programmatically rename a variable and save it so that when reloaded it keeps the new name.
Try this:

saveWithName = function(var.name, obj){
  # var.name is a string with the name of the variable you want to assign
  # obj is any kind of R object (data.frame, list, etc.) you want to rename and save
  assign(var.name, obj)
  save(list=var.name, file=sprintf("model_%s.RData", var.name))
}

saveWithName("lab1", c(1,2))
saveWithName("lab2", c(3,4))
load("model_lab1.RData")
load("model_lab2.RData")

print(lab1)
#>[1] 1 2
print(lab2)
#[1] 3 4

answered Apr 16, 2018 at 15:28

Duccio A

1,64216 silver badges33 bronze badges

Collectives™ on Stack Overflow

Best way to name objects programmatically using R?

4 Answers 4

Comments

Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related