I'm still relatively new to R. I'm trying to use dynamic variables in a loop but am running into all sorts of problems. My initial code looks something like this (but bigger):

data.train$Pclass_F <- as.factor(data.train$Pclass)
data.test$Pclass_F <- as.factor(data.test$Pclass)

which I'm trying to build into a loop, imagining something like this

datalist <- c("data.train", "data.test")
for (i in datalist){
  i$Pclass_F <- as.factor(i$Pclass)
}

which doesn't work. A little research suggests that in order to convert the strings in datalist into variables I need to use the get function. So my next attempt was

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i$Pclass_F) <- as.factor(get(i$Pclass))
}

which still doesn't work: Error in i$Pclass : $ operator is invalid for atomic vectors. Then I tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i)$Pclass_F <- as.factor(get(i)$Pclass)
}

which still doesn't work: Error in get(i)$Pclass_F <- as.factor(get(i)$Pclass) : could not find function "get<-". I even tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i[Pclass_F]) <- as.factor(get(i[Pclass]))
}

which still doesn't work: Error in get(i[Pclass]) : object 'Pclass' not found. Then I tried

datalist <- c("data.train", "data.test")
for (i in datalist){
  get(i)[Pclass_F] <- as.factor(get(i)[Pclass])
}

which still doesn't work: Error in '[.data.frame'(get(i), Pclass) : object 'Pclass' not found

I've now realized I never included any data, so nobody can run this themselves, but just to show it's not a data problem:

> class(data.train$Pclass)
[1] "integer"
> class(data.test$Pclass)
[1] "integer"
> datalist
[1] "data.train" "data.test" 
  • Stop structuring your code around get. Ban it from your lexicon until you are much better at R. Anytime you think "dynamic variable", stop and build a named list. Put them in an actual list and use indexing: L <- list(data.train = data.train, data.test = data.test). Commented Jun 29, 2015 at 16:06
  • I agree this is terrible coding. Not the first time at all I see this on SO. I really wonder where these bad habits come from, and how they got "intuitive" and "natural" for the OP. I think it's an interesting point (and the main one) to resolve. Commented Jun 29, 2015 at 16:09
  • @ColonelBeauvel I wouldn't be too hard on the OP. Being a beginner generally means just trying to find something that "works", and you learn about whether it's "good" in the process. I find this particular problem frustrating because it comes up so frequently, but each person's use case is so different that it is very difficult to write a single comprehensive resource that effectively steers people away from get and assign. Commented Jun 29, 2015 at 16:21
  • I asked the same question a few years ago: stackoverflow.com/questions/15959027/… Commented Jun 29, 2015 at 17:45

1 Answer

The problem you have relates to the way data frames and most other objects are treated in R. In many programming languages, objects are (or at least can be) passed to functions by reference. In C++, if I pass a pointer to an object to a function which manipulates that object, the original is modified. For the most part, this is not how things work in R: a function behaves as if it received a copy of its argument, so changes made inside the function are not visible to the caller unless the modified object is returned and assigned.
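A minimal sketch of this behaviour, using a hypothetical helper add_factor() and made-up data: the function changes its own copy, and the caller only sees the change through the returned value.

```r
# Made-up data standing in for data.train
df <- data.frame(Pclass = c(1L, 3L, 2L))

# Hypothetical helper that adds a factor column to its argument
add_factor <- function(d) {
  d$Pclass_F <- as.factor(d$Pclass)  # changes the function's local copy only
  d                                  # the modified copy must be returned
}

modified <- add_factor(df)
names(df)        # "Pclass" (the original is untouched)
names(modified)  # "Pclass" "Pclass_F"
```

To keep the change in df itself, assign the result back with df <- add_factor(df).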

When an object is created like this:

x <- list(a = 5, b = 9)

And then copied like this:

y <- x

Initially y and x will point to the same object in RAM. But as soon as y is modified at all, a copy is created. So assigning y$c <- 12 has no effect on x.
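The same copy-on-modify behaviour, written out as a small runnable snippet (the values are the ones used above):

```r
x <- list(a = 5, b = 9)
y <- x        # no copy yet: both names refer to the same object
y$c <- 12     # modifying y triggers a copy, so x is left as it was

names(x)  # "a" "b"
names(y)  # "a" "b" "c"
```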

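get() returns the value of the named object, not something you can assign into. R rewrites get(i)$Pclass_F <- value into a call to a replacement function called get<-, which does not exist (hence the "could not find function" error in the question). And if you first assign the result of get() to another variable and modify that, only the copy changes; the original variable is left unaltered.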
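A minimal illustration, assuming a made-up one-column data.train: the value returned by get() can be modified once it is stored under another name, but the change lands on that copy only.

```r
# Made-up stand-in for the question's data.train
data.train <- data.frame(Pclass = c(1L, 3L, 2L))

tmp <- get("data.train")               # get() looks up the value bound to the name
tmp$Pclass_F <- as.factor(tmp$Pclass)  # modifies tmp only (copy-on-modify)

"Pclass_F" %in% names(data.train)  # FALSE: the original is unaltered
"Pclass_F" %in% names(tmp)         # TRUE
```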

The correct way of doing this in R is storing your data frames in a named list. You can then loop through the list and use the replacement syntax to change the columns.

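# Store the data frames in a named list so they can be looped over by name
datalist <- list(data.train = data.train, data.test = data.test)
for (df in names(datalist)){
  # Add a factor version of Pclass to each data frame in the list
  datalist[[df]]$Pclass_F <- as.factor(datalist[[df]]$Pclass)
}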
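A self-contained version of that loop, assuming small made-up data frames in place of the real Titanic data. Note that it updates the copies stored in the list; the standalone data.train is not changed.

```r
# Made-up toy data standing in for the question's data.train / data.test
data.train <- data.frame(Pclass = c(1L, 3L, 2L))
data.test  <- data.frame(Pclass = c(2L, 1L))

datalist <- list(data.train = data.train, data.test = data.test)
for (df in names(datalist)) {
  # [[ with a character name selects the list element; $<- then adds the column
  datalist[[df]]$Pclass_F <- as.factor(datalist[[df]]$Pclass)
}

levels(datalist$data.train$Pclass_F)  # "1" "2" "3"
"Pclass_F" %in% names(data.train)     # FALSE: the standalone data frame is unchanged
```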

You could also use:

# Apply the conversion to each data frame, then restore the names with setNames()
datalist <- setNames(lapply(list(data.train, data.test), function(data) {
  data$Pclass_F <- as.factor(data$Pclass)
  data
}), c("data.train", "data.test"))

This uses lapply to process each member of the list, returning a new list with the modified columns.
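If the list is already named, lapply() keeps those names in its result, so setNames() is not needed. A sketch with made-up data:

```r
# Made-up toy data standing in for the real data frames
data.train <- data.frame(Pclass = c(1L, 3L, 2L))
data.test  <- data.frame(Pclass = c(2L, 1L))

datalist <- list(data.train = data.train, data.test = data.test)

# lapply() returns a list with the same names as its input
datalist <- lapply(datalist, function(data) {
  data$Pclass_F <- as.factor(data$Pclass)
  data
})

names(datalist)  # "data.train" "data.test"
```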

In theory, you could achieve what you were originally trying to do by using the [[ operator on the global environment, but it would be an unconventional way of doing things and may lead to confusion later on.
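For completeness, a sketch of that unconventional route, assuming the code runs at the top level of a script (where environment() is the global environment) and using made-up data. Assigning through [[ on an environment rebinds the name inside that environment, so data.train itself is updated. The named list remains the recommended approach.

```r
# Made-up toy data, created at the top level of a script
data.train <- data.frame(Pclass = c(1L, 3L, 2L))
data.test  <- data.frame(Pclass = c(2L, 1L))

env <- environment()  # at the top level this is the global environment
for (i in c("data.train", "data.test")) {
  # env[[i]] looks the data frame up by name; [[<- on an environment
  # stores the modified data frame back under that same name
  env[[i]]$Pclass_F <- as.factor(env[[i]]$Pclass)
}

names(data.train)  # "Pclass" "Pclass_F"
```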

4 Comments

Thanks @Nick K. I had come up with something similar myself, but in the first example it adds the column to the data frames in the list, not to the actual data frames themselves. If I look at head(data.train) I do not see the new column, but if I look at head(datalist[[1]]) I can.
@SC. That's the point I was trying to get across. Short of using [[ on the global environment, there isn't a way of doing what you want to achieve with data frames in the global environment. You need to store them in a list or environment, and in general I would use a list for this use case. You can then still refer to them as datalist$data.train and use them in loops etc. as shown here. Alternatively, you could of course just use separate statements for each data frame (as at the beginning of your question), and if you're processing more than one column, put that piece of code in a function.
Thanks @Nick. I guess that's what @joran was also saying, but I didn't understand. So in this scenario the list datalist becomes my main data structure. That's an interesting concept I hadn't thought of. It will make some names rather long (datalist$data.train$Pclass etc.), but it should work. I assume this is best practice, and a good habit for me to pick up?
@SC. it is if you want to be able to work on items within the list in a loop. The same kinds of structures are essential for returning more than one value from a function.
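The suggestion to put the conversion in a function when several columns are involved could look like this. The helper add_factor_cols() and the Sex column are hypothetical, made up for the example.

```r
# Hypothetical helper: add a factor copy (suffix "_F") of each listed column
add_factor_cols <- function(data, cols) {
  for (col in cols) {
    data[[paste0(col, "_F")]] <- as.factor(data[[col]])
  }
  data
}

# Made-up toy data; Sex is an invented extra column
data.train <- data.frame(Pclass = c(1L, 3L, 2L), Sex = c("male", "female", "male"))
data.test  <- data.frame(Pclass = c(2L, 1L), Sex = c("female", "male"))

datalist <- list(data.train = data.train, data.test = data.test)
# Extra arguments to lapply() (here cols) are passed on to the function
datalist <- lapply(datalist, add_factor_cols, cols = c("Pclass", "Sex"))

names(datalist$data.test)  # "Pclass" "Sex" "Pclass_F" "Sex_F"
```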
