R - renaming multiple columns in multiple dataframes, using nested loop

Question

I have 29 data frames, named Student1 to Student29. Each of these 29 data frames contains the variables Name, Nationality and Membership.number, each with the corresponding number at the end. For example, Student1 contains Name.1, Nationality.1 and Membership.number.1, Student29 contains Name.29, and so on.

I'm trying to standardise these by stripping out the numbers at the end of these variable names. I'm very new to R, but I've put together the following code to try and automate this.

for (j in 1:29) {
  for (i in 1:3) {
    oldnames = c(paste('Name', i, sep="."), paste('Nationality', i, sep="."), paste('Membership.number', i, sep="."))
    newnames = c("Name", "Nationality", "Membership.number")
    names(paste("Student",j,sep=""))[names(paste("Student",j,sep=""))==oldnames[i]]=newnames[i]
  }
}

This appears to be close to what I want: it works as it should for a single data frame if I insert Student1 in place of paste("Student",j,sep=""). With the paste("Student",j,sep="") call, however, it fails with the error "target of assignment expands to non-language object". Is there something simple that I'm doing wrong here?

K. A. Buhr · Accepted Answer · 2017-06-05 13:41:02Z

The problem is that paste() returns a string, so your code is effectively doing things like:

names("Student1")[names("Student1")==oldnames[i]] = newnames[i]

but, of course, the string "Student1" isn't the same as the variable Student1 that contains your data frame, so this doesn't get you very far. The error message is a little confusing but ultimately means that you're trying to assign to something that can't be assigned to.
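To see the difference concretely, here is a minimal sketch. The one-row data frame is made up for illustration and only stands in for the real Student1:

```r
# Hypothetical stand-in for the real Student1 data frame
Student1 <- data.frame(Name.1 = "Ann",
                       Nationality.1 = "Irish",
                       Membership.number.1 = 101)

cols_of_df  <- names(Student1)     # the data frame's column names
cols_of_str <- names("Student1")   # a bare string has no names attribute

cols_of_df    # "Name.1" "Nationality.1" "Membership.number.1"
cols_of_str   # NULL
```

The string only happens to spell the variable's name; R does not look the variable up for you.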

The simplest solution is to use the functions get() and assign(), which take a string naming a variable (like the string "Student1") and let you retrieve or set the value of that variable. For example, this will rename one of the columns of Student1:

dfname = "Student1"
df = get(dfname)
names(df)[names(df)=="Name.1"] = "Name"
assign(dfname, df)

So, you can write:

for (j in 1:29) {
    oldnames = c(paste('Name', j, sep="."), 
                 paste('Nationality', j, sep="."),
                 paste('Membership.number', j, sep="."))
    newnames = c("Name", "Nationality", "Membership.number")
    dfname = paste("Student", j, sep="")
    df = get(dfname)
    for (i in 1:3) {
        names(df)[names(df) == oldnames[i]] = newnames[i]
    }
    assign(dfname, df)
}

Note that I fixed the oldnames definition to use j instead of i and moved the definitions that depended only on j out of the inner loop. One caveat here is that this only works at "top level" (i.e., entered at the R prompt). If you put it in a function, then assign() gets trickier because you need to specify where you want the variable assigned (at the top level with the rest of the global variables, within the function, etc.).
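As a rough sketch of the function case, one option is to pass the environment explicitly through the envir argument that both get() and assign() accept. The helper name rename_student, its env parameter, and the sample data are all made up for illustration:

```r
# Hypothetical helper: renames the three suffixed columns of the data
# frame called "Student<j>" that lives in the environment `env`.
rename_student <- function(j, env = globalenv()) {
    oldnames <- c(paste("Name", j, sep = "."),
                  paste("Nationality", j, sep = "."),
                  paste("Membership.number", j, sep = "."))
    newnames <- c("Name", "Nationality", "Membership.number")
    dfname <- paste("Student", j, sep = "")
    df <- get(dfname, envir = env)        # look the variable up in env
    for (i in seq_along(oldnames)) {
        names(df)[names(df) == oldnames[i]] <- newnames[i]
    }
    assign(dfname, df, envir = env)       # write it back to the same place
    invisible(NULL)
}

# Usage with made-up data, kept in its own environment for the demo
demo_env <- new.env()
assign("Student2",
       data.frame(Name.2 = "Bo", Nationality.2 = "Dutch",
                  Membership.number.2 = 7),
       envir = demo_env)
rename_student(2, env = demo_env)
names(get("Student2", envir = demo_env))
```

With the default env = globalenv(), the helper would read and write the top-level variables, which matches the behaviour of the loop above.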

This code can still be improved. It turns out that your definition of oldnames can be rewritten as:

oldnames = paste(c("Name","Nationality","Membership.number"), j, sep=".")

which means that you can actually write:

newnames = c("Name","Nationality","Membership.number")
oldnames = paste(newnames, j, sep=".")
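This works because paste() is vectorized: given a vector and a single value, it pastes the value onto every element. A quick check, with j set by hand to an example value instead of coming from the loop:

```r
newnames <- c("Name", "Nationality", "Membership.number")
j <- 29   # example value of the loop index
oldnames <- paste(newnames, j, sep = ".")
oldnames
# "Name.29" "Nationality.29" "Membership.number.29"
```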

You can go one step further and use the function match. This function gets the index of each of the elements of its first argument within its second argument and can be used to retrieve the positions of all the oldnames in the names() vector simultaneously. Then, you don't even need the inner loop:

for (j in 1:29) {
    newnames = c("Name","Nationality","Membership.number")
    oldnames = paste(newnames, j, sep=".")
    dfname = paste("Student", j, sep="")
    df = get(dfname)
    names(df)[match(oldnames, names(df))] = newnames
    assign(dfname, df)
}

This sort of use of match to find and replace values in a vector is a very common R technique.
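Here is a small sketch of that find-and-replace pattern on a plain character vector. The column names, including the extra "ID" column, are invented for the example:

```r
# Made-up column names, including one column that is not renamed
cols <- c("ID", "Name.3", "Nationality.3", "Membership.number.3")
newnames <- c("Name", "Nationality", "Membership.number")
oldnames <- paste(newnames, 3, sep = ".")

idx <- match(oldnames, cols)   # position of each old name within cols
idx                            # 2 3 4
cols[idx] <- newnames          # replace all three in one step
cols                           # "ID" "Name" "Nationality" "Membership.number"
```

If one of the old names is not present, match() returns NA for it, so it may be worth checking anyNA(idx) before assigning.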

Finally, if there aren't any other columns in the data frames (so you really just want to remove all suffixes that consist of a period and some digits from the end of all names), then a common trick in R is to use sub() to modify the names using regular expressions:

for (j in 1:29) {
    dfname = paste("Student", j, sep="")
    df = get(dfname)
    # strip a trailing period-plus-digits suffix from every column name
    names(df) = sub("\\.[0-9]+$", "", names(df))
    assign(dfname, df)
}

Note that, inside an R string literal, a backslash must itself be escaped with a second backslash, so the "\\." above is the regular expression \. and matches a literal period. I use this sub-based technique all the time when cleaning up datasets that have unwanted prefixes and suffixes on a bunch of column names.
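To illustrate what the pattern does and does not touch, here is a sketch on a made-up vector of names (the last two entries are invented to show non-matches):

```r
cols <- c("Name.12", "Nationality.12", "Membership.number.12",
          "Score2", "v1.5.x")

# Remove a period followed by one or more digits, only at the end
cleaned <- sub("\\.[0-9]+$", "", cols)
cleaned
# "Name" "Nationality" "Membership.number" "Score2" "v1.5.x"
```

"Score2" is left alone because there is no period before the digits, and "v1.5.x" is left alone because the digits are not at the end. A genuine column name that does end in a period and digits would be shortened too, which is why this approach assumes there are no such columns.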

Happy R-ing!

This is terrific, thank you - as well as an answer to my question, some useful additional tips and fixes to clean up my code. I'm only on my second day of R, so this will help clear some of the rust away! Much appreciated.
