Create data.frame variables from list

Question

Is it possible to create and assign a name to an object "by reference"? For example, I have a large data.frame and I need to perform some basic operations on some of its columns. I put the columns, groupings and operations I need in lists:

exec_group_list = c("nbhd", "state", "use")
exec_var_list   = c("land", "imp", "assmt", "landp", "impp", "assmtp")
exec_func_list  = c("sum", "mean", "median", "max", "min", "sd")

So the "land" column will be grouped by "nbhd", and then "sum", "mean", "median", etc. will be applied to it. Then the same will be done to the "imp" column, and so on. Then I will repeat the same, but this time grouping by "state"... rinse, lather and repeat, as follows:

for (eachg in exec_group_list){
  group_by_field = eachg
  group_by = eval(parse(text=paste("sales$",group_by_field)))
  group_by_lst = list(group_by)
  print(paste("Grouping by:", eachg))
  #CREATE DATA.FRAME FOR GROUP HERE
  for (eachv in exec_var_list){
    var = eval(parse(text=paste("sales$",eachv)))
    print(paste("On column:", eachv))
    for (eachf in exec_func_list){
      print(paste("Calculating:", eachf))
      tempt = (aggregate(var, group_by_lst, eachf))
      colnames(tempt) = c(eachg, paste(eachv,".",eachf, sep=""))
      print(tempt)
      #APPEND COLUMNS TO GROUP DATA.FRAME
    }
  }
}
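A minimal sketch of one way the two placeholder comments in the loop could be filled in without creating separately named variables: each grouping's table starts with the group levels, every aggregate() result is merged onto it by the grouping column, and the finished table is stored in a named list under "by_<group>". The small `sales` data frame and the shortened column/function lists are made-up assumptions for illustration, since the real data is not shown.

```r
# Hypothetical toy data standing in for the asker's (undescribed) sales data
sales <- data.frame(
  nbhd  = c("a", "a", "b", "b"),
  state = c("x", "y", "x", "y"),
  land  = c(1, 3, 5, 7),
  imp   = c(2, 4, 6, 8)
)
exec_group_list <- c("nbhd", "state")
exec_var_list   <- c("land", "imp")
exec_func_list  <- c("sum", "mean", "max")

results <- list()
for (eachg in exec_group_list) {
  # CREATE DATA.FRAME FOR GROUP: one row per level of the grouping column
  by_group <- data.frame(sort(unique(sales[[eachg]])))
  names(by_group) <- eachg
  for (eachv in exec_var_list) {
    for (eachf in exec_func_list) {
      # aggregate() accepts the function name as a string (via match.fun)
      tempt <- aggregate(sales[[eachv]], list(sales[[eachg]]), eachf)
      colnames(tempt) <- c(eachg, paste(eachv, eachf, sep = "."))
      # APPEND COLUMNS: match rows on the grouping column
      by_group <- merge(by_group, tempt, by = eachg)
    }
  }
  # store under a computed name instead of creating a new variable
  results[[paste0("by_", eachg)]] <- by_group
}
results$by_nbhd
```

Keeping the tables in one list means they can be looped over later (for example with lapply) instead of being looked up by name.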

I figured out how to use references from a list using eval(), so I can loop through the grouping list and the column list and do the same operations using the values in the lists.

But I'd like to store the results in a data.frame named after the grouping field. For example, if I am grouping by "nbhd", I'd like to create an empty data.frame named "by_nbhd".

I tried something similar to eval(parse(text=paste("by_","nbhd", sep=""))) = data.frame("nbhd"=NA) but I get an error.
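R does not support an eval() call on the left-hand side of an assignment, which is why that line fails. Base R's assign() takes the variable name as a string instead, and get() reads it back the same way (the asker later mentions using assign() in the comments). A minimal sketch, with the grouping field hard-coded as an assumption:

```r
# Build the variable name at run time, e.g. "by_nbhd"
group_by_field <- "nbhd"
obj_name <- paste0("by_", group_by_field)

# assign() creates (or overwrites) a variable with that name in the
# current environment; setNames() makes the column name dynamic too
assign(obj_name, setNames(data.frame(NA), group_by_field))

# get() retrieves the object using the same string
str(get(obj_name))
```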

Does anyone know whether this is possible? Any help will be appreciated. Thank you in advance.

Thanks. I'll look those up. I managed to create data.frames using assign(), as shown in the linked example. – vic, Commented Sep 15, 2013 at 16:22
You are really going down the wrong path for R. Instead of creating multiple named dataframes, you should be creating one dataframe with multiple columns. Describe your file structure and get help using read.table. – IRTFM, Commented Sep 15, 2013 at 16:50
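Following the comment's advice to keep one data frame rather than many named ones, here is a minimal sketch that stacks every (grouping, column, statistic) result into a single long-format data frame. The toy `sales` data and the column names `grouping`, `level`, `column`, `statistic`, `value` are hypothetical choices for illustration.

```r
# Hypothetical toy data; the asker's real columns are not shown
sales <- data.frame(
  nbhd = c("a", "a", "b"),
  use  = c("r", "c", "r"),
  land = c(1, 2, 4),
  imp  = c(10, 20, 40)
)
groups <- c("nbhd", "use")
vars   <- c("land", "imp")
funcs  <- list(sum = sum, mean = mean, max = max)

rows <- list()
for (g in groups) {
  for (v in vars) {
    for (f in names(funcs)) {
      # one value per level of the grouping column
      stat <- tapply(sales[[v]], sales[[g]], funcs[[f]])
      rows[[length(rows) + 1]] <- data.frame(
        grouping = g, level = names(stat), column = v,
        statistic = f, value = as.vector(stat)
      )
    }
  }
}
# a single data frame holding every result, one row per group level
summary_df <- do.call(rbind, rows)
summary_df[summary_df$grouping == "nbhd" & summary_df$statistic == "sum", ]
```

Any individual table can then be recovered by subsetting on the `grouping`, `column` and `statistic` columns.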

IRTFM · Accepted Answer · 2013-09-15 16:28:40Z


Rather than asking about "creating an object by reference", which brings up all sorts of extraneous associations with the distinction between "call by value" and "call by reference", you should be asking for help on "computing on the language". Presumably you have a dataset (which you have not described very well) with a set of columns named "nbhd", "state", and "use", and also columns named "land", "imp", "assmt", "landp", "impp", and "assmtp". You want to serially examine 6 kinds of summary statistics for each of the 6 numeric columns in the second set, within the categories of each of the 3 grouping columns in the first set (3 x 6 x 6 = 108 tables).

Write a prototype of a function that delivers one summary table for a particular function, a particular numeric column, and a particular categorical column.

tabfn <- function(dfrm, numcol, catcol, fn){
  tapply(dfrm[[numcol]], dfrm[[catcol]], fn)
}
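A quick usage sketch of tabfn on a tiny made-up data frame (the `sales` values are assumptions). Note that the column names are passed as strings and looked up with `[[ ]]`, so no eval(parse()) is needed:

```r
tabfn <- function(dfrm, numcol, catcol, fn){
  # summarise dfrm[[numcol]] within each level of dfrm[[catcol]]
  tapply(dfrm[[numcol]], dfrm[[catcol]], fn)
}

# Made-up example data; column names mirror the question's
sales <- data.frame(nbhd = c("a", "a", "b"), land = c(1, 2, 4))
tabfn(sales, "land", "nbhd", mean)
```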

It's easiest to create a list of first-class functions rather than using eval(parse(text = ...)) on character strings:

exec_func_list <- list(sum = sum, mean = mean, median = median, max = max, min = min, sd = sd)
for (eachg in exec_group_list){
  print(paste("Grouping by:", eachg))
  for (eachv in exec_var_list){
    print(paste("On column:", eachv))
    for (eachf in names(exec_func_list)){  # loop over names so they can be printed
      print(paste("Calculating:", eachf))
      print(tabfn(sales, eachv, eachg, exec_func_list[[eachf]]))  # sales is your data frame
    }
  }
}

Unfortunately this is mostly untested guesswork, since you have not produced a minimal reproducible example.
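Because the function list is named, the innermost loop can also be collapsed with sapply(), giving one table per (column, grouping) pair with the group levels as rows and the statistics as columns. A minimal sketch on made-up data (the `sales` values and the `land_by_state` name are hypothetical):

```r
tabfn <- function(dfrm, numcol, catcol, fn){
  tapply(dfrm[[numcol]], dfrm[[catcol]], fn)
}
# Made-up example data
sales <- data.frame(state = c("x", "x", "y", "y"), land = c(1, 3, 5, 9))
exec_func_list <- list(sum = sum, mean = mean, median = median,
                       max = max, min = min, sd = sd)

# sapply() simplifies the per-function results into a matrix:
# rows = levels of "state", columns = names of exec_func_list
land_by_state <- sapply(exec_func_list,
                        function(fn) tabfn(sales, "land", "state", fn))
land_by_state
```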


2 Comments

vic

I know it's been a while. I used your suggestion and it worked great!

calcDF <- function(DFsource, VARname, BYname, FUNname){
  pVARname = eval(parse(text=paste(DFsource,"$",VARname,sep="")))
  pBYname = eval(parse(text=paste(DFsource,"$",BYname, sep="")))
  calcDF = data.frame(aggregate(pVARname,list(pBYname),FUNname))
  return(calcDF)
}

Thanks for the help.

IRTFM

I would have avoided eval(parse(text=paste(DFsource,"$",VARname,sep=""))). That's what dfrm[[VARname]] was for, and it is much safer.
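A minimal sketch of calcDF along the lines of that advice: it takes the data frame itself (rather than its name as a string) and indexes columns with `[[ ]]`. Renaming the output columns and assuming `FUNname` is a function name given as a string are additions for illustration, not part of vic's original; the `sales` data is made up.

```r
# Hypothetical rewrite: dfrm is the data frame object, not its name;
# FUNname is assumed to be a string such as "median"
calcDF <- function(dfrm, VARname, BYname, FUNname){
  out <- aggregate(dfrm[[VARname]], list(dfrm[[BYname]]), FUNname)
  # replace aggregate()'s default "Group.1" / "x" column names
  colnames(out) <- c(BYname, paste(VARname, FUNname, sep = "."))
  out
}

sales <- data.frame(use = c("r", "c", "r"), land = c(1, 2, 4))
calcDF(sales, "land", "use", "median")
```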
