Calculating R function with arguments which are columns of a data frame

Question

I have the following code.

completemodel <- function(model, colnum)
{
  modlst = c()
  tuplenum = length(model)
  if(tuplenum != 0)
    for(i in 1:tuplenum)
      modlst = c(modlst, model[[i]])
  index = seq(0, colnum-1)
  inddiff = setdiff(index, modlst)
  inddifflen = length(inddiff)
  for(i in seq(length.out=inddifflen))
    model = append(model, inddiff[i])
  return(model)
}

##   Calculate number of parameters in model.
numparam <- function(mod, colnum)
  {
    library(RJSONIO)
    mod = fromJSON(mod)
    mod = completemodel(mod, colnum)
    totnum = 0
    for(tup in mod)
      totnum = totnum +(4**length(tup))
    return(totnum)
  }

x = cbind.data.frame(rownum=c(100, 100), colnum=c(10, 20), modeltrue=c("[]", "[]"), modelresult=c("[[1,2]]","[[1,3]]"), stringsAsFactors=FALSE)

> x
  rownum colnum modeltrue modelresult
1    100     10        []     [[1,2]]
2    100     20        []     [[1,3]]

How can I operate on x to get a data frame that looks like the following? Here each entry such as numparam("[]", 10) stands for the value returned by that call, not the literal text.

  rownum   colnum    numparam_modeltrue   numparam_modelresult
  100        10      numparam("[]", 10)   numparam("[[1,2]]", 10)
  100        20      numparam("[]", 20)   numparam("[[1,3]]", 20)

Some version of the apply function might work, but I am having problems finding the proper formulation.

UPDATE: It seems that if the (rownum, colnum) tuple is not unique, then one can do the following.

x = cbind.data.frame(id=c(1, 2, 3), rownum=c(100, 100, 100), colnum=c(10, 20, 20), modeltrue=c("[]", "[]", "[]"),
  modelresult=c("[[1,2]]","[[1,3]]","[[1,3, 4]]"), stringsAsFactors = FALSE)

##Then, create a data.table and set the key

library(data.table)
xDT <- as.data.table(x)
setkeyv(xDT, c("id", "rownum", "colnum"))

Is that the correct method?

@RomanLuštrik: I'd be happy to, but what kind of context do you need? I think the code given above is complete. I just want to apply the numparam function to the given data frame to obtain another data frame in the manner specified. What is unclear? This is the actual code I am using. I suppose I could come up with a simpler example, though this one is not very complex.
– Faheem Mitha, Commented Oct 9, 2012 at 8:01
Are numparam_modeltrue and numparam_modelresult factors?
– Roman Luštrik, Commented Oct 9, 2012 at 10:18
@RomanLuštrik: No, just strings. I've modified the call to cbind.
– Faheem Mitha, Commented Oct 9, 2012 at 10:27

BenBarnes · Accepted Answer · 2012-10-10 10:40:12Z

3

If you're open to it, you could use the data.table package.

First, create a data.table, add a unique identifier column id and set that as the key

library(data.table)
xDT <- as.data.table(x)
xDT[, id := seq_len(nrow(xDT))]
setkey(xDT, "id")

Then, using do.call, you can run your numparam function on the appropriate columns:

res1 <- xDT[, list(numparam_modeltrue = do.call(numparam, unname(.SD))),
  .SDcols = c(3, 2), by = key(xDT)]
res2 <- xDT[, list(numparam_modelresult = do.call(numparam, unname(.SD))),
  .SDcols = c(4, 2), by = key(xDT)]
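
To see why unname(.SD) is used here, the same call pattern can be sketched in base R, outside data.table. The function f and the list row below are hypothetical stand-ins for numparam and a one-row .SD. do.call matches named list elements to argument names, so the names have to be dropped for the values to be matched by position.

```r
# Hypothetical stand-in for numparam(mod, colnum): two positional arguments.
f <- function(mod, colnum) paste(mod, colnum)

# A one-row "subset" whose element names differ from f's argument names,
# as when .SDcols selects the modeltrue and colnum columns.
row <- list(modeltrue = "[]", colnum = 10)

# With names kept, do.call builds f(modeltrue = "[]", colnum = 10),
# which fails because f has no argument called modeltrue.
named_call <- tryCatch(do.call(f, row), error = function(e) "error")

# With names dropped, the elements are matched by position.
positional_call <- do.call(f, unname(row))
```

This is also why the order in .SDcols matters: the model column has to come first and colnum second.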

Then combine the results into a data.table

xDT[res1][res2][, c("modeltrue", "modelresult") := NULL, with = FALSE]
   id rownum colnum numparam_modeltrue numparam_modelresult
1:  1    100     10                 40                   48
2:  2    100     20                 80                   88
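
As a sanity check on these numbers, the counting rule can be restated in a few lines of base R. This is only a sketch: count_params is a hypothetical helper, not part of the question's code, and it takes the tuples as an already-parsed list, so no JSON package is needed. It assumes every index in the tuples lies in 0..colnum-1 and appears only once.

```r
# Each tuple of length k contributes 4^k parameters; every column index
# not mentioned in any tuple contributes 4^1 = 4.
count_params <- function(tuples, colnum) {
  used <- unlist(tuples)
  singles <- setdiff(seq(0, colnum - 1), used)
  sum(4^lengths(tuples)) + 4 * length(singles)
}

# "[]" with 10 columns: 10 single columns.
true_10 <- count_params(list(), 10)
# "[[1,2]]" with 10 columns: one pair (4^2) plus 8 single columns.
result_10 <- count_params(list(c(1, 2)), 10)
# The same two models with 20 columns.
true_20 <- count_params(list(), 20)
result_20 <- count_params(list(c(1, 3)), 20)
```

The four values are 40, 48, 80 and 88, which agrees with the table above.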

EDIT:

As Matthew Dowle suggests, you could reach the same results without the merge at the end by the following:

xDT[,numparam_modeltrue := do.call(numparam, unname(.SD)),
  .SDcols = c(3, 2), by = key(xDT)]
xDT[,numparam_modelresult := do.call(numparam, unname(.SD)),
  .SDcols = c(4, 2), by = key(xDT)]

And if you want to get rid of the columns modeltrue and modelresult,

xDT[,c("modeltrue", "modelresult") := NULL, with = FALSE]
# NOTE that with = FALSE shouldn't be necessary with data.table 1.8.3
# But I'm still with 1.8.2

edited Oct 10, 2012 at 10:40

answered Oct 9, 2012 at 10:46

BenBarnes


11 Comments

Matt Dowle Over a year ago

+1 Could the res1<- and res2<- steps each be done in one := by group; e.g., xDT[, numparam_modeltrue := do.call(numparam, unname(.SD)), .SDcols = c(3, 2), by = key(xDT)] directly to save the res1[res2]?

BenBarnes Over a year ago

@MatthewDowle, That would be a possibility, but the two functions refer to different sets of columns, and I defined .SDcols to correspond appropriately. An attempt at subsetting .SD didn't work out yet...

Matt Dowle Over a year ago

I edited my comment a few times, apols. I mean two :=-by-group, and no res1[res2].

BenBarnes Over a year ago

Ah yes. Added your suggested alternative, @MatthewDowle

Matt Dowle Over a year ago

Cool. Nice to see you're up to speed with 1.8.3.

BenBarnes · Accepted Answer · 2012-10-12 20:18:21Z

1

Alternative approach using sapply:

numparamvec <- function(rownum, colnum, modeltrue, modelresult)
  {
    totnum1 = numparam(modeltrue, as.integer(colnum))
    totnum2 = numparam(modelresult, as.integer(colnum))
    return(c(rownum = rownum, colnum = colnum,
      numparam_modeltrue = totnum1, numparam_modelresult = totnum2))
  }

val <- sapply(seq_len(nrow(x)),
  function(y) do.call(numparamvec, x[y, ]))

> as.data.frame(t(val))
  rownum colnum numparam_modeltrue numparam_modelresult
1    100     10                 40                   48
2    100     20                 80                   88

Alternative approach using vapply:

val <- t(vapply(seq_len(nrow(x)), function(y) do.call(numparamvec, x[y, ]),
  c(rownum = 0, colnum = 0, numparam_modeltrue = 0, numparam_modelresult = 0)))

> val
     rownum colnum numparam_modeltrue numparam_modelresult
[1,]    100     10                 40                   48
[2,]    100     20                 80                   88
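
The difference from sapply is the third argument, FUN.VALUE: vapply checks the type and length of every result against that template. A minimal sketch of this behaviour, using a hypothetical function g rather than numparamvec so that it runs without RJSONIO:

```r
# Hypothetical function returning a numeric vector of length 2.
g <- function(i) c(a = i, b = i * 2)

# FUN.VALUE declares the type and length of each result;
# its names become the row names of the returned matrix.
ok <- vapply(1:3, g, c(a = 0, b = 0))

# vapply returns one column per input, so transpose for one row per input,
# then convert if a data frame is wanted.
ok_rows <- t(ok)
ok_df <- as.data.frame(ok_rows)

# A template of the wrong length makes vapply stop with an error
# instead of silently returning a differently shaped object.
bad <- tryCatch(vapply(1:3, g, c(a = 0)), error = function(e) "error")
```

FUN.VALUE has to be an atomic template such as the named numeric vector here, so the conversion to a data frame is a separate step after the call.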

edited Oct 12, 2012 at 20:18

answered Oct 11, 2012 at 12:09

BenBarnes


2 Comments

Faheem Mitha Over a year ago

Thanks for the update. I suggest merging your edit to my answer with this answer (and removing it from there), since they are so similar. I think I slightly prefer the vapply version, because, if I understand correctly, it does some validation on the input.

BenBarnes Over a year ago

@FaheemMitha, Good Suggestion. Changes made.

Community · Accepted Answer · 2017-05-23 11:48:09Z

1

The following code sort of works. It is not very pretty, though. Suggestions for improvement welcome. In particular, it would be nice to not have to transpose the matrix and add the column names, and also, since it returns a matrix, there is still that annoying issue where the integers are converted to strings. Thanks to flodel for the tip regarding his answer to "Pass arguments to a function from each row of a matrix".

completemodel <- function(model, colnum)
{
  modlst = c()
  tuplenum = length(model)
  if(tuplenum != 0)
    for(i in 1:tuplenum)
      modlst = c(modlst, model[[i]])
  index = seq(0, colnum-1)
  inddiff = setdiff(index, modlst)
  inddifflen = length(inddiff)
  for(i in seq(length.out=inddifflen))
    model = append(model, inddiff[i])
  return(model)
}

##   Calculate number of parameters in model.
numparam <- function(mod, colnum)
  {
    library(RJSONIO)
    mod = fromJSON(mod)
    print(paste("mod", mod))
    mod = completemodel(mod, colnum)
    totnum = 0
    for(tup in mod)
      totnum = totnum +(4**length(tup))
    return(totnum)
  }

numparamvec <- function(rownum, colnum, modeltrue, modelresult)
  {
    totnum1 = numparam(modeltrue, as.integer(colnum))
    totnum2 = numparam(modelresult, as.integer(colnum))
    return(c(rownum, colnum, totnum1, totnum2))
  }

x = cbind.data.frame(rownum=c(100, 100), colnum=c(10, 20), modeltrue=c("[]", "[]"), modelresult=c("[[1,2]]","[[1,3]]"), stringsAsFactors=FALSE)
val = t(apply(x, 1, function(x)do.call(numparamvec, as.list(x))))
colnames(val) = c("rownum", "colnum", "numparam_modeltrue", "numparam_modelresult")

edited May 23, 2017 at 11:48

CommunityBot


answered Oct 10, 2012 at 9:24

Faheem Mitha


5 Comments

Faheem Mitha Over a year ago

@BenBarnes: The resulting data frame, however, has integers as strings. Should one just run a converter over the data frame, or is there a better way to handle this?

BenBarnes Over a year ago

The problem with conversion of the integers to character happens when using the apply function, which calls as.matrix on 2-D objects. If there are any non-numeric, -complex, or -logical data in the data.frame, as.matrix coerces your data to character. vapply allows you to specify the format of the output. (I'll add an example to your post.)

BenBarnes Over a year ago

And the result is a matrix, not a data.frame!
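
The coercion described in these two comments can be reproduced with a small base-R sketch. The data frame df is a hypothetical two-column cut-down of x, used only for illustration.

```r
# A data frame mixing a numeric and a character column, like x above.
df <- data.frame(colnum = c(10, 20), model = c("[]", "[[1,3]]"),
                 stringsAsFactors = FALSE)

# apply() first converts df with as.matrix(); a single character column
# forces every cell of the matrix to character.
m <- as.matrix(df)
is_char <- is.character(m)

# Each row handed to FUN is therefore a character vector.
row_types <- apply(df, 1, function(r) class(r))

# Iterating over row indices instead keeps each column's own type.
kept <- vapply(seq_len(nrow(df)), function(i) df$colnum[i] * 2, numeric(1))
```

Indexing rows explicitly, as the sapply and vapply versions do, sidesteps the as.matrix step and so keeps numeric columns numeric.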

Faheem Mitha Over a year ago

@BenBarnes: Thanks for the improved version. I had trouble with the FUN.VALUE argument. It seems from the (scant) documentation that this gives the type and length of the return value from FUN. However, the rules are not precisely stated. I tried using data.frame(rownum = 0, colnum = 0, numparam_modeltrue = 0, numparam_modelresult = 0) but got an error message. The idea was to get a data frame returned instead of a matrix.

Faheem Mitha Over a year ago

@BenBarnes: I suggest you split off your version into a separate answer. That way people can upvote it, and I might choose it as my preferred answer. Except for having to transpose and convert to data frame, it looks good.
