I have a data frame with a few columns containing values, and a column containing the name of the relevant column. e.g.

df <- data.frame(p1=c("A", "B", "A"),
                 p2=c("C", "C", "D"),
                 name=c("p2", "p1", "p1"), stringsAsFactors=FALSE)

What I want to do is to retrieve a value from the column specified by the name field, i.e. the output as below.

> df
  p1 p2 name value
1  A  C   p2     C
2  B  C   p1     B
3  A  D   p1     A

I currently get by with df$value <- ifelse(df$name=="p1", df$p1, ifelse(df$name=="p2", df$p2, NA)), which is inelegant and does not scale once there are more columns than just p1 and p2.

Any suggestion on a better approach for this?

  • I didn't understand the "using a loop" part; I thought apply and its family don't use loops, unlike for etc.? Commented Jan 19, 2015 at 3:58
  • Also, I have to admit I may not understand your code well enough to be comfortable reusing it. The diag one works perfectly, but I don't understand what it does. And for my education: why seq_len(nrow(df)) instead of simply 1:nrow(df)? Commented Jan 19, 2015 at 4:00
  • You could use either one, but I used seq_len to cover the case where df has 0 rows. Anyway, it is okay for me. I was a bit curious. Regarding the diag, it would be very inefficient because you are creating a huge dataset and then taking only the diagonal. Commented Jan 19, 2015 at 4:01
  • Thanks for the benchmark. I'd still be interested to know what happens with the diag, because that is the shortest answer after all. Commented Jan 19, 2015 at 4:03
  • The idea is simple: df[,df$name] creates a dataset with one column per entry of df$name, so many columns are repeated. When you call diag, it extracts the diagonal elements. Commented Jan 19, 2015 at 4:06
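
On the seq_len question raised in the comments, a minimal sketch of the zero-row case may help. The data frame name empty is hypothetical and only used for illustration.

```r
# A data frame with the same columns as in the question, but no rows
empty <- data.frame(p1 = character(0), p2 = character(0),
                    name = character(0), stringsAsFactors = FALSE)

# seq_len() yields an empty index for zero rows
safe_idx <- seq_len(nrow(empty))

# The colon operator counts down from 1 to 0 instead
risky_idx <- 1:nrow(empty)

length(safe_idx)   # no row indices
risky_idx          # two indices, neither of which is a valid row
```

With seq_len the index matrix built by cbind has zero rows, so the lookup returns an empty result instead of referring to rows that do not exist.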

2 Answers

You could try

df$value <- df[cbind(seq_len(nrow(df)), match(df$name, names(df)))]

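The above is a vectorized solution. If you only want the shortest code (by number of characters), the following also works, although it builds an n-by-n matrix first and is therefore inefficient for large data: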

diag(as.matrix(df[,df$name]))
#[1] "C" "B" "A"
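
To unpack the two one-liners above, here is a minimal sketch on the question's sample data. The intermediate names col_idx, idx and wide are only for illustration and are not part of the original code.

```r
df <- data.frame(p1 = c("A", "B", "A"),
                 p2 = c("C", "C", "D"),
                 name = c("p2", "p1", "p1"), stringsAsFactors = FALSE)

# Position of each row's target column within names(df): "p2" -> 2, "p1" -> 1
col_idx <- match(df$name, names(df))

# Two-column matrix of (row, column) pairs; indexing with it picks one cell per row
idx <- cbind(seq_len(nrow(df)), col_idx)
value <- df[idx]

# The diag() variant first selects one column per row (n x n cells in total),
# then keeps only the cells where the row number equals the column number
wide <- as.matrix(df[, df$name])
value_diag <- diag(wide)
```

Both value and value_diag hold the same three entries, but wide grows quadratically with the number of rows, which is why the diag version is so much slower on large data.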

Benchmarks

df1 <- df[rep(1:nrow(df),1e5),]

akrun <- function() {df1[cbind(seq_len(nrow(df1)),
                     match(df1$name, names(df1)))]}
colonel <- function() {apply(df1, 1 ,function(u) u[u['name']])}

library(microbenchmark)
microbenchmark(akrun(), colonel(), times=20L, unit='relative')
#Unit: relative
#  expr      min       lq     mean   median       uq      max neval cld
#  akrun()   1.0000   1.0000  1.00000  1.00000  1.00000  1.00000    20  a 
#colonel() 118.2858 102.3968 46.25946 77.92023 59.15559 23.56562    20   b
4 Comments

This works fine when my data is a data.frame, but it throws errors when my data is a data.table.
@PPC You can convert it to a data.frame, i.e. setDF(yourdata), and check if that works.
For now I have converted the DT to a DF, and it works. But that is only a workaround; I am looking for something that works on a DT directly.
@PPC You can do setDT(df)[, value := get(name), 1:nrow(df)];df$value #[1] "C" "B" "A"

Or very simply (but using a loop):

df$value = apply(df, 1 ,function(u) u[u['name']])

#> df
#  p1 p2 name value
#1  A  C   p2     C
#2  B  C   p1     B
#3  A  D   p1     A
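
One caveat worth checking if the value columns are numeric: apply converts the whole data frame to a matrix first, and because name is a character column, every value comes back as a string. A minimal sketch of one way around this, assuming you know which columns hold the values (the names num, value_cols and m are hypothetical):

```r
num <- data.frame(p1 = c(1.5, 2, 3),
                  p2 = c(10, 20, 30),
                  name = c("p2", "p1", "p1"), stringsAsFactors = FALSE)

# Row-wise apply sees a character matrix, so the result is character
via_apply <- apply(num, 1, function(u) u[u["name"]])

# Restricting the matrix to the value columns keeps it numeric
value_cols <- c("p1", "p2")
m <- as.matrix(num[value_cols])
num$value <- m[cbind(seq_len(nrow(num)), match(num$name, value_cols))]
```

Here via_apply is a character vector, while num$value stays numeric because the name column is left out of the matrix being indexed.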
