
Within a data set (df) with more than 600 observations and 100 variables, I have a variable that takes several distinct values, stored as character strings, like the following:

df["a"]
   a
1 aa
2 bb
3 aa
4 cc
5 bb
6 dd
7 cc
8 dd

Now I would like to compute a new binary variable b from a, where all "aa" and "bb" get the value 0 and all "cc" and "dd" get the value 1. I expect something like this:

   a b 
1 aa 0
2 bb 0
3 aa 0
4 cc 1
5 bb 0
6 dd 1
7 cc 1
8 dd 1

How would I do that?

Thank you very much in advance for any kind of help.

Magnus

  • Maybe with(df, ifelse(a %in% c("aa", "bb"), 0, 1)) Commented Nov 24, 2014 at 19:44
  • You could come up with plenty of ways to do this, but perhaps a more "formal" way would be to use R's "factor" class, i.e. here the "levels<-" function: levels(df$a) = list("0" = c("aa", "bb"), "1" = c("cc", "dd")). This only recodes if df$a is a factor. Commented Nov 24, 2014 at 20:09
  • @RichardScriven Although the MWE works fine, when I use the approach on my real data set with NewVariable <- with(df, ifelse(OldVariable %in% c("first value", "second value", "third value"), 0, 1)), the new variable is 1 for all observations, including those that should be 0. I don't know why. Commented Nov 24, 2014 at 20:28
  • I just edited the question. Now, the question should be more precise about what I actually want to do. Commented Nov 24, 2014 at 20:34
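The two suggestions from the comments can be compared on the question's toy data. This is a minimal base-R sketch, assuming df is rebuilt from the eight values shown in the question. The whitespace/case example at the end is hypothetical: it illustrates one common reason %in% matches nothing on real data, but the asker's actual cause is not known.

```r
# Toy data from the question; stringsAsFactors = FALSE keeps `a` as character
df <- data.frame(a = c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"),
                 stringsAsFactors = FALSE)

# Suggestion 1: ifelse() with %in% (exact string matching)
df$b <- with(df, ifelse(a %in% c("aa", "bb"), 0, 1))

# Suggestion 2: collapse factor levels with `levels<-` and a named list.
# Work on a factor copy so the original column is left untouched.
f <- factor(df$a)
levels(f) <- list("0" = c("aa", "bb"), "1" = c("cc", "dd"))
df$b2 <- as.numeric(as.character(f))   # labels "0"/"1" -> numbers (not codes 1/2)

# Hypothetical cause of "everything became 1": values that only look equal.
# %in% needs an exact match, so trailing spaces or different case break it.
messy <- c("aa ", "BB", "cc")
messy %in% c("aa", "bb")                    # FALSE FALSE FALSE
tolower(trimws(messy)) %in% c("aa", "bb")   # TRUE TRUE FALSE
```

Checking unique(df$OldVariable) on the real data usually reveals such near-duplicates quickly.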

3 Answers


General-purpose solution: build a key (or "dictionary") and index it by name.

> key <- c("aa" = 0, "bb" = 0, "cc" = 1, "dd" = 1)
> key[a]
aa bb aa cc bb dd cc dd 
 0  0  0  1  0  1  1  1 
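Applied to the data frame from the question, one caveat matters: if the column is a factor (the default for data.frame() before R 4.0.0), indexing key by it uses the factor's integer codes, not its labels. A minimal sketch, assuming the column is df$a; the non-alphabetical factor levels are chosen on purpose so the pitfall becomes visible (with alphabetical levels the codes happen to line up with key's order by coincidence):

```r
# Lookup table: names are the original codes, values the new binary codes
key <- c("aa" = 0, "bb" = 0, "cc" = 1, "dd" = 1)

# Column stored as a factor with non-alphabetical levels, to expose the pitfall
df <- data.frame(a = factor(c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"),
                            levels = c("dd", "cc", "bb", "aa")))

# Pitfall: indexing by a factor uses its integer codes, not its labels
wrong <- unname(key[df$a])

# Safer: look up by the character labels, then drop the carried-over names
df$b <- unname(key[as.character(df$a)])
df
```

Here wrong comes out exactly inverted, while df$b matches the expected 0/1 column.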

2 Comments

Thanks. NewVariable <- key[a].
One comment: names of vectors (which include lists) do NOT have to be unique in R. So if you're building your dictionary programmatically, be careful to check for duplicate keys. Also note that names of vectors are not hashed, so it's not an O(1) lookup, it's O(n), where n is the number of keys.
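The duplicate-key concern can be checked directly. A minimal sketch with a hypothetical, programmatically built key in which "aa" accidentally appears twice; name-based subsetting silently returns the first match, so the second entry is never used:

```r
# Hypothetical key with "aa" listed twice by mistake
key <- c("aa" = 0, "bb" = 0, "cc" = 1, "dd" = 1, "aa" = 1)

# Character subscripting returns the first matching name only
key["aa"]                          # aa 0

# Check for duplicate names before using the vector as a dictionary
dups <- names(key)[duplicated(names(key))]
if (length(dups) > 0) {
  message("Duplicate keys: ", paste(dups, collapse = ", "))
}

# match() makes the first-match rule explicit; unknown codes give NA
x <- c("aa", "cc", "zz")
unname(key[match(x, names(key))])  # 0 1 NA
```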

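I would subset using a logical test, writing the new codes into a separate vector b so that a is kept:

a <- c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd")
b <- a                  # copy; the original codes stay in a
b[a == "aa"] <- 0       # assigning into a character vector stores "0"
b[a == "bb"] <- 0
b[a == "cc"] <- 1
b[a == "dd"] <- 1
b <- as.numeric(b)      # convert the "0"/"1" strings to numbers
df <- data.frame(a, b)
df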


There are many ways; one of them is to use recode() from the car package:

dat1 <- data.frame(a=c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"))
dat2 <- transform(dat1, b=car::recode(a,"c('aa','bb')=0;c('cc','dd')=1;else=NA",as.factor.result=FALSE))

> dat2
   a b
1 aa 0
2 bb 0
3 aa 0
4 cc 1
5 bb 0
6 dd 1
7 cc 1
8 dd 1

1 Comment

This solution means I would generate a new data set, right? I would like to recode into a new variable within the existing data set.
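transform() does return a new data frame, but assigning the result back to the same name keeps everything in one data set. A minimal base-R sketch of two ways to do that, using ifelse() in place of car::recode() so it runs without the car package (an assumption for illustration, not the answer's exact code):

```r
dat1 <- data.frame(a = c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"))

# Option 1: overwrite dat1 with the transformed copy (adds column b)
dat1 <- transform(dat1, b = ifelse(a %in% c("aa", "bb"), 0, 1))

# Option 2: add a column in place with $<-
dat1$b2 <- ifelse(dat1$a %in% c("cc", "dd"), 1, 0)

dat1
```

Both columns end up in dat1; %in% compares factor labels too, so this also works where a is a factor.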
