Binary variable from multiple character string variable

Question

Within a data set (df) with more than 600 observations and 100 variables, I have a variable that takes several values in the form of character strings, as in the following:

df$a
   a
1 aa
2 bb
3 aa
4 cc
5 bb
6 dd
7 cc
8 dd

Now I would like to compute a new binary variable from a, where all "aa" and "bb" get the value 0 and all "cc" and "dd" get the value 1. I expect something like this:

   a b 
1 aa 0
2 bb 0
3 aa 0
4 cc 1
5 bb 0
6 dd 1
7 cc 1
8 dd 1

How would I do that?

Thank you very much in advance for any kind of help.

Magnus

You could come up with plenty of ways to do this, but perhaps a "formal" way would be to manipulate R's "factor" class, i.e. use the "levels<-" function here: levels(df$a) = list("0" = c("aa", "bb"), "1" = c("cc", "dd"))
– alexis_laz, Commented Nov 24, 2014 at 20:09
@RichardScriven Although the MWE works fine, when I use the approach on my real data set with NewVariable <- with(df, ifelse(OldVariable %in% c("first value", "second value", "third value"), 0, 1)), the new variable holds only the value 1 for all observations, including those that should be 0. I don't know why.
– Magnus Metz, Commented Nov 24, 2014 at 20:28
I just edited the question. It should now be more precise about what I actually want to do.
– Magnus Metz, Commented Nov 24, 2014 at 20:34
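
The two approaches mentioned in these comments can be sketched on the example data as follows. This is only an illustration: the column names b and b_from_levels and the helper vector unmatched are made up here, and the last step is just one possible check for the "all 1" symptom, on the assumption that the real values do not exactly match the strings being tested (for example because of stray whitespace or different capitalisation).

```r
# Example data from the question; stringsAsFactors = FALSE keeps `a` as character.
df <- data.frame(a = c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"),
                 stringsAsFactors = FALSE)

# ifelse() with %in%: "aa"/"bb" become 0, everything else (including NA) becomes 1.
df$b <- ifelse(df$a %in% c("aa", "bb"), 0, 1)

# "levels<-" on a factor copy: merge the four levels into "0" and "1",
# then convert the factor labels back to numbers.
f <- factor(df$a)
levels(f) <- list("0" = c("aa", "bb"), "1" = c("cc", "dd"))
df$b_from_levels <- as.numeric(as.character(f))

# Possible diagnostic: values present in the data that none of the
# expected strings match exactly (empty here).
unmatched <- setdiff(unique(df$a), c("aa", "bb", "cc", "dd"))
```

If unmatched is not empty on the real data, printing it with print(unmatched) or dput(unmatched) shows exactly what the stored strings look like.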

mmuurr · Accepted Answer · 2014-11-24 20:25:55Z

3

General purpose solution: build a key (or "dictionary").

> key <- c("aa" = 0, "bb" = 0, "cc" = 1, "dd" = 1)
> key[a]
aa bb aa cc bb dd cc dd 
 0  0  0  1  0  1  1  1 
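
A minimal sketch of using the key to add a column to the existing data frame. Two details are assumptions added here rather than part of the answer: as.character() guards against a being a factor (indexing with a factor would use its integer codes instead of its labels), and unname() drops the names that the lookup carries along.

```r
# Example data from the question.
df <- data.frame(a = c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"),
                 stringsAsFactors = FALSE)

# Named vector used as a lookup table.
key <- c("aa" = 0, "bb" = 0, "cc" = 1, "dd" = 1)

# Look up each value of `a` by name; values missing from the key become NA.
df$b <- unname(key[as.character(df$a)])
```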


2 Comments

Magnus Metz Over a year ago

Thanks. NewVariable <- key[a].

mmuurr Over a year ago

One comment: names of vectors (which include lists) do NOT have to be unique in R, so if you're building your dictionary programmatically, be careful to check for duplicate keys. Also note that names of vectors are not hashed, so a lookup is not O(1) but O(n), where n is the number of keys.
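
The duplicate-key caveat can be illustrated with a small sketch. The key below is deliberately broken, and has_dups and dup_names are names chosen only for this example.

```r
# A key with "aa" defined twice; R accepts this without complaint.
key <- c("aa" = 0, "bb" = 0, "aa" = 1)

# Indexing by name returns the first matching element, so the second "aa" is ignored.
first_aa <- unname(key["aa"])

# Check for duplicate keys before using the vector as a dictionary.
has_dups  <- anyDuplicated(names(key)) > 0
dup_names <- unique(names(key)[duplicated(names(key))])
```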

Phil · Accepted Answer · 2014-11-24 19:59:43Z

0

I would subset using a logical test and run something like:

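a <- c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd")
# Overwrite each value in place, one logical test per original string.
# Note: a is a character vector, so the results are stored as "0" and "1".
a[a == "aa"] <- 0
a[a == "bb"] <- 0
a[a == "cc"] <- 1
a[a == "dd"] <- 1
a <- data.frame(a)
a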


Wojciech Sobala · Accepted Answer · 2014-11-24 20:20:14Z

0

There are many ways; one of them is to use recode from the package car:

dat1 <- data.frame(a=c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"))
dat2 <- transform(dat1, b=car::recode(a,"c('aa','bb')=0;c('cc','dd')=1;else=NA",as.factor.result=FALSE))

> dat2
   a b
1 aa 0
2 bb 0
3 aa 0
4 cc 1
5 bb 0
6 dd 1
7 cc 1
8 dd 1


1 Comment

Magnus Metz Over a year ago

This solution means I would generate a new data set, right? I would like to recode into a new variable within the existing data set.
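
On that point, transform() returns a modified copy of the data frame, so assigning the result back to the same name adds the column to the existing data set. A minimal sketch follows; it swaps car::recode for a base R %in% test so that it runs without extra packages, which means values other than "cc" and "dd" become 0 rather than NA.

```r
# Example data from the question.
df <- data.frame(a = c("aa", "bb", "aa", "cc", "bb", "dd", "cc", "dd"),
                 stringsAsFactors = FALSE)

# Assign the result of transform() back to df to keep everything in one data set.
df <- transform(df, b = as.numeric(a %in% c("cc", "dd")))
```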
