R: Creating dummy variables for values of one variable conditional on another variable

Question

ORIGINAL QUESTION

I want to add a series of dummy variables in a data frame for each value of x in that data frame but containing an NA if another variable is NA. For example, suppose I have the below data frame:

x <- 1:5
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking to produce:

var1 such that: z$var1 is 1 if x == 1; otherwise NA if y is NA; otherwise 0.
var2 such that: z$var2 is 1 if x == 2; otherwise NA if y is NA; otherwise 0.
var3 etc.

I can't seem to figure out how to vectorize this. I am looking for a solution that can be used for a large count of values of x.

UPDATE

There was some confusion that I wanted to iterate through each index of x. I am not looking for this, but rather for a solution that creates a variable for each unique value of x. When taking the below data as an input:

x <- c(1,1,2,3,9)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking for z$var1, z$var2, z$var3, z$var9 where z$var1 <- c(1, 1, NA, 0, NA) and z$var2 <- c(NA, 0, 1, 0, NA). The original solution produces z$var1 <- z$var2 <- c(1,1,NA,0,NA).

akuiper · Accepted Answer · 2016-06-21 00:07:54Z

You can use ifelse, which is vectorized, to construct the variables:

cbind(z, setNames(data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))), 
                  paste("var", unique(x), sep = "")))

  x  y var1 var2 var3 var9
1 1 NA    1   NA   NA   NA
2 1  1    1    0    0    0
3 2 NA   NA    1   NA   NA
4 3  0    0    0    1    0
5 9 NA   NA   NA   NA    1

Update (the same result without setNames, so the columns get default names):

cbind(z, data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))))
  x  y X1 X2 X3 X4
1 1 NA  1 NA NA NA
2 1  1  1  0  0  0
3 2 NA NA  1 NA NA
4 3  0  0  0  1  0
5 9 NA NA NA NA  1
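
The same idea can be wrapped in a function that takes the data frame as an argument instead of relying on the global x and y. This is only a sketch: the helper name add_conditional_dummies and its arguments are hypothetical, and it assumes the value column itself contains no NA.

```r
# Hypothetical helper (not part of the answer): one dummy column per unique
# value of `value_col`, set to NA where `na_col` is NA and the value differs.
add_conditional_dummies <- function(df, value_col, na_col, prefix = "var") {
  values <- unique(df[[value_col]])  # assumes no NA in value_col
  dummies <- lapply(values, function(v) {
    ifelse(df[[value_col]] == v, 1,
           ifelse(is.na(df[[na_col]]), NA_real_, 0))
  })
  names(dummies) <- paste0(prefix, values)
  cbind(df, as.data.frame(dummies))
}

z <- data.frame(x = c(1, 1, 2, 3, 9), y = c(NA, 1, NA, 0, NA))
result <- add_conditional_dummies(z, "x", "y")
print(result)
```

Using lapply with a named list keeps each dummy as its own column and sets the names in one step, so setNames is not needed here.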

edited Jun 21, 2016 at 0:07

answered Jun 20, 2016 at 15:09

4 Comments

socialscientist Over a year ago

Not quite. The values of each of var1, var2, var3, etc. should only be in {0, 1, NA}. I am guessing this should be ifelse(x == i, 1,... Also, can you explain how setNames is working here? Not sure why I need it.

akuiper Over a year ago

My bad, I confused the condition with the results; see the update. setNames is just a convenient way to set the column names. If you don't care about them, you can remove it.
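
A small illustration of what setNames does here, using a made-up 2 x 2 matrix rather than the question's data: data.frame() gives an unnamed matrix the default column names X1, X2, and setNames returns a copy with the names you supply.

```r
# Illustrative data only; the matrix values are arbitrary.
m <- matrix(1:4, nrow = 2)

unnamed <- data.frame(m)
print(names(unnamed))

# setNames(object, names) returns the object with its names replaced.
named <- setNames(data.frame(m), paste0("var", c(1, 9)))
print(names(named))
```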

socialscientist Over a year ago

Updated: see above.

akuiper Over a year ago

In your original data, x doesn't have repeated values. If it does and you don't want to create duplicate variables, wrap x in unique().
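
As a quick check of the point about repeated values, this sketch uses the data from the question's update; the helper name dummy_for is made up for illustration. Iterating over x itself yields one column per row, including duplicates, while iterating over unique(x) yields one column per distinct value.

```r
x <- c(1, 1, 2, 3, 9)
y <- c(NA, 1, NA, 0, NA)

# Hypothetical helper: the dummy vector for a single value i of x.
dummy_for <- function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0))

per_row   <- sapply(x, dummy_for)          # one column per element of x
per_value <- sapply(unique(x), dummy_for)  # one column per distinct value

print(dim(per_row))
print(dim(per_value))
# The first two columns of per_row both correspond to x == 1.
print(identical(per_row[, 1], per_row[, 2]))
```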
