ORIGINAL QUESTION

I want to add a series of dummy variables in a data frame for each value of x in that data frame but containing an NA if another variable is NA. For example, suppose I have the below data frame:

x <- 1:5
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking to produce:

  • var1 such that: z$var1 is 1 if x == 1; otherwise NA if y is NA; otherwise 0.
  • var2 such that: z$var2 is 1 if x == 2; otherwise NA if y is NA; otherwise 0.
  • var3, etc.

I can't seem to figure out how to vectorize this. I am looking for a solution that scales to a large number of distinct values of x.

UPDATE

There was some confusion that I wanted to iterate through each index of x. I am not looking for this, but rather for a solution that creates a variable for each unique value of x. When taking the below data as an input:

x <- c(1,1,2,3,9)
y <- c(NA, 1, NA, 0, NA)
z <- data.frame(x, y)

I am looking for z$var1, z$var2, z$var3, z$var9 where z$var1 <- c(1, 1, NA, 0, NA) and z$var2 <- c(NA, 0, 1, 0, NA). The original solution produces z$var1 <- z$var2 <- c(1,1,NA,0,NA).

1 Answer

You can use ifelse, which is vectorized, to construct the variables:

cbind(z, setNames(data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))), 
                  paste("var", unique(x), sep = "")))

  x  y var1 var2 var3 var9
1 1 NA    1   NA   NA   NA
2 1  1    1    0    0    0
3 2 NA   NA    1   NA   NA
4 3  0    0    0    1    0
5 9 NA   NA   NA   NA    1
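
Since setNames may be unfamiliar, here is a minimal sketch of what it does in isolation. The data frame m and the values c(1, 2, 9) are made up for illustration and are not part of the answer above.

```r
# setNames(object, nm) returns the object with its names set to nm, so the
# renaming can happen inline instead of in a separate names(m) <- ... step.
m <- data.frame(matrix(0, nrow = 2, ncol = 3))  # illustrative data only
default_names <- names(m)                       # "X1" "X2" "X3"

# Build the new names by pasting a prefix onto each value, then apply them.
m <- setNames(m, paste("var", c(1, 2, 9), sep = ""))
new_names <- names(m)                           # "var1" "var2" "var9"
```

In the answer, the same call is applied to the data frame built by sapply, so the columns are labelled by the values of x rather than by position.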

Update (without setNames, the new columns get default names):

cbind(z, data.frame(sapply(unique(x), function(i) ifelse(x == i, 1, ifelse(is.na(y), NA, 0)))))
  x  y X1 X2 X3 X4
1 1 NA  1 NA NA NA
2 1  1  1  0  0  0
3 2 NA NA  1 NA NA
4 3  0  0  0  1  0
5 9 NA NA NA NA  1
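
If this is needed for more than one data frame, the same idea could be wrapped in a small helper. The sketch below is not from the original answer: the function name add_dummies and its arguments are hypothetical, and it assumes the x column itself contains no NA values.

```r
# Hypothetical helper: add one dummy column per unique value of df[[x_col]].
# A dummy is 1 where x equals the value, NA where y is missing, and 0 otherwise.
# Assumption: df[[x_col]] contains no NA values.
add_dummies <- function(df, x_col = "x", y_col = "y", prefix = "var") {
  x <- df[[x_col]]
  y <- df[[y_col]]
  vals <- unique(x)
  # ifelse is vectorized, so each call builds a whole column at once.
  dummies <- lapply(vals, function(i) {
    ifelse(x == i, 1, ifelse(is.na(y), NA, 0))
  })
  names(dummies) <- paste0(prefix, vals)
  cbind(df, as.data.frame(dummies))
}

z <- data.frame(x = c(1, 1, 2, 3, 9), y = c(NA, 1, NA, 0, NA))
z2 <- add_dummies(z)
```

Reading x and y from the data frame rather than from the global environment keeps the result correct even if the standalone vectors are later changed or removed.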

4 Comments

Not quite. Each of var1, var2, var3, etc. should only take values in {0, 1, NA}. I am guessing this should be ifelse(x == i, 1, ...). Also, can you explain how setNames works here? I am not sure why I need it.
My mistake, I had the condition and the results mixed up; see the update. setNames is just a convenient way to set the column names. If you don't care about the names, it can be removed.
Updated: see above.
In your original data, x doesn't have repeated values. If it does and you don't want duplicate variables, wrap x in unique().
