I converted a factor into a numeric vector, i.e. from "Male" and "Female" to 1 and 2, with the R code as.numeric(data$Gender). However, I would like to recode the numeric vector as a binary variable, i.e. from 1 representing "Male" and 2 representing "Female" to 1 representing "Male" and 0 representing "Female". Could anybody please advise me on how this could be done? Thank you very much; your help is much appreciated.
1 Answer
Here are three alternatives:
Sample data:
set.seed(1)
x <- sample(c("Male", "Female"), 10, TRUE)
x
# [1] "Male" "Male" "Female" "Female" "Male" "Female" "Female" "Female" "Female" "Male"
Option 1: Use == (assumes that you only have these two options).
as.numeric(x == "Male")
# [1] 1 1 0 0 1 0 0 0 0 1
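The same comparison also works directly on a factor column, so the intermediate as.numeric(data$Gender) step from the question is not needed. A minimal sketch, assuming a hypothetical data frame `data` with a factor column `Gender` (the values are made up for illustration):

```r
# Hypothetical data frame standing in for the asker's data
data <- data.frame(Gender = factor(c("Male", "Female", "Female", "Male")))

# Comparing a factor with a string compares the labels, not the integer codes
data$GenderBinary <- as.numeric(data$Gender == "Male")
data$GenderBinary
# [1] 1 0 0 1
```

Note that with default (alphabetical) levels, as.numeric() on such a factor codes "Female" as 1 and "Male" as 2, so comparing against the label avoids depending on the level order.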
Option 2: Use a named key.
key <- setNames(0:1, c("Female", "Male"))
key[x]
# Male Male Female Female Male Female Female Female Female Male
# 1 1 0 0 1 0 0 0 0 1
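The lookup keeps the names of the key on the result. If a plain vector is wanted, the names can be dropped with unname(). A small sketch that redefines the key so it runs on its own; `x2` and its "Other" entry are assumed values, added only to show what happens to a label that is missing from the key:

```r
key <- setNames(0:1, c("Female", "Male"))
x2 <- c("Male", "Female", "Other")   # "Other" is a made-up value not present in the key

# Index by name, then drop the names; unmatched labels come back as NA
res <- unname(key[x2])
res
# [1]  1  0 NA
```

Unlike Option 1, which would silently code "Other" as 0, the key approach surfaces unexpected labels as NA.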
Option 3: Use factor specifying the labels.
factor(x, c("Male", "Female"), labels = c(1, 0))
# [1] 1 1 0 0 1 0 0 0 0 1
# Levels: 1 0
Note that you'll still need as.numeric(as.character()) if you want a numeric vector:
as.numeric(as.character(factor(x, c("Male", "Female"), labels = c(1, 0))))
# [1] 1 1 0 0 1 0 0 0 0 1
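The as.character() step matters because as.numeric() applied directly to a factor returns the internal level codes rather than the labels. A short sketch with a made-up three-element vector (`f` is a name introduced here for illustration):

```r
f <- factor(c("Male", "Female", "Male"),
            levels = c("Male", "Female"), labels = c(1, 0))

as.numeric(f)                 # internal codes: position of each value in levels(f)
# [1] 1 2 1
as.numeric(as.character(f))   # the labels themselves, converted to numbers
# [1] 1 0 1
```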
2 Comments
lmo
Roughly the same as option one, but a little faster in some instances, I believe, is
+(x == "Male").
A5C1D2H2I1M1N2O1R2T1
@lmo, thanks. I'm aware of that trick; however, in my opinion, a very marginal increase in speed over as.numeric does not outweigh the benefits of more explicit code. Taking it one step further, you can use as.integer instead of as.numeric, which is closer in performance to + (surpassing it in most of my tests).
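For reference, the three spellings discussed in these comments give the same values and differ only in storage type. A quick check on a made-up vector (`x3`, `a`, `b` and `d` are names introduced here); no timings are shown, since those depend on the machine and R version:

```r
x3 <- c("Male", "Female", "Male")   # small made-up vector

a <- +(x3 == "Male")            # unary plus coerces logical to integer
b <- as.integer(x3 == "Male")   # explicit integer conversion
d <- as.numeric(x3 == "Male")   # double conversion

identical(a, b)   # TRUE: both are integer vectors
typeof(d)         # "double"
all(a == d)       # TRUE: same values, different storage type
```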
as.numeric(factor(c("male", "female"), levels=c("female", "male"))) - 1