I was given a horrendous dataset that I am struggling to clean up: 272 (character) variables and 343 observations. Many of them are binary variables that could have been collapsed into a single variable with several levels. So instead of asking "are you self-employed or employed?" and offering the options 1 "self employed", 2 "employed" and maybe 3 "none/other", the dataset has two variables, v1.selfemployed and v2.employed, each with the options 1 "yes" and 2 "no".
I now need to combine several binary variables into one. Since they are characters, I need to convert them into factors, which I did (see example).
### dataset
v1 <- as.character(c("yes", "yes", "no", "yes", "yes", "no", "yes","no", "no", NA ))
v2 <- as.character(c("no","no","no","no","no","yes","no","yes", "no", NA))
v3 <- as.character(c("no","no", "yes", "no","no","no","no","no", "yes", NA))
df <- data.frame(v1,v2,v3)
library(tidyverse)
## dataframe -> tibble
df.t <- as_tibble(df)
## convert into 1/0 factor
df.t %>%
  mutate_if(is.character, as.factor) %>%
  mutate_at(vars(1:3), ~fct_recode(., "1" = "yes",
                                      "0" = "no"))
I took this route because I have many binary "bundles" that I need to be able to select via vars(). After converting all the necessary bundles, I saved them in a new data.frame because I am not comfortable working with tibbles. My goal is to have a single variable v.combined whose factor levels are v1, v2 and v3.
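To make the goal concrete, here is a minimal base-R sketch of the result I am after for the toy data above. It assumes that at most one variable in a bundle is "yes" per row, which holds for the example but is not something I have verified for the real data. The names `bundle`, `is_yes` and `first_yes` are made up for illustration.

```r
# Toy data from above, kept as plain character columns
df <- data.frame(
  v1 = c("yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", NA),
  v2 = c("no", "no", "no", "no", "no", "yes", "no", "yes", "no", NA),
  v3 = c("no", "no", "yes", "no", "no", "no", "no", "no", "yes", NA),
  stringsAsFactors = FALSE
)

# Columns that belong to one "bundle" (hypothetical name)
bundle <- c("v1", "v2", "v3")

# Logical matrix: TRUE where the answer is "yes", NA where it is missing
is_yes <- df[bundle] == "yes"

# Column position of the first "yes" in each row; NA if the row has none
# (assumption: at most one "yes" per row, otherwise later ones are ignored)
first_yes <- apply(is_yes, 1, function(r) which(r)[1])

# Map positions back to variable names and store them as a factor
df$v.combined <- factor(bundle[first_yes], levels = bundle)

print(df)
```

The number of rows stays the same, and a row with only missing values ends up as NA in v.combined. If a row could contain more than one "yes", a single combined factor would lose information, so that case would need its own rule.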
This exact question was posted 8 years ago in this thread. I tried the approaches mentioned there, but they don't seem to work; they might be outdated. I end up with either more observations than before (which is interesting) or errors. After 8 years of R development, there may well be an easier way to do this.
Thank you everyone for your help!