I have a large data frame in R with over 200 mostly character variables that I would like to convert to factors. I have prepared all levels and labels in a separate data frame. For a given variable Var1, the corresponding levels and labels are stored in Var1_v and Var1_b; for example, for the variable Gender the levels and labels are named Gender_v and Gender_b.
Here is an example of my data, followed by the way I currently create the factors:
df <- data.frame(Gender = c("2","2","1","2"),
                 AgeG = c("3","1","4","2"))
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v = c("1","2","3","4"),
                  AgeG_b = c("<25","25-60","65-80",">80"))
df$Gender <- factor(df$Gender, levels = fct$Gender_v, labels = fct$Gender_b, exclude = NULL)
df$AgeG <- factor(df$AgeG, levels = fct$AgeG_v, labels = fct$AgeG_b, exclude = NULL)
Is there a way to automate the process, so that the factors (levels and labels) are applied to the corresponding variables without my having to do every single one individually?
I think it's done through a function, probably with pmap.
My goal is to minimize the effort needed for this process. Is there a better way to prepare the labels and levels as well?
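To make the idea concrete, here is a minimal sketch of such a loop. It uses a plain base R for loop rather than pmap, and it assumes every variable in df has matching <name>_v and <name>_b columns in fct. The helper names vars, lv, lb and keep are only illustrative. Because data.frame() recycles the shorter Gender columns to four rows, the sketch drops the repeated levels before calling factor().

```r
# Example data from the question
df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"))
fct <- data.frame(Gender_v = c("1", "2"),
                  Gender_b = c("Male", "Female"),
                  AgeG_v   = c("1", "2", "3", "4"),
                  AgeG_b   = c("<25", "25-60", "65-80", ">80"))

# Variables in df that have both a <name>_v and a <name>_b column in fct
vars <- intersect(names(df), sub("_v$", "", grep("_v$", names(fct), value = TRUE)))
vars <- vars[paste0(vars, "_b") %in% names(fct)]

for (v in vars) {
  lv <- fct[[paste0(v, "_v")]]   # levels for this variable
  lb <- fct[[paste0(v, "_b")]]   # labels for this variable
  keep <- !duplicated(lv)        # drop rows created by recycling in data.frame()
  df[[v]] <- factor(df[[v]], levels = lv[keep], labels = lb[keep], exclude = NULL)
}

str(df)
```

Variables without a matching pair of columns in fct are left untouched by this sketch.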
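One possible alternative layout, sketched here purely as an assumption, is a long-format lookup table with one row per variable, level and label. The names lookup, variable, level, label and by_var are hypothetical and not part of my real data. This layout avoids the recycling that happens when columns of different lengths are combined in one wide data frame.

```r
df <- data.frame(Gender = c("2", "2", "1", "2"),
                 AgeG   = c("3", "1", "4", "2"))

# Hypothetical long-format lookup: one row per (variable, level, label)
lookup <- data.frame(
  variable = c("Gender", "Gender", "AgeG", "AgeG", "AgeG", "AgeG"),
  level    = c("1", "2", "1", "2", "3", "4"),
  label    = c("Male", "Female", "<25", "25-60", "65-80", ">80"),
  stringsAsFactors = FALSE
)

# One small data frame of levels and labels per variable
by_var <- split(lookup, lookup$variable)

# Convert every column of df that has an entry in the lookup
vars <- intersect(names(df), names(by_var))
df[vars] <- lapply(vars, function(v) {
  factor(df[[v]],
         levels = by_var[[v]]$level,
         labels = by_var[[v]]$label,
         exclude = NULL)
})

str(df)
```

With this layout, adding a new variable only means appending rows to lookup.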
Help is much appreciated.
stringsAsFactors in the creation of data frames. This may be useful earlier in your data pipeline. The error in your example code is due to your Gender_v and AgeG_v being stored as character values instead of numerical values. Your current code works when Gender_v = c(1,2), i.e. no quotation marks.
stringsAsFactors exactly help? I am not running into any error in my code, by the way. It is just inefficient when you have to run it on over 200 variables.