2

I have a data frame with 40 variables G1_a, G1_b, ... till G20_a, G20_b (stemming from a survey). I want to create 20 new variables G1 ... G20 that summarize the existing variables.

data <- data.frame(G1_a = c(0, 0, 0, 1, NA), 
               G1_b = c(0, 0, 1, 1, NA), 
               G2_a = c(0, 0, 0, 1, NA), 
               G2_b = c(0, 0, 1, 1, NA))

# Reshaping without for-loop:
data <- data %>% 
  mutate(G1 = case_when(
    G1_a == 1 ~ "own_offer", 
    G1_b == 1 ~ "no_offer", 
    T ~ NA_character_
  ))

data <- data %>% 
  mutate(G2 = case_when(
    G2_a == 1 ~ "own_offer", 
    G2_b == 1 ~ "no_offer", 
    T ~ NA_character_
  ))

I want to automate the creation of the new variables in a for-loop, something like:

# Reshaping with for-loop:
for(i in 1:2) {
 data <- data %>% 
   mutate(assign(paste0("G", i), case_when(
     get(paste0("G", i, "_a")) == 1 ~ "own_offer", 
     get(paste0("G", i, "_b")) == 1 ~ "no_offer", 
     T ~ NA_character_
    )))
  }

My question includes two parts:

1) Is it possible to combine assign with mutate? I'm aware of approaches like mutate(df, !!varname := Petal.width * n) (see here) to dynamically assign parameter names. However, I was unable to combine it with the data reshaping I want to run.

2) Does dplyr allow the use of paste0 together with case_when and mutate?

4
  • 2
    is there a reason you can't return the results in a list rather than creating a bunch of new variables? It would be much easier, and more idiomatic ... Commented Oct 5, 2018 at 12:32
  • @BenBolker : the data frame contains many more variables. If I create a list, I have to eventually add it to the original data frame. I'd be happy for suggestions using such an approach. Commented Oct 5, 2018 at 13:38
  • what is your final target? Commented Oct 5, 2018 at 14:40
  • The final target is to get a clean dataset containing all survey responses in a format that allows easy and smooth analysis of the data. G1_a, G1_b etc. were provided as dummies, but should be encoded as categorical variables. Every G1_a and G1_b should become G1, G2_a and G2_b becomes G2 etc. Commented Oct 5, 2018 at 14:46

1 Answer 1

2

This is a little tricky, but I think it's the principled way to do it. The final result is a data frame with the desired columns, thus avoiding all of the get()/assign() headaches (and not cluttering up the workspace with lots of derived variables.) There are several steps where we change the shape of the data frame (wide -> long -> partially wide -> wide) using tidyr::gather() and tidyr::spread(). If it seems overwhelming, experiment with stopping the pipe sequence at various intermediate points to see what has been achieved so far.

library(tidyr)
library(dplyr)
dds <- (dd
  %>% mutate(case=seq(n()))    ## need a variable to distinguish rows in original data set
  %>% gather(var,val,-case)    ## -> long format: {case, var={G1_a,G1_b,...}, val={0,1,NA}}
  %>% separate(var,c("var","response"))  ## split to "G1","G2" + "a", "b"
  %>% spread(response,val)               ## convert back to semi-wide: {case, var, a, b}
  ## now collapse rows to categorical value, as above
  %>% mutate(offer=case_when(a==1 ~ "own_offer",
                             b==1 ~ "no_offer",
                             TRUE ~ NA_character_))
  %>% select(-c(a,b))          ## clean up now-redundant variables
  %>% spread(var,offer)        ## convert back to wide format: {case, G1, G2, ...}
  %>% select(-case)            ## now redundant
)

Result

         G1        G2
1      <NA>      <NA>
2      <NA>      <NA>
3  no_offer  no_offer
4 own_offer own_offer
5      <NA>      <NA>
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.