Creating multiple new columns based on existing columns (dplyr)

Question

I'm trying to automate creating variables that indicate whether students' answers (variables beginning with l, m, f or g) to the questions (e.g. variables starting with "test_") are correct or not. This is done by checking whether, for example, test_l1 == l1.

I cannot figure out how to do this other than by indexing each column, which is very tedious and produces a lot of code.

Below is a toy dataset that mimics the structure of the actual dataset, which has 4 different kinds of tests with 12 exercises each (test_l1 ~ test_l12, test_m1 ~ test_m12, test_f1 ~ test_f12, test_g1 ~ test_g12) and the corresponding student responses (l1 ~ l12, m1 ~ m12, f1 ~ f12, g1 ~ g12). I would like to create 48 variables, namely correct_l1 ~ correct_l12, correct_m1 ~ correct_m12, correct_f1 ~ correct_f12 and correct_g1 ~ correct_g12.

df <- data.frame(test_l1 = c(1, 0, 0),
                 test_l2 = c(1, 1, 1),
                 test_m1 = c(0, 1, 0),
                 test_m2 = c(0, 1, 1),
                 l1 = c(0, 1, 0),
                 l2 = c(1, 1, 1),
                 m1 = c(1, 1, 1),
                 m2 = c(0, 0, 1))

Many thanks in advance!!!
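With 4 tests of 12 exercises each, the 48 column names do not have to be typed by hand. A minimal sketch, assuming the prefixes are exactly l, m, f and g and the exercises are numbered 1 to 12 (the vector names prefixes, exercises, ans_cols, test_cols and new_cols are illustrative, not from the question):

```r
# Illustrative inputs: the four test prefixes and 12 exercises described above
prefixes  <- c("l", "m", "f", "g")
exercises <- 1:12

# Prefix-major order: l1..l12, m1..m12, f1..f12, g1..g12
# (rep() repeats each prefix 12 times; 1:12 is recycled alongside it)
ans_cols  <- paste0(rep(prefixes, each = length(exercises)), exercises)
test_cols <- paste0("test_", ans_cols)
new_cols  <- paste0("correct_", ans_cols)

length(new_cols)  # 48
```

Vectors like these can then be passed wherever a set of column names is expected, instead of listing the 48 names manually.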

Anoushiravan R · Accepted Answer · 2021-07-12 13:03:35Z

3

Here is a tidyverse solution you can use:

library(dplyr)

df %>%
  mutate(across(starts_with("test_"), ~ .x == get(sub("test_", "", cur_column())), 
                .names = '{gsub("test_", "answer_", .col)}'))

  test_l1 test_l2 test_m1 test_m2 l1 l2 m1 m2 answer_l1 answer_l2 answer_m1 answer_m2
1       1       1       0       0  0  1  1  0     FALSE      TRUE     FALSE      TRUE
2       0       1       1       1  1  1  1  0     FALSE      TRUE      TRUE     FALSE
3       0       1       0       1  0  1  1  1      TRUE      TRUE     FALSE      TRUE
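The answer above names the new columns answer_*, while the question asked for correct_*. Changing the glue string passed to .names should be enough. A minimal sketch on the toy data, assuming dplyr is installed; the ^ anchor in the pattern is an addition so that only a leading "test_" is stripped:

```r
library(dplyr)

# Toy data from the question
df <- data.frame(test_l1 = c(1, 0, 0), test_l2 = c(1, 1, 1),
                 test_m1 = c(0, 1, 0), test_m2 = c(0, 1, 1),
                 l1 = c(0, 1, 0), l2 = c(1, 1, 1),
                 m1 = c(1, 1, 1), m2 = c(0, 0, 1))

res <- df %>%
  mutate(across(starts_with("test_"),
                # compare each test_ column with the answer column of the same suffix
                ~ .x == get(sub("^test_", "", cur_column())),
                # name the outputs correct_l1, correct_l2, ... instead of answer_*
                .names = "correct_{sub('^test_', '', .col)}"))

res$correct_l1  # values: FALSE FALSE TRUE
```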

edited Jul 12, 2021 at 13:03

answered Jul 12, 2021 at 12:57

Anoushiravan R

3 Comments

ravman Over a year ago

Thank you so much, that worked!! (Upvoted but I'm a stack novice and don't have the reputation pts to upvote yet :( )

akrun Over a year ago

Nice use of cur_column()

Anoushiravan R Over a year ago

My pleasure, dear Arun. This question gave me an opportunity to tweak the .names argument with a string manipulation function, again thanks to one of Anil's questions.

Ronak Shah · Accepted Answer · 2021-07-12 12:43:41Z

1

Get all the 'test_' columns in test_cols, then remove the string 'test_' from test_cols to get the names of the corresponding answer columns to compare.

Compare the two sub-data frames directly (element by element) and assign the result to new columns.

test_cols <- grep('^test_', names(df), value = TRUE)  # columns whose names start with "test_"
ans_cols <- sub('^test_', '', test_cols)               # matching answer columns: l1, l2, m1, m2
df[paste0('correct_', ans_cols)] <- df[test_cols] == df[ans_cols]

df
#  test_l1 test_l2 test_m1 test_m2 l1 l2 m1 m2 correct_l1 correct_l2 correct_m1 correct_m2
#1       1       1       0       0  0  1  1  0      FALSE       TRUE      FALSE       TRUE
#2       0       1       1       1  1  1  1  0      FALSE       TRUE       TRUE      FALSE
#3       0       1       0       1  0  1  1  1       TRUE       TRUE      FALSE       TRUE

where TRUE means the answer is correct and FALSE means it is wrong.
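For the full 48-column dataset, the same base R comparison can be wrapped in a function that fails loudly when a test_ column has no matching answer column and that returns 1/0 instead of TRUE/FALSE. A minimal sketch; add_correct_cols and its prefix arguments are hypothetical names, not part of the answer, and the prefix is assumed to contain no regex special characters:

```r
# Hypothetical helper wrapping the base R comparison from the answer
add_correct_cols <- function(df, prefix = "test_", out_prefix = "correct_") {
  pattern   <- paste0("^", prefix)
  test_cols <- grep(pattern, names(df), value = TRUE)
  ans_cols  <- sub(pattern, "", test_cols)

  # Stop early if any test column lacks a matching answer column
  missing_cols <- setdiff(ans_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("No answer column for: ", paste(missing_cols, collapse = ", "))
  }

  # Comparing two equally sized data frames gives a logical matrix;
  # multiplying by 1L turns TRUE/FALSE into 1/0 and keeps the matrix shape
  df[paste0(out_prefix, ans_cols)] <- (df[test_cols] == df[ans_cols]) * 1L
  df
}

df <- data.frame(test_l1 = c(1, 0, 0), test_l2 = c(1, 1, 1),
                 test_m1 = c(0, 1, 0), test_m2 = c(0, 1, 1),
                 l1 = c(0, 1, 0), l2 = c(1, 1, 1),
                 m1 = c(1, 1, 1), m2 = c(0, 0, 1))
out <- add_correct_cols(df)
out$correct_l1  # values: 0 0 1
```

Drop the `* 1L` if logical columns are preferred, as in the answer.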

answered Jul 12, 2021 at 12:43

Ronak Shah

1 Comment

ravman Over a year ago

Thank you so much, that worked!! (Upvoted but I'm a stack novice and don't have the reputation pts to upvote yet :( )
