Create new variables based on other variables, with a loop over variable names in R

Question

I have a dataset with variables named as I10AA to I10ZZ and I11AA to I11ZZ. I want to create new variables IAA to IZZ, so that IAA = function(I10AA,I11AA).

As a highly simplified example:

library(dplyr)

set.seed(0)

df <- data.frame(I10AA=floor(runif(10,1,5)),I10AB=floor(runif(10,1,5)),
                 I11AA=floor(runif(10,1,5)),I11AB=floor(runif(10,1,5)))

fun <- function(x,y) (x+y)

results <- df %>% mutate(IAA = fun(I10AA,I11AA),IAB = fun(I10AB,I11AB))

print(results)

results is the final dataset I want.

Is there a way to do this with tidyverse?

In the original dataset, the variables are arranged as:

 colnames(original_data) = "ID","I1AA", "I1AB", "I1AC", ... , "I1ZZ", "I2AA","I2AB",...,"I2ZZ",...,"I10AA",...,"I10ZZ","I11AA",..."I11ZZ"

Can you tell us how the columns are arranged in the original dataset?
– akrun, Commented Aug 5, 2018 at 20:06
There is no issue with the function, but I do not know how to loop over I10AA to I11ZZ.
– Leonhardt Guass, Commented Aug 5, 2018 at 20:08

akrun · Accepted Answer · 2018-08-05 20:25:23Z

We can loop through the column names, use transmute to create the new columns, rename them with the substring of the original column names (digits removed), and bind them to the original data:

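library(tidyverse)

# Logical index of the "I10.." columns; every other column is treated as "I11.."
i1 <- grepl("10", names(df))
# New names with the digits removed, e.g. "I10AA" -> "IAA"
nm1 <- sub("\\d+", "", names(df)[i1])
i2 <- !i1

# Apply fun() to each I10/I11 pair, then bind the results to the original data
map2(names(df)[i1], names(df)[i2], ~
       df %>%
         transmute(fun(!!rlang::sym(.x), !!rlang::sym(.y)))) %>%
  bind_cols %>%
  rename_all(~ nm1) %>%
  bind_cols(df, .)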
#    I10AA I10AB I11AA I11AB IAA IAB
#1      4     1     4     2   8   3
#2      2     1     4     2   6   3
#3      2     1     1     3   3   4
#4      3     3     3     2   6   5
#5      4     2     1     1   5   3
#6      1     4     2     4   3   8
#7      4     2     2     3   6   5
#8      4     3     1     4   5   7
#9      3     4     2     1   5   5
#10     3     2     4     3   7   5
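
The index above treats every column that does not contain "10" as an I11 column, which holds for the example df but not for a dataset that also has ID, I1AA, and so on. A minimal sketch of one way to pair the columns by suffix instead, assuming a hypothetical data frame `wide` with the layout described in the question (the names `wide`, `suffix`, and `new_cols` are illustrative and not part of the original answer):

```r
library(dplyr)
library(purrr)

# Hypothetical data: an ID column, an unrelated I1 column, and the I10/I11 pairs
set.seed(0)
wide <- data.frame(ID    = 1:10,
                   I1AA  = floor(runif(10, 1, 5)),
                   I10AA = floor(runif(10, 1, 5)), I10AB = floor(runif(10, 1, 5)),
                   I11AA = floor(runif(10, 1, 5)), I11AB = floor(runif(10, 1, 5)))

fun <- function(x, y) (x + y)

# Suffixes ("AA", "AB", ...) taken from the I10 columns only
suffix <- sub("^I10", "", grep("^I10", names(wide), value = TRUE))

# One new column per suffix: I<suffix> = fun(I10<suffix>, I11<suffix>)
new_cols <- map(suffix, ~ fun(wide[[paste0("I10", .x)]],
                              wide[[paste0("I11", .x)]])) %>%
  set_names(paste0("I", suffix)) %>%
  as_tibble()

results <- bind_cols(wide, new_cols)
```

Building the column names from the suffix means the ID and I1 columns are never touched, and the I10 and I11 columns are matched by name rather than by position.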

Another option is to place the two subsets of the dataset in a list and use reduce to apply + to them:

list(df %>% 
        select(names(.)[i1]),
     df %>%
        select(names(.)[i2])) %>% 
  reduce(`+`) %>% 
  rename_all(., ~ nm1) %>% 
  bind_cols(df, .)

An easier option, since + works directly on whole data frames, would be:

df[nm1] <- df[i1] + df[i2]
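
The one-liner works because the example function is just +. For an arbitrary two-argument fun that only accepts vectors, a similar base R approach might use Map to call it on each pair of columns. This is a sketch under the same assumptions as the answer (df contains only the I10 and I11 columns, in matching order), not part of the original answer:

```r
set.seed(0)
df <- data.frame(I10AA = floor(runif(10, 1, 5)), I10AB = floor(runif(10, 1, 5)),
                 I11AA = floor(runif(10, 1, 5)), I11AB = floor(runif(10, 1, 5)))

fun <- function(x, y) (x + y)

# Same index and new names as in the answer
i1  <- grepl("10", names(df))
nm1 <- sub("\\d+", "", names(df)[i1])

# Map() calls fun() column by column: I10AA with I11AA, I10AB with I11AB
df[nm1] <- Map(fun, df[i1], df[!i1])
```

Because Map passes one column from each subset at a time, fun sees plain vectors, so it can be any vectorised function of two arguments.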
