Mutate multiple variable to create multiple new variables

Question

Let's say I have a tibble where I need to take multiple variables and mutate them into new multiple new variables.

As an example, here is a simple tibble:

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1,2,4,6,2,
  2,1,2,3,3,
  3,6,4,2,1
)

I want to subtract variable z from every variable with a name starting with "y", and mutate the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within tidyverse / dplyr workflow.

In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure if you can use mutate in this instance? I've tried mutate_if, but I don't think I'm using it right (and I get an error):

tb %>% mutate_if(starts_with("y"), funs(.-z))

#Error: No tidyselect variables were registered

Thanks in advance!

iroha · Accepted Answer · 2024-12-15 01:47:45Z

Update: dplyr 1.0.0+ has across() function which simplifies this task even further

Basic usage

across() has two primary arguments:

The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.

The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)

# Control how the names are created with the `.names` argument which 
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>% 
  mutate(
    across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>% 
  mutate(
    across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

### Multiple functions
tb %>% 
  mutate(
    across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
    across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 10
#>       x    y1    y2    y3     z max_x max_z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2     3     3      0      2      4
#> 2     2     1     2     3     3     3     3     -2     -1      0
#> 3     3     6     4     2     1     3     3      5      3      1

Superseded approach:

Because you are operating on column names, you need to use mutate_at rather than mutate_if which uses the values within columns

tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1

To create new columns, instead of overwriting existing ones, we can give name to funs

# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

# remove suffix, add prefix
tb %>%
  mutate_at(vars(starts_with("y")),  funs(mod = . - z)) %>%
  rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

Edit: In dplyr 0.8.0 or higher versions, funs() will be deprecated (source1 & source2), need to use list() instead

tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1

tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>%
  mutate_at(vars(starts_with("y")),  list(mod = ~ . - z)) %>%
  rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

^{Created on 2018-10-29 by the reprex package (v0.2.1)}

Thanks! So if I wanted to have them be NEW variables, rather than changing the existing variables? So an 8 column tibble instead of 5? That's what I'm after.

Collectives™ on Stack Overflow

Mutate multiple variable to create multiple new variables

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related