I am new to Stackoverflow and quite new to R. I would really appreciate your help.

I am using dplyr's mutate() function to create a set of new columns based on one initial column. When the number of columns to create is known in advance, everything works fine.

However, in my application, the number of new columns is not known in advance; it is determined by an input parameter before the code runs.

For illustration, consider the following minimal working example:

library(RSQLite)
library(dplyr)
library(dbplyr)
library(DBI)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, mtcars, "mtcars", temporary = FALSE)

db <- tbl(con, "mtcars") %>%
    select(carb) %>%
    distinct(carb) %>%
    arrange(carb) %>%
    mutate(carb1 = carb + 1) %>%
    mutate(carb2 = carb + 2) %>%
    mutate(carb3 = carb + 3) %>%
    show_query() %>%
    collect()

In this example, I create three new variables. However, I want the program to work with a dynamic number of variables (e.g., five or ten new variables). I also would like to do all of the calculations before collect(), because I want to copy the data into memory as late as possible.

Some background on my real-life application: I want to use DB2's ADD_MONTHS() function, so I need dplyr/dbplyr to pass that function straight through into the generated SQL. I therefore need a solution that does not rely on in-memory data frame logic; it has to work within dplyr/dbplyr on the database side.

From a different perspective: In SAS I'd use the macro processor to dynamically build a proc sql statement. Is there an equivalent in R?

2 Answers

We can use map_dfc() from purrr to build each new column and bind the results together

library(dplyr)
library(purrr)
library(stringr)

# example data, assumed from the output shown below
df <- data.frame(x = 1:3)

map_dfc(1:3, ~ df %>%
                  transmute(!! str_c('x', .x) := x + .x)) %>%
    bind_cols(df, .)
#  x x1 x2 x3
#1 1  2  3  4
#2 2  3  4  5
#3 3  4  5  6

With a database table, one option is to collect() first and then add the columns in memory

dat <- tbl(con, "mtcars") %>%
        select(carb) %>%
        distinct(carb) %>%
        arrange(carb) %>%
        collect()
map_dfc(dat$carb, ~ dat %>%
                      transmute(!! str_c('carb', .x) := carb + .x)) %>%
    bind_cols(dat, .)
# A tibble: 6 x 7
#   carb carb1 carb2 carb3 carb4 carb6 carb8
#  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     2     3     4     5     7     9
#2     2     3     4     5     6     8    10
#3     3     4     5     6     7     9    11
#4     4     5     6     7     8    10    12
#5     6     7     8     9    10    12    14
#6     8     9    10    11    12    14    16

Alternatively, to do this before collecting, we can build the expressions as strings, parse them, and splice them into mutate()

tbl(con, "mtcars") %>%
   select(carb) %>%
   distinct(carb) %>%
   arrange(carb) %>%
   mutate(!!! rlang::parse_exprs(str_c('carb', 1:3, sep="+", collapse=";"))) %>%
   rename_at(-1, ~ str_c('carb', 1:3)) %>%
   show_query() %>%
   collect()
#<SQL>
#SELECT `carb`, `carb` + 1.0 AS `carb1`, `carb` + 2.0 AS `carb2`, `carb` + 3.0 AS `carb3`
#FROM (SELECT *
#FROM (SELECT DISTINCT *
#FROM (SELECT `carb`
#FROM `mtcars`))
#ORDER BY `carb`)
# A tibble: 6 x 4
#   carb carb1 carb2 carb3
#  <dbl> <dbl> <dbl> <dbl>
#1     1     2     3     4
#2     2     3     4     5
#3     3     4     5     6
#4     4     5     6     7
#5     6     7     8     9
#6     8     9    10    11
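
A possible variation on the approach above, sketched under a few assumptions: instead of parsing strings and renaming afterwards, the expressions can be built with rlang::expr(), named, and spliced into mutate() with !!!. The name n_new is hypothetical and stands in for the input parameter mentioned in the question; arrange() is placed last here so the ordering is applied to the final query.

```r
library(dplyr)
library(dbplyr)
library(rlang)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

# hypothetical input parameter: how many new columns to create
n_new <- 5
offsets <- seq_len(n_new)

# one unevaluated expression per new column: carb + 1, carb + 2, ...
new_cols <- lapply(offsets, function(k) expr(carb + !!k))
# the list names become the new column names
names(new_cols) <- paste0("carb", offsets)

query <- tbl(con, "mtcars") %>%
  distinct(carb) %>%
  mutate(!!!new_cols) %>%   # splice all expressions into a single mutate()
  arrange(carb)

show_query(query)           # the additions appear in the generated SQL
result <- collect(query)    # data is only pulled into memory here

DBI::dbDisconnect(con)
```

Because the expressions are spliced in before collect(), the additions are computed by the database rather than in R.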
3 Comments

Hi akrun, thank you for your answer. Your example works if library(stringr) is added. Unfortunately, my example was too simplistic or rather poorly chosen. Instead of working with a data frame, I work with a database and cannot construct a data frame because I want to do calculations on a database. From my limited understanding of purrr and map_dfc, it only works with data frames. I constructed a new minimal working example that captures my problem better and edited my originally submitted question accordingly.
Is there a way to do it before collect()? In my real-life application I cannot copy the data into memory because it is quite large, and I want to keep it in the database as long as possible.
This is the solution I was looking for, thank you so much.
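
For the DB2 use case mentioned in the question, here is a rough sketch of how the same splicing idea might carry over to ADD_MONTHS(). It relies on dbplyr leaving functions it does not recognise untouched in the generated SQL. No DB2 connection is used: lazy_frame() with simulate_dbi() only renders SQL, and the column name start_date, the new column names due_1, due_2, ..., and n_months are all hypothetical.

```r
library(dplyr)
library(dbplyr)
library(rlang)

# a simulated lazy table; no real database is needed to inspect the SQL
lf <- lazy_frame(start_date = as.Date("2020-01-01"), con = simulate_dbi())

# hypothetical input parameter: number of month offsets to generate
n_months <- 3
offsets <- seq_len(n_months)

# ADD_MONTHS is not an R function; dbplyr passes unknown calls through as-is
month_cols <- lapply(offsets, function(k) expr(ADD_MONTHS(start_date, !!k)))
names(month_cols) <- paste0("due_", offsets)

query <- lf %>% mutate(!!!month_cols)

# render the SQL as a character string for inspection
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

The exact SQL layout depends on the dbplyr version and on the backend, so the output against a real DB2 connection may differ from what the simulated connection prints.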
We can use map2_dfc() from purrr, passing it the values to add, and then bind the result to the original df.

library(dplyr)
library(purrr)

bind_cols(df, map2_dfc(1:3, df ,`+`))

#  x V1 V2 V3
#1 1  2  3  4
#2 2  3  4  5
#3 3  4  5  6
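
The purrr call above operates on an in-memory data frame. As a hedged sketch of how a similar purrr-style idea could be applied to a lazy database table before collect(), reduce() can add one mutate() step per offset. The helper add_offset is a hypothetical name introduced only for this illustration.

```r
library(dplyr)
library(dbplyr)
library(purrr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)

base_tbl <- tbl(con, "mtcars") %>% distinct(carb)

# hypothetical helper: add one column named carb<k> holding carb + k
add_offset <- function(lazy_tbl, k) {
  mutate(lazy_tbl, !!paste0("carb", k) := carb + !!k)
}

# start from base_tbl and apply add_offset once per offset, still lazily
query <- reduce(1:3, add_offset, .init = base_tbl) %>%
  arrange(carb)

show_query(query)
result <- collect(query)   # nothing is pulled into memory before this line

DBI::dbDisconnect(con)
```

Each step only extends the query, so the number of offsets can come from a parameter without changing the rest of the pipeline.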

1 Comment

Hi Ronak, thank you for your answer. The code works fine. Unfortunately, my example was too simplistic or rather poorly chosen. Instead of working with a data frame, I work with a database and cannot construct a data frame because I want to do calculations on a database. From my limited understanding of purrr and map2_dfc, it only works with data frames. I constructed a new minimal working example that captures my problem better and edited my originally submitted question accordingly.
