3

I'm very new to anything SQL-related, but I feel like what I'm trying to do must be possible. I've looked at related questions, but they don't seem to be trying to do quite the same thing as I am.

I want to write a function to use in a dbplyr pipe, so I don't have to repeat the code if I need it again later.

I realise not all functions in R translate nicely to SQL, but in this case, case_when() does. It works fine when I use the bare code, but I don't know how to wrap it in a function. The function I write gets passed directly into SQL, and clearly it's not a function that exists in SQL.

Is it possible to get dbplyr to look at the code inside my function, and translate that into SQL?

library(pacman)
p_load(dplyr, dbplyr, RSQLite)

# Set up fake database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
test = tibble(a = c("a", "b", "c"))
test
#> # A tibble: 3 x 1
#>   a    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c
copy_to(con, test)
test_db <- tbl(con, "test")

Using case_when() explicitly on test_db works fine.

test_db %>% 
    mutate(b = case_when(
        a == "a" ~ "AAA",
        TRUE ~ a
    ))
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#>   a     b    
#>   <chr> <chr>
#> 1 a     AAA  
#> 2 b     b    
#> 3 c     c

Then I put my case_when() into a function, so I can reuse it. That works fine when running straight on test.

func_aaa <- function(x) {
    case_when(
        x == "a" ~ "AAA",
        TRUE ~ x
    )
}

# The function works fine on the regular dataframe
test %>% 
    mutate(b = func_aaa(a))
#> # A tibble: 3 x 2
#>   a     b    
#>   <chr> <chr>
#> 1 a     AAA  
#> 2 b     b    
#> 3 c     c

But it doesn't work on test_db. Looking at the query it generates, it's clear that dbplyr just passed my function name straight into SQL.

test_db %>% 
    mutate(b = func_aaa(a))
#> Error: no such function: func_aaa

test_db %>% 
    mutate(b = func_aaa(a)) %>% 
    show_query()
#> <SQL>
#> SELECT `a`, func_aaa(`a`) AS `b`
#> FROM `test`

Created on 2020-05-13 by the reprex package (v0.3.0)

1

1 Answer 1

2

There may be simpler approach, but one option is to use a function that operates on and returns the entire data frame, rather than running the function inside mutate:

func_aaa2 = function(data, var, newvar, val, newval) {
  data %>% 
    mutate({{newvar}} := case_when(
      {{var}} == val ~ newval,
      TRUE ~ {{var}}))
}

test_db %>% 
  func_aaa2(a, "b", "a", "aaa")
  a     b    
  <chr> <chr>
1 a     aaa  
2 b     b    
3 c     c
test_db %>% 
  func_aaa2(a, "b", "a", "aaa") %>% show_query()
SELECT `a`, CASE
WHEN (`a` = 'a') THEN ('aaa')
WHEN (1) THEN (`a`)
END AS `b`
FROM `test`
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.