5

I'm using dplyr's automatic SQL backend to query subtable from a database table. E.g.

my_tbl <- tbl(my_db, "my_table")

where my_table in the database looks like

batch_name    value
batch_A_1     1
batch_A_2     2
batch_A_2     3
batch_B_1     8
batch_B_2     9
...

I just want the data from batch_A_#, regardless of the number.

If I were writing this in SQL, I could use

select * where batch_name like 'batch_A_%'

If I were writing this in R, I could use a few ways to get this: grepl(), %in%, or str_detect()

# option 1
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(grepl('batch_A_', batch_name, fixed = T))
# option 2
subtable <- my_tbl %>% select(batch_name, value) %>%
    filter(str_detect(batch_name, 'batch_A_'))

All of these give the following Postgres error: HINT: No function matches the given name and argument types. You might need to add explicit type casts

So, how do I pass in SQL string functions or matching functions to help make the generated dplyr SQL query able to use a more flexible range of functions in filter?

(FYI the %in% function does work, but requires listing out all possible values. This would be okay combined with paste to make a list, but does not work in a more general regex case)

2
  • 1
    Doesn't tbl(my_db, sql(select * where batch_name %like% 'batch_A_%')) work Commented Aug 15, 2016 at 20:53
  • @shayaa Sorry for the delay, of course that's right! For more complicated queries, I'm not sure this would always work, but I can't think of a counterexample right now. Thanks! Commented Aug 29, 2016 at 16:08

2 Answers 2

13


A "dplyr-only" solution would be this

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()

Full reprex:

suppressPackageStartupMessages({
  library(dplyr)
  library(dbplyr)
  library(RPostgreSQL)
})

my_con <- 
  dbConnect(
    PostgreSQL(),
    user     = "my_user",
    password = "my_password",
    host     = "my_host",
    dbname   = "my_db"
  )

my_table <- tribble(
  ~batch_name,    ~value,
  "batch_A_1",     1,
  "batch_A_2",     2,
  "batch_A_2",     3,
  "batch_B_1",     8,
  "batch_B_2",     9
)

copy_to(my_con, my_table)

tbl(my_con, "my_table") %>% 
  filter(batch_name %like% "batch_A_%") %>% 
  collect()
#> # A tibble: 3 x 2
#>   batch_name value
#> *      <chr> <dbl>
#> 1  batch_A_1     1
#> 2  batch_A_2     2
#> 3  batch_A_2     3

dbDisconnect(my_con)
#> [1] TRUE

This works because any functions that dplyr doesn't know how to translate will be passed along as is, see ?dbplyr::translate\_sql.

Hat-tip to @PaulRougieux for his recent comment here

Sign up to request clarification or add additional context in comments.

Comments

0

Using dplyr

Get the table batch_name from the database as dataframe and use it for further data analysis.

library("dplyr")
my_db <- src_postgres(dbname = "database-name", 
                      host = "localhost", 
                      port = 5432, 
                      user = "username",
                      password = "password")

df <- tbl(my_db, "my_table")
df %>% filter(batch_name == "batch_A_1")

Using DBI and RPostgreSQL

Get the table by sending sql query

library("DBI")
library("RPostgreSQL")  
m <- dbDriver("PostgreSQL")    
con <- dbConnect(drv = m,
                 dbname = "database-name", 
                 host = "localhost", 
                 port = 5432,
                 user = "username",
                 password = "password")    
df <- dbGetQuery(con, "SELECT * FROM my_table WHERE batch_name %LIKE% 'batch_A_%'")

library("dplyr")
df %>% filter(batch_name == "batch_A_1")

4 Comments

Can you edit this to reflect the fact that I wanted to use SQL's regular expression function like? Your first code section is incorrect because it uses an explicit batch_name. The second code section is good because it uses an explicit where statement in the query (which I forgot about), however it still uses an explicit batch_name.
it was missing FROM clause for table_name. You can replace it with the name of one of the tables in your database
The first code gets the entire table 'my_table'. Then dplyr library command is used on the results from the database query
The second code queries the table with column 'batch_name' matching the pattern of 'batch_A_'

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.