4

I would like to filter a SQL database based whether a regular expression appears within any column. I would like to specify the regex as a variable; however it is read as a literal string. I am having trouble getting the regex in as a variable. Thank you for your help!

Resources I've consulted:

Note: I had trouble making a reprex using the mtcars dataset, following https://www.tidyverse.org/blog/2018/01/dbplyr-1-2/. I get the error: "Error: str_detect() is not available in this SQL variant". I cannot share a reprex using my actual SQL database. As such, below is a pseudo-reprex.

library(dplyr)
library(stringr)

# Variable with regex (either lower- or uppercase "m")
my_string <- "(?i)m"

# WITHOUT SQL DATABASE ----------------------------------------------------

# This runs
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_length(rowname) > 5)

# This runs with STRING
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, "(?i)m"))

# This runs with VARIABLE
mtcars %>% 
  tibble::rownames_to_column() %>% 
  filter(str_detect(rowname, my_string))

# WITH SQL DATABASE -------------------------------------------------------

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, tibble::rownames_to_column(mtcars), "mtcars")

# This runs
tbl(con, "mtcars") %>% 
  filter(str_length(rowname) > 5)

# This *should* run with STRING -- pretend it does ;)
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, "M"))

# This does NOT run with VARIABLE
tbl(con, "mtcars") %>%
  filter(str_detect(rowname, my_string))
2
  • 2
    I'm voting to reopen this; it was closed as a duplicate of a question about using regex in a variable, but that's not the problem here. The OP is able to create & use their regex pattern dynamically—the problem comes with using it with a sqlite database. If that method doesn't apply to sqlite or sql in general, that's not clear from the duplicate target Commented Feb 21, 2022 at 15:36
  • Thanks @camille. The issue is indeed with the interplay of stringr/dbplyr/sql and not using the regex in a variable - since that works just fine in a local database. Appreciate any ideas! I'm wondering whether I can force eval of the string with metaprogramming: adv-r.hadley.nz/metaprogramming.html Commented Feb 21, 2022 at 19:50

2 Answers 2

3

This might depend a lot on the flavour of SQL you are using. This issue mentions a translation for str_detect and also provides an alternative.

Testing for SQL Server:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mssql())
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# Error: str_detect() is not available in this SQL variant

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

So it appears str_detect can not translate to SQL Server. But you can use %like% as a work around.

Testing for MySQL:

library(dbplyr)
library(dplyr)
library(stringr)

data(mtcars)
df_sim = tbl_lazy(mtcars, con = simulate_mysql()) # changed to mysql
my_string <- "(?i)m"

df_sim %>%
  filter(str_detect(my_string, gear)) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE ('(?i)m' REGEXP `gear`)

df_sim %>%
  filter(gear %like% my_string) %>%
  show_query()
# <SQL>
# SELECT *
# FROM `df`
# WHERE (`gear` like '(?i)m')

So it appears str_detect can be translated correctly for MySQL.

In every case my_string is translated into the query.

A couple of other things to check:

  • Updating the version of dbplyr. Older versions do not have all the translations in newer versions.
  • Try a column other than rownames. Data frames in R can have rownames but tables in SQL can only have columns.
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! I'm using a postgresql db and now have it working with filter(gear %ILIKE% my_string). However, I actually need to filter across multiple dynamically selected columns: filter(if_any(everything(), ~. %ILIKE% my_string)). This then reverts to using "my_string" literally. Any suggestions?
Take a look at this answer which uses a loop. And at this question where multiple cases are built as strings and then passed into a dplyr function.
3

With the help of a colleague, I have a solution to force evaluation of a variable with regex in string::str_detect:

tbl(con, "mtcars") %>%
  filter(str_detect(rowname, {{my_string}}))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.