
I am not sure if this is possible.

Right now I am running this using the sqldf package:

Col1 <- c('emdabcer','deffghiee','lmnop')
Col2 <- c(1,2,3)
df <- data.frame(Col1, Col2)

df
      Col1 Col2
  emdabcer    1
 deffghiee    2
     lmnop    3

At the moment, I am typing the CASE WHEN clauses into the SQL by hand:

sqldf("SELECT *, CASE 
WHEN [Col1] LIKE '%abc%' THEN REPLACE([Col1], [Col1], 'Label1')
WHEN [Col1] LIKE '%def%' AND [Col1] LIKE '%ghi%' THEN REPLACE([Col1], [Col1], 'Label2')
ELSE NULL END [Category Label] FROM df")

My actual data needs 40 different WHEN clauses.

Is there a way to keep my WHEN clauses in a separate table/data frame, one per row, and have a single query apply every row to get my output?

Below is an example dataframe with my queries:

Queries <- c("WHEN [Col1] LIKE '%abc%' THEN REPLACE([Col1], [Col1], 'Label1')",
         "WHEN [Col1] LIKE '%def%' AND [Col1] LIKE '%ghi%' THEN REPLACE([Col1], [Col1], 'Label2')",
         "WHEN [Col1] LIKE '%mn%' THEN REPLACE([Col1], [Col1], 'Label3')")
Query_df <- data.frame(Queries)

Query_df

Queries
WHEN [Col1] LIKE '%abc%' THEN REPLACE([Col1], [Col1], 'Label1')
WHEN [Col1] LIKE '%def%' AND [Col1] LIKE '%ghi%' THEN REPLACE([Col1], [Col1], 'Label2')
WHEN [Col1] LIKE '%mn%' THEN REPLACE([Col1], [Col1], 'Label3')

And then I would do something like this:

sqldf("SELECT *, CASE 
WHILE length(Query_df) <= length(Query_df)
BEGIN RUN Queries
END

I know the above is wrong, but I am after something along those lines.

Any help would be great, thanks!

This is the reference I am looking into: https://www.essentialsql.com/using-while-statement-stored-procedures/

1 Answer


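Create the Pat data frame, which defines the patterns to look for, and then join it to df. An empty pat2 imposes no second condition, because '%' || '' || '%' is '%%', which matches any string: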

Pat <- data.frame(
  pat1 = c('abc', 'def'),
  pat2 = c('', 'ghi'),
  Label = c('Label1', 'Label2'),
  stringsAsFactors = FALSE)

sqldf("select a.*, b.Label
  from df a 
  left join Pat b on a.Col1 like '%' || b.pat1 || '%' and 
                     a.Col1 like '%' || b.pat2 || '%'")

giving:

       Col1 Col2  Label
1  emdabcer    1 Label1
2 deffghiee    2 Label2
3     lmnop    3   <NA>
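If you do want to keep the WHEN clauses as text in a data frame, as the question describes, one option is to paste them into a single CASE expression in R and pass the resulting string to sqldf. This is a minimal sketch rather than part of the answer above, and it assumes every entry of Query_df$Queries is a complete, trusted WHEN ... THEN ... fragment; nothing is validated, so one malformed row breaks the whole query.

```r
library(sqldf)

# Data from the question
df <- data.frame(
  Col1 = c('emdabcer', 'deffghiee', 'lmnop'),
  Col2 = c(1, 2, 3),
  stringsAsFactors = FALSE)

# One WHEN clause per row, exactly as in the question's Query_df
Query_df <- data.frame(
  Queries = c("WHEN [Col1] LIKE '%abc%' THEN REPLACE([Col1], [Col1], 'Label1')",
              "WHEN [Col1] LIKE '%def%' AND [Col1] LIKE '%ghi%' THEN REPLACE([Col1], [Col1], 'Label2')",
              "WHEN [Col1] LIKE '%mn%' THEN REPLACE([Col1], [Col1], 'Label3')"),
  stringsAsFactors = FALSE)

# Glue all WHEN clauses into one CASE expression. CASE returns the first
# matching WHEN, so the row order of Query_df sets the priority.
case_sql <- paste(Query_df$Queries, collapse = "\n")
sql <- paste0("SELECT *, CASE\n", case_sql,
              "\nELSE NULL END [Category Label] FROM df")

result <- sqldf(sql)
result
```

The R code stays the same whether Query_df has 3, 40 or 50 rows. Unlike the join, CASE gives each row at most one label; the join returns one output row per matching rule in Pat.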

Comments

Thanks @G. Grothendieck. Unfortunately I'm not sure this is what I'm looking for, because it would still require me to write every query in the main sqldf code. I am looking for one query that can call a table of queries no matter how many rows it has, so the main code would not change whether there were 40 or 50 SQL scripts in the data frame.
This is a single query. All you have to do is define Pat and run the single query shown in the answer; it adds the Label column as shown. There is no need to run multiple queries, write numerous CASE WHEN clauses, or use a WHILE loop. The whole thing is a single join, so adding another rule only means adding a row to Pat.
Apologies, I only glanced over the answer before responding. This makes sense and it works for my example, thank you. I have one other question: would this join approach still work if I had a rule that states "does NOT contain" as well?
Add a third column to Pat that holds a string the target must not contain, and add a third expression to the ON clause of the SQL statement.
Yup, I planned on doing that. What would the expression be for "does not contain b.pat3"? Would it be a.Col1 NOT LIKE '%' || b.pat3 || '%'?
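That NOT LIKE expression works for rules where pat3 is filled in, but with the empty-string convention used for pat2, an empty pat3 turns the pattern into '%%', which matches every string, so NOT LIKE would reject every rule that has no exclusion. A minimal sketch, assuming that same convention, guards the test with b.pat3 = ''. The Label3 rule (contains 'mn' but not 'op') is hypothetical, added only so that an exclusion actually fires, and the order by clause is added to make the row order deterministic.

```r
library(sqldf)

df <- data.frame(
  Col1 = c('emdabcer', 'deffghiee', 'lmnop'),
  Col2 = c(1, 2, 3),
  stringsAsFactors = FALSE)

# pat3 is a substring that must NOT appear; '' means "no exclusion".
# The Label3 rule is hypothetical, added to show an exclusion firing.
Pat <- data.frame(
  pat1  = c('abc', 'def', 'mn'),
  pat2  = c('',    'ghi', ''),
  pat3  = c('',    '',    'op'),
  Label = c('Label1', 'Label2', 'Label3'),
  stringsAsFactors = FALSE)

# Guarded version: an empty pat3 skips the NOT LIKE test entirely
result <- sqldf("select a.*, b.Label
  from df a
  left join Pat b on a.Col1 like '%' || b.pat1 || '%' and
                     a.Col1 like '%' || b.pat2 || '%' and
                     (b.pat3 = '' or a.Col1 not like '%' || b.pat3 || '%')
  order by a.Col2")

# Unguarded version: '%' || '' || '%' is '%%', which matches every string,
# so NOT LIKE '%%' is false and rules with an empty pat3 never match
naive <- sqldf("select a.*, b.Label
  from df a
  left join Pat b on a.Col1 like '%' || b.pat1 || '%' and
                     a.Col1 like '%' || b.pat2 || '%' and
                     a.Col1 not like '%' || b.pat3 || '%'
  order by a.Col2")

result
naive
```

In result, lmnop gets no label because it contains 'op', while the first two rows keep Label1 and Label2; in naive every Label is NA. If you store "no exclusion" as NA (SQL NULL) instead of '', the guard would need to be b.pat3 is null, because NOT LIKE against a NULL pattern yields NULL, which also fails the join condition.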