2

I have a dataframe that needs to be joined with the result set of a query. The query filters the data in the database using the values of one of the dataframe's columns.

data_list = list(df['needed_column'])

I would like to use this variable in an SQL query executed in a Jupyter %%sql cell.

%%sql
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN data_list

Is there any way to do this?

3 Answers

2

A workaround is to build the query as a Python string and pass it to the line magic.

# Render the list as text, then swap the square brackets for parentheses
data_list = str(list(df['needed_column'])).replace('[', '(').replace(']', ')')

query_string = f"""
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN {data_list}
"""

result_set = %sql $query_string
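The bracket replacement above relies on Python's list formatting happening to look like an SQL list. As a rough sketch of a more explicit version, the helper below (to_sql_in_list is a hypothetical name, not part of any library) quotes strings, doubles embedded single quotes, and rejects an empty list. It assumes the values are plain strings or numbers and is not a substitute for real parameter binding.

```python
def to_sql_in_list(values):
    """Render an iterable as an SQL literal list, e.g. (1, 2) or ('a', 'b')."""
    rendered = []
    for value in values:
        if isinstance(value, str):
            # Double single quotes so the SQL string literal stays well formed.
            rendered.append("'" + value.replace("'", "''") + "'")
        else:
            rendered.append(str(value))
    if not rendered:
        # IN () with no values is a syntax error in most databases.
        raise ValueError("cannot build an IN list from an empty sequence")
    return "(" + ", ".join(rendered) + ")"


# Hypothetical sample values standing in for list(df['needed_column'])
data_list = to_sql_in_list(["a", "b's"])

query_string = f"""
SELECT
    column_1,
    column_2,
    column_3
FROM my_database.my_table
WHERE
    column_1 IN {data_list}
"""
```

The resulting query_string can then be passed to the magic in the same way, with result_set = %sql $query_string.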

1 Comment

nice workaround
1

For me, it works with single curly braces:

%sql SELECT {dynamic_column} FROM penguins.csv LIMIT {dynamic_limit}

The only thing to be aware of: if you are working with dynamic strings, put them in single quotes in the query (WHERE name = '{custom_name}').
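Quoting by hand breaks as soon as the value itself contains a single quote. Outside the %sql magic, a different technique avoids the problem: bound parameters, where the driver handles the quoting. The sketch below uses Python's built-in sqlite3 module with an in-memory penguins table made up for the illustration; it is not how the magic substitutes variables.

```python
import sqlite3

# In-memory database with a hypothetical penguins table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (name TEXT, sex TEXT)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?)",
    [("O'Malley", "MALE"), ("Pingu", "MALE")],
)

# The embedded quote would break WHERE name = '{custom_name}'.
custom_name = "O'Malley"

# The ? placeholder lets the driver bind the value safely.
rows = conn.execute(
    "SELECT name, sex FROM penguins WHERE name = ?",
    (custom_name,),
).fetchall()
print(rows)
```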

2 Comments

This works only for single line queries.
Thanks for the comment @drake10k. I suppose double curly braces are for multilines? Will try out soon.
0

You can use double curly braces; see "Parameterizing SQL queries" in the jupysql documentation.

It will look something like this (the variable is defined in an earlier cell, because %%sql must be the first line of its own cell):

sex = "MALE"
%%sql
SELECT *
FROM penguins.csv
WHERE sex = '{{sex}}'
