
I am trying to convert a SQL stored procedure to a Databricks notebook. In the stored procedure below, two statements are to be implemented. Here, table1 and table2 are Delta Lake tables in a Databricks cluster.

I want to use a Python variable in place of max_date in the SQL query. How do I do it?

%sql
DELETE FROM table1 WHERE Date = max_date;
INSERT INTO table1
SELECT * FROM table2 WHERE Date = max_date;


4 Answers

If you are going to run it cell by cell, then you can use Databricks widgets, like so:

First cell

import datetime

x = str(datetime.date.today())
dbutils.widgets.text("max_date", x)

Second cell

%sql
select getArgument("max_date") AS max_date

will give you

max_date
2022-06-04

but as mentioned here, this does not work when Run All is used; the ideal way is to create a separate language-specific notebook and pass variables to it using %run.

Another way is to use the Spark conf, like below.

First, set a value for the conf (note: the conf name should contain a . (dot)):

import datetime

max_date2 = str(datetime.date.today())
spark.conf.set("abc.max_dt2", max_date2)

Next, try selecting the value like below:

%sql
select "${abc.max_dt2}" as max_date

It should give the same value as above.


3 Comments

Both methods worked at first but failed later; they started showing different dates. I think using magic functions in Databricks instead of spark.sql is a mistake.
Failed later as in: did they fail after a cluster restart or a notebook detach, or did they just randomly start showing different values?
Also, two other ways to access the variable: 1. the spark.sql way, as you mentioned, e.g. spark.sql(f"select * from tdf where var={max_date2}"); 2. create a temp view holding the value, e.g. spark.createDataFrame([(max_date2,)], "my_date string").createOrReplaceTempView("vartable"), and use the value from vartable in your query. Also, if you expect these to update when the variable's value changes, they will not; after the variable changes you have to rerun the code to refresh the SQL-side value.

Assuming you either calculate max_date or use a widget to pass in the value, like so:

max_date = dbutils.widgets.get("max_date")

You can use spark.sql() with your SQL query in a Python string like this:

spark.sql(f"DELETE FROM table1 WHERE Date = '{max_date}'")

df = spark.sql(f"""
INSERT INTO table1
SELECT * FROM table2 WHERE Date = '{max_date}'
""")

display(df)

(spark.sql() executes one statement per call, so the DELETE and the INSERT are issued separately.)

f-string formatting makes it easy and readable to craft the SQL string as desired, then pass it to the built-in spark.sql() executor. spark.sql() returns a DataFrame with the results of the query if needed.
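Since spark.sql() only ever sees the final string, the interpolation step can be checked outside Databricks; a minimal sketch (the date value here is just an example):

```python
import datetime

# Hypothetical date value; in the notebook this would typically come
# from a widget or a MAX(Date) query.
max_date = str(datetime.date(2022, 6, 4))

# spark.sql() receives an ordinary Python string, so the f-string
# interpolation happens before Spark ever parses the query:
query = f"DELETE FROM table1 WHERE Date = '{max_date}'"
print(query)  # DELETE FROM table1 WHERE Date = '2022-06-04'
```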

UPDATE: Addressing a good comment!

2 Comments

It is supposed to be: df = spark.sql(f""" DELETE FROM table1 WHERE Date = '{max_date}'; INSERT INTO table1 SELECT * FROM table2 WHERE Date = '{max_date}'; """)
This should be the best answer, because it is by far the most flexible. We can use python to do virtually anything with the SQL string.

If you are using PySpark in Databricks, then another way to use a Python variable in a Spark SQL query is:

max_date = '2022-03-31'
df = spark.sql(f"""SELECT * FROM table2 WHERE Date = '{max_date}' """)

Here the f at the beginning of the query marks a formatted string literal (f-string), which lets you use the variable inside the PySpark SQL statement.
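To see why the single quotes around the placeholder matter, the string can be built and inspected in plain Python (table and column names are illustrative):

```python
max_date = '2022-03-31'
query = f"""SELECT * FROM table2 WHERE Date = '{max_date}' """

# The single quotes around {max_date} become part of the SQL text and
# make it a string literal; without them, Spark would read
# 2022-03-31 as arithmetic (2022 minus 3 minus 31).
print(query)
```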



For a query use parameters:

base_query = """ SELECT col1,  sum(col3) AS `Total` FROM table WHERE col1 = '{specific_ValforCol1}' AND `date_col` BETWEEN '{datemin}' AND '{datemax}' GROUP BY col1"""

Add a function to build a custom query based on base query:

def build_query(variables: dict, sql_query: str = base_query) -> str:
    return sql_query.format(**variables)

Create a dictionary of values and call the function:

vals = {"specific_ValforCol1": "7b5d", "datemin": "2022-04-19", "datemax": "2022-10-03"}

df = spark.sql(build_query(variables=vals, sql_query=base_query)).toPandas()

This worked for me.
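The templating step itself can be exercised without a cluster; a self-contained sketch of the approach above (the table and column names are placeholders):

```python
# Template with str.format()-style placeholders, as in the answer above.
base_query = """SELECT col1, sum(col3) AS `Total` FROM table WHERE col1 = '{specific_ValforCol1}' AND `date_col` BETWEEN '{datemin}' AND '{datemax}' GROUP BY col1"""

def build_query(variables: dict, sql_query: str = base_query) -> str:
    # Substitute every placeholder from the dictionary into the template.
    return sql_query.format(**variables)

vals = {"specific_ValforCol1": "7b5d", "datemin": "2022-04-19", "datemax": "2022-10-03"}
rendered = build_query(variables=vals)
print(rendered)
```

In the notebook, rendered would then be passed to spark.sql().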

