
I need to make the following report scalable:

query = """
(SELECT
    '02/11/2019' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital 
WHERE ds between 20190211 and 20190217
GROUP BY 1,2,3)
UNION ALL
(SELECT
    '02/18/2019' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital 
WHERE ds between 20190211 and 20190224
GROUP BY 1,2,3)
"""

#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2

However, as you can see, this approach doesn't scale: for a long list of dates I would have to write and union a separate SELECT for each one.

My first attempt at looping a list of date variables into the SQL script is as follows:

dfys = ['20190217','20190224']

df2 = ['02/11/2019','02/18/2019']

for i in df2:
    date=i

for j in dfys:
    date2=j

query = f"""
SELECT
    '{date}' as Week_of,
    raw.media_type,
    raw.campaign,
    count(raw.ad_start_ts) as frequency
FROM usotomayor.digital raw 
WHERE raw.ds between 20190211 and {date2}
GROUP BY 1,2,3

"""

#Converting to dataframe
query2 = spark.sql(query).toPandas()
query2

However, this is not working for me: it only produces results for the last pair of dates. I think I need to loop through the SQL query itself, but I don't know how to do this. Can someone help me?
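
For reference, checking the loop variables confirms what I'm seeing: the query is built after both loops finish, so date and date2 only ever hold the last elements of their lists:

df2 = ['02/11/2019', '02/18/2019']
dfys = ['20190217', '20190224']

for i in df2:
    date = i
for j in dfys:
    date2 = j

# The loop bodies only rebind the names; after the loops, just the
# final elements remain, so the f-string query is built only once.
print(date, date2)  # 02/18/2019 20190224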

  • "this is not working for me" — Perhaps you could be more specific. Commented Jul 19, 2019 at 20:07
  • When I run the code with the Python loops, it only runs the query for the last values in the lists. Commented Jul 19, 2019 at 20:27
  • The indentation is incorrect, so I can only say you have to build the query inside the for-loop, not after it. You could create a list with all the SELECTs and later concatenate them with "\nUNION ALL\n".join(list_with_all_SELECT); see the sketch after these comments. Commented Jul 19, 2019 at 20:40
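
A minimal sketch of that last suggestion, assuming the same spark session, table, and fixed 20190211 lower bound as in the question:

dfys = ['20190217', '20190224']
df2 = ['02/11/2019', '02/18/2019']

# Build one parenthesised SELECT per (week label, end date) pair,
# then splice them together into a single UNION ALL query.
selects = [
    f"""(SELECT
    '{week}' as Week_of,
    media_type,
    campaign,
    count(ad_start_ts) as frequency
FROM usotomayor.digital
WHERE ds between 20190211 and {end_ds}
GROUP BY 1,2,3)"""
    for week, end_ds in zip(df2, dfys)
]

query = "\nUNION ALL\n".join(selects)
result = spark.sql(query).toPandas()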

1 Answer


As a commenter said, "this is not working for me" is not very specific, so let's start by specifying the problem: you need to execute a query for each pair of dates. That means running the queries in a loop and saving each result (or actually unioning them, but then you would need to change your query logic).

You could do it like this:

dfys = ['20190217', '20190224']

df2 = ['02/11/2019', '02/18/2019']

query_results = list()
# Pair each week label (df2) with its end date (dfys); the two lists
# line up by index, so zip() walks them together.
for week_of, end_ds in zip(df2, dfys):
    query = f"""
    SELECT
        '{week_of}' as Week_of,
        raw.media_type,
        raw.campaign,
        count(raw.ad_start_ts) as frequency
    FROM usotomayor.digital raw
    WHERE raw.ds between 20190211 and {end_ds}
    GROUP BY 1,2,3
    """
    query_results.append(spark.sql(query).toPandas())

query_results[0]
query_results[1]

Now you get a list of your results (query_results), with one DataFrame per week.
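
If you would rather end up with a single DataFrame than a list, you can concatenate the pieces afterwards; a minimal sketch using pandas, assuming the loop above has already populated query_results:

import pandas as pd

# Stack the per-week DataFrames into one report; each row already
# carries its Week_of label, so no extra key column is needed.
report = pd.concat(query_results, ignore_index=True)
report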


1 Comment

Thank you SO much for this answer! After more than a dozen searches, I finally stumbled upon it and it was exactly what I needed! You’re so helpful and I appreciate your clear, correct, and applicable answer!
