BigQuery query creation without variables?

Question

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web-based query tool.

There doesn't appear to be any way to create, use or Set/Declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?

Elliott Brossard · Accepted Answer · 2019-10-03 19:39:52Z

5

It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:

-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data`.usa_names.usa_1910_current
  WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
  name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
  SELECT word
  FROM `bigquery-public-data`.samples.shakespeare
);


2 Comments

Dilum Ranatunga Over a year ago

This is the more elegant and more powerful approach. But there are times when you may want to avoid scripting (for example, when you want a simple SQL query to plug into the BQ query scheduling.) For those cases, consider the WITH-CROSS JOIN approach described in stackoverflow.com/a/60008489/64967

Jonny Aug 19 at 2:27

I use this, but for some reason when writing BigQuery in the BigQuery interface, after executing the query, we will not get the results right away; for some reason the results are hidden behind an extra button "Show results". A small, but still, nuisance.

Jordan Tigani · Accepted Answer · 2014-08-22 03:39:13Z

1

There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.


4 Comments

Steve A Over a year ago

I understand that I can't define a value at the start of the query, but can you use a variable as part of a query? i.e. Select @Total_Item_Baskets = COUNT(UNQ_ID_NBR) from [MyData.table1] D Where D.SKU_ITEM_KEY = 455023 I know it doesn't like the @, but this doesn't appear to work with/without it either.

Jordan Tigani Over a year ago

what about "Select COUNT(UNQ_ID_NBR) AS Total_Item_Baskets from [MyData.table1] D Where D.SKU_ITEM_KEY = 455023". Does that do what you want?

Steve A Over a year ago

It's not a problem getting the query to run without variables. The issue is that this query is going to return a number, and I want to use that number in several places in the next part of the query.

Jordan Tigani Over a year ago

In that case, the only other way that I know of doing that without cut-and-paste is to use a cross join.

N.N. · Accepted Answer · 2014-08-24 08:02:26Z

1

It's not elegant, and it's a pain, but...

The way we handle it is with a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
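For anyone wanting to try the placeholder approach, here is a minimal sketch of the substitution step only. It assumes Python's standard string.Template; the query text is borrowed from the comments on the other answer, and the names QUERY_TEMPLATE and render_query are hypothetical, not what our actual script uses. Sending the rendered text through the API is left out.

```python
from string import Template

# Hypothetical query text: table and column names are borrowed from the
# comments in this thread, and "$sku" is the placeholder to be filled in.
QUERY_TEMPLATE = Template(
    "SELECT COUNT(UNQ_ID_NBR) AS Total_Item_Baskets "
    "FROM [MyData.table1] D "
    "WHERE D.SKU_ITEM_KEY = $sku"
)


def render_query(sku: int) -> str:
    """Return the query text with the placeholder replaced by an integer."""
    # Forcing the value through int() keeps arbitrary text out of the SQL.
    return QUERY_TEMPLATE.substitute(sku=int(sku))


if __name__ == "__main__":
    # The rendered string is what would then be submitted through the API.
    print(render_query(455023))
```

Since this is plain text substitution, it is only safe for values you control or validate, which is why the sketch coerces the value to an integer first.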

I have opened a feature request asking for "Dynamic SQL" capabilities.


1 Comment

Steve A Over a year ago

Thanks. It seems that Redshift doesn't support variables either, although I like the postgresql/workbench interface. There must be a reason that they are structured this way so I'm not holding my breath waiting for the feature to be added. Using an API has been considered, but wasn't something we wanted to explore right away. At least the mystery is solved - variables aren't working because they aren't supported.

Dilum Ranatunga · Accepted Answer · 2020-01-31 17:16:03Z

If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.

In the example below:

the events table contains some timestamped events
the reports table contains occasional aggregate values of the events
the goal is to write a query that only generates incremental (non-duplicate) aggregate rows

This is achieved by

introducing a state temp table that looks at a target table for aggregate results to determine parameters (params) for the actual query
the params are CROSS JOINed with the actual query, allowing the param row's columns to be used to constrain the query
this query will repeatably return the same results until the results themselves are appended to the reports table

WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    -- (the scalar subquery needs its own parentheses inside COALESCE)
    COALESCE(
        (SELECT MAX(report_end_ts) FROM `x.y.reports`),
        TIMESTAMP("2019-01-01")
      ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since end of last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)

SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts

This approach is useful for BigQuery scheduled queries.
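If you want to try the idiom without a BigQuery project, here is a rough, self-contained sketch that runs the same WITH / CROSS JOIN shape against Python's built-in sqlite3 module. SQLite is only a stand-in here, not BigQuery: the table contents are made up, timestamps are plain integers, the upper bound (event_before_ts) is dropped for brevity, and the helper names make_demo_db and incremental_report are hypothetical.

```python
import sqlite3

# Same shape as the query above: state -> params -> CROSS JOIN into the query.
QUERY = """
WITH state AS (
  SELECT COALESCE((SELECT MAX(report_end_ts) FROM reports), 0) AS latest_report_ts
),
params AS (
  SELECT latest_report_ts AS event_after_ts FROM state
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM events
CROSS JOIN params
WHERE event_ts > event_after_ts
"""


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up events and one earlier report."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (event_ts INTEGER, errors INTEGER);
        CREATE TABLE reports (report_end_ts INTEGER);
        INSERT INTO events VALUES (5, 0), (15, 2), (25, 1);
        INSERT INTO reports VALUES (10);
        """
    )
    return conn


def incremental_report(conn: sqlite3.Connection) -> tuple:
    """Aggregate only the events newer than the latest report."""
    return conn.execute(QUERY).fetchone()


if __name__ == "__main__":
    # Only the events at 15 and 25 are newer than the report ending at 10.
    print(incremental_report(make_demo_db()))  # (15, 25, 2, 3)
```

The single-row params table plays the role of the variables: every row of events sees the same event_after_ts, so it can be referenced as many times as needed in the main query.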
