5

Coming from SQL Server and a little bit of MySQL, I'm not sure how to proceed in Google's BigQuery web browser query tool.

There doesn't appear to be any way to create, use or Set/Declare variables. How are folks working around this? Or perhaps I have missed something obvious in the instructions or the nature of BigQuery? Java API?

4 Answers

5

It is now possible to declare and set variables using SQL. For more information, see the documentation, but here is an example:

-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2017.
SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data`.usa_names.usa_1910_current
  WHERE year = 2017
);
-- Which names appear as words in Shakespeare's plays?
SELECT
  name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
  SELECT word
  FROM `bigquery-public-data`.samples.shakespeare
);

2 Comments

This is the more elegant and more powerful approach. But there are times when you may want to avoid scripting (for example, when you want a simple SQL query to plug into the BQ query scheduling.) For those cases, consider the WITH-CROSS JOIN approach described in stackoverflow.com/a/60008489/64967
I use this, but when I run such a query in the BigQuery web interface, the results do not appear right away; they are hidden behind an extra "Show results" button. A small nuisance, but a nuisance nonetheless.
1

There is currently no way to set/declare variables in BigQuery. If you need variables, you'll need to cut and paste them where you need them. Feel free to file this as a feature request here.

4 Comments

I understand that I can't define a value at the start of the query, but can you use a variable as part of a query? i.e. Select @Total_Item_Baskets = COUNT(UNQ_ID_NBR) from [MyData.table1] D Where D.SKU_ITEM_KEY = 455023 I know it doesn't like the @, but this doesn't appear to work with/without it either.
what about "Select COUNT(UNQ_ID_NBR) AS Total_Item_Baskets from [MyData.table1] D Where D.SKU_ITEM_KEY = 455023". Does that do what you want?
It's not a problem getting the query to run without variables. The issue is that this query is going to return a number, and I want to use that number in several places in the next part of the query.
In that case, the only other way that I know of doing that without cut-and-paste is to use a cross join.
1

It's not elegant, and it's a pain, but...

The way we handle it is by using a Python script that replaces a "variable placeholder" in our query and then sends the amended query via the API.
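A minimal sketch of that kind of substitution, using only the standard library. The `$sku_item_key` placeholder and the `render_query` helper are hypothetical names made up for illustration; the table and column names are borrowed from the comments above. Sending the rendered string to BigQuery is left out, since it depends on which client you use.

```python
from string import Template

# Hypothetical query template. The $sku_item_key placeholder is an
# illustrative name, not something BigQuery itself understands.
QUERY_TEMPLATE = Template(
    "SELECT COUNT(UNQ_ID_NBR) AS Total_Item_Baskets "
    "FROM [MyData.table1] D "
    "WHERE D.SKU_ITEM_KEY = $sku_item_key"
)


def render_query(template: Template, **params: int) -> str:
    """Fill in placeholders, accepting only integers to limit SQL injection."""
    for name, value in params.items():
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    # substitute() raises KeyError if a placeholder is left without a value.
    return template.substitute({k: str(v) for k, v in params.items()})


query = render_query(QUERY_TEMPLATE, sku_item_key=455023)
print(query)
```

Restricting values to integers is a deliberate simplification: plain string substitution of arbitrary text into SQL is an injection risk, so anything beyond numbers would need proper escaping or a parameterized API.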

I have opened a feature request asking for "Dynamic SQL" capabilities.

1 Comment

Thanks. It seems that Redshift doesn't support variables either, although I like the PostgreSQL/workbench interface. There must be a reason that they are structured this way, so I'm not holding my breath waiting for the feature to be added. Using an API has been considered, but wasn't something we wanted to explore right away. At least the mystery is solved: variables aren't working because they aren't supported.
1

If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.

In the example below:

  • the events table contains some timestamped events
  • the reports table contains occasional aggregate values of the events
  • the goal is to write a query that only generates incremental (non-duplicate) aggregate rows

This is achieved by

  • introducing a state temp table that looks at a target table for aggregate results to determine parameters (params) for the actual query
  • CROSS JOINing the params with the actual query, which allows the param row's columns to be used to constrain the query

This query will repeatably return the same results until the results themselves are appended to the reports table.
WITH state AS (
  SELECT
    -- what was the newest report's ending time?
    -- (a scalar subquery must be wrapped in its own parentheses)
    COALESCE(
        (SELECT MAX(report_end_ts) FROM `x.y.reports`),
        TIMESTAMP("2019-01-01")
      ) AS latest_report_ts,
    ...
),
params AS (
  SELECT
    -- look for events since end of last report
    latest_report_ts AS event_after_ts,
    -- and go until now
    CURRENT_TIMESTAMP() AS event_before_ts
  FROM state
)

SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM `x.y.events`
-- params has exactly one row, so the join adds columns without adding rows
CROSS JOIN params
WHERE event_ts > event_after_ts
  AND event_ts < event_before_ts

This approach is useful for BigQuery scheduled queries.
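To see the idiom behave as described without a BigQuery project, here is a rough sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite is only a stand-in, not BigQuery: the integer timestamps, the sample rows, and the `next_report` helper are all assumptions made for illustration, and the upper bound on `event_ts` is dropped so the result does not depend on the current time.

```python
import sqlite3

# Stand-in data: integer timestamps instead of real TIMESTAMP values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_ts INTEGER, errors INTEGER);
    CREATE TABLE reports (report_begin_ts INTEGER, report_end_ts INTEGER,
                          event_count INTEGER, error_total INTEGER);
    INSERT INTO events VALUES (10, 0), (20, 1), (30, 2), (40, 0);
    INSERT INTO reports VALUES (10, 20, 2, 1);
""")

# A one-row params CTE, cross joined so its column acts like a variable.
INCREMENTAL_QUERY = """
WITH params AS (
  SELECT COALESCE((SELECT MAX(report_end_ts) FROM reports), 0) AS event_after_ts
)
SELECT
  MIN(event_ts) AS report_begin_ts,
  MAX(event_ts) AS report_end_ts,
  COUNT(1) AS event_count,
  SUM(errors) AS error_total
FROM events
CROSS JOIN params
WHERE event_ts > event_after_ts
"""


def next_report(connection):
    """Return the aggregate row for events newer than the latest report."""
    return connection.execute(INCREMENTAL_QUERY).fetchone()


first = next_report(conn)
second = next_report(conn)  # nothing appended yet, so the same row again
conn.execute("INSERT INTO reports VALUES (?, ?, ?, ?)", first)
third = next_report(conn)   # the report now covers every event

print(first, second, third)
```

Running the query twice yields the same row, and only after that row is appended to `reports` does the next run find no new events, which is the incremental behaviour the answer relies on.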
