If you want to avoid BQ scripting, you can sometimes use an idiom which utilizes WITH and CROSS JOIN.
In the example below:
- the
events table contains some timestamped events
- the
reports table contain occasional aggregate values of the events
- the goal is to write a query that only generates incremental (non-duplicate) aggregate rows
This is achieved by
- introducing a
state temp table that looks at a target table for aggregate results
- to determine parameters (
params) for the actual query
- the params are
CROSS JOINed with the actual query
- allowing the param row's columns to be used to constrain the query
- this query will repeatably return the same results
- until the results themselves are appended to the
reports table
WTIH state AS (
SELECT
-- what was the newest report's ending time?
COALESCE(
SELECT MAX(report_end_ts) FROM `x.y.reports`,
TIMESTAMP("2019-01-01")
) AS latest_report_ts,
...
),
params AS (
SELECT
-- look for events since end of last report
latest_report_ts AS event_after_ts,
-- and go until now
CURRENT_TIMESTAMP() AS event_before_ts
)
SELECT
MIN(event_ts) AS report_begin_ts,
MAX(event_ts) AS report_end_ts
COUNT(1) AS event_count,
SUM(errors) AS error_total
FROM `x.y.events`
CROSS JOIN params
WHERE event_ts > event_after_ts
AND event_ts < event_before_ts
)
This approach is useful for bigquery scheduled queries.