
I'm writing some queries in Google BigQuery and want to extract JSON from an array. I can extract it, with help from @Mikhail Berlyant, but now the problem is extracting the JSON from the array without duplicates.

Current Structure:

enter image description here

What I tried:

WITH
  cte AS (
  SELECT
    labels,
    cost
  FROM
    BILLING.gcp_billing_export_v1)
SELECT
  la,
  cost
FROM
  cte,
  UNNEST(labels) AS la

enter image description here

Look at the cost column: the cost value is repeated twice, because the array holds 2 key/value pairs.

So when I run SUM(cost) with GROUP BY la.key, I get the wrong value.

What I'm looking for is:
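
To make the duplication concrete, here is a minimal sketch that reproduces it with one inline row instead of the real billing table. The row literal is an assumption, made up to mirror the two key/value pairs described above, and the column aliases are illustrative only:

```sql
#standardSQL
WITH cte AS (
  -- Hypothetical single billing row: two labels, one cost value.
  SELECT
    [STRUCT('application' AS key, 'scaled-server' AS value),
     STRUCT('department' AS key, 'hrd' AS value)] AS labels,
    0.323316 AS cost
)
SELECT
  COUNT(*) AS rows_after_unnest,  -- one output row per array element
  SUM(cost) AS summed_cost        -- the same cost is added once per label
FROM
  cte,
  UNNEST(labels) AS la
```

Because the comma join against UNNEST(labels) emits one row per array element, rows_after_unnest should come out as 2 and summed_cost as double the original cost.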

enter image description here

Can anyone help me with this?

  • So you just don't want the cost to be repeated? Commented Nov 19, 2018 at 18:08
  • Yes Khan, I don't want the cost to be repeated. Commented Nov 19, 2018 at 18:11
  • Are your keys of department and hrd uniform across the dataset? Commented Nov 19, 2018 at 18:19
  • No, it has different values on some rows. Commented Nov 19, 2018 at 18:22

1 Answer


Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  description, 
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(kv, '$.key') key, 
      JSON_EXTRACT_SCALAR(kv, '$.value') value 
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp, 
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM `project.dataset.table`   

You can test and play with the above using dummy data excerpted from your question, as below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'aaa' description, '[{"key":"application","value":"scaled-server"},{"key":"department","value":"hrd"}]' labels, 0.323316 cost UNION ALL
  SELECT 'bbb' description, '[{"key":"application2","value":"scaled-server2"},{"key":"department2","value":"hrd2"}]' labels, 0.342825 cost 
)
SELECT 
  description, 
  ARRAY(
    SELECT AS STRUCT 
      JSON_EXTRACT_SCALAR(kv, '$.key') key, 
      JSON_EXTRACT_SCALAR(kv, '$.value') value 
    FROM UNNEST(SPLIT(labels, '},{')) kv_temp, 
    UNNEST([CONCAT('{', REGEXP_REPLACE(kv_temp, r'^\[{|}]$', ''), '}')]) kv
  ) labels,
  cost
FROM `project.dataset.table`   

with result

Row description labels.key      labels.value    cost     
1   aaa         application     scaled-server   0.323316     
                department      hrd      
2   bbb         application2    scaled-server2  0.342825     
                department2     hrd2         

7 Comments

Thanks for your amazing queries. It works, but when I apply the same query to my actual dataset I get this error: No matching signature for function SPLIT for argument types: ARRAY<STRUCT<key STRING, value STRING>>, STRING. Supported signatures: SPLIT(STRING, [STRING]); SPLIT(BYTES, BYTES) at [20:17]. Line 20 is ` FROM UNNEST(SPLIT(labels, '},{')) kv_temp, `
In your question you provided an example of the data (see the Current Structure section), so my answer is based on what you provided in the question.
I used TO_JSON_STRING(labels) to get that column. Is that fine?
I changed it to FROM UNNEST(SPLIT(TO_JSON_STRING(labels), '},{')) kv_temp. Now it's working.
Hey @Mikhail, if I want to do GROUP BY 1, 2, 3 for SUM(cost), how does this work?
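
One possible reading of that last question, sketched under assumptions rather than taken from the answer: the error message quoted above indicates that labels in the real table is ARRAY<STRUCT<key STRING, value STRING>>, so a single label can be read with a scalar subquery and no JSON round trip. The inline rows, the choice of the 'department' key, and the aliases below are all hypothetical:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Hypothetical rows; labels is assumed to be ARRAY<STRUCT<key, value>>.
  SELECT 'aaa' AS description,
    [STRUCT('application' AS key, 'scaled-server' AS value),
     STRUCT('department' AS key, 'hrd' AS value)] AS labels,
    0.323316 AS cost UNION ALL
  SELECT 'bbb' AS description,
    [STRUCT('application' AS key, 'scaled-server' AS value),
     STRUCT('department' AS key, 'hrd' AS value)] AS labels,
    0.342825 AS cost
)
SELECT
  -- Picks one label value per source row, so the row is not multiplied.
  -- LIMIT 1 guards against a key appearing more than once in the array.
  (SELECT l.value FROM UNNEST(labels) AS l
   WHERE l.key = 'department' LIMIT 1) AS department,
  SUM(cost) AS total_cost
FROM `project.dataset.table`
GROUP BY department
```

Since each source row stays a single row here, each cost should be counted once per group. Rows without the chosen key would fall into a NULL group.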