3

I have a table with a column which contains a json-object, the value type is always a string.

I need 2 kind of information:

  • a list of the json keys
  • convert the json in an array of key-value pairs

This is what I got so far, which is working:

CREATE TEMP FUNCTION jsonObjectKeys(input STRING)
RETURNS Array<String>
LANGUAGE js AS """
  return Object.keys(JSON.parse(input));
""";

CREATE TEMP FUNCTION jsonToKeyValueArray(input STRING)
RETURNS Array<Struct<key String, value String>>
LANGUAGE js AS """
  let json = JSON.parse(input);
  return Object.keys(json).map(e => {
    return { "key" : e, "value" : json[e] }
  });
""";

WITH input AS (
  SELECT "{\"key1\": \"value1\", \"key2\": \"value2\"}" AS json_column
  UNION ALL
  SELECT "{\"key1\": \"value1\", \"key3\": \"value3\"}" AS json_column
  UNION ALL
  SELECT "{\"key5\": \"value5\"}" AS json_column
)

SELECT
  json_column,
  jsonObjectKeys(json_column) AS keys,
  jsonToKeyValueArray(json_column) AS key_value
FROM input

The problem is that FUNCTION is not the best in term of compute optimization, so I'm trying to understand if there is a way to use plain SQL to achieve these 2 needs (or 1 of them at least) using only SQL w/o functions.

1 Answer 1

10

Below is for BigQuery Standard SQL

#standardsql
select
  json_column,
  array(select trim(split(kv, ':')[offset(0)]) from t.kv kv) as keys,
  array(
    select as struct 
      trim(split(kv, ':')[offset(0)]) as key, 
      trim(split(kv, ':')[offset(1)]) as value
    from t.kv kv
  ) as key_value
from input, 
unnest([struct(split(translate(json_column, '{}"', '')) as kv)]) t    

If to apply to sample data from your question - output is

enter image description here

Sign up to request clarification or add additional context in comments.

5 Comments

Yes it works very well thank you - I am surprised BQ doesn't have a utility function for this
Could it be that this function fails if keys/values contain colons and/or if values are arrays of strings (i.e., contain the " character)?
sure, if you have such edge case and need help - post your question and we will help
@MikhailBerlyant I am trying to decode this but am struggling. On line array(select trim(split(kv, ':')[offset(0)]) from t.kv kv) as keys what does t.kv kv mean? is t = dataset, kv = column?
is there another way to do the UNNEST operation? that is supported by BQ materialized views

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.