
I have a stringified array in a STRING field in BigQuery:

'["a","b","c"]'

and I want to convert it into an array that BigQuery understands, so that I can do this in Standard SQL:

with k as (select '["a","b","c"]' as x)
select unnested_x from k, unnest(x) unnested_x

I have tried JSON_EXTRACT('["a","b","c"]','$') and everything else I could find online, but JSON_EXTRACT just gives me the JSON back as a STRING, not an ARRAY.

Any ideas?


5 Answers


In 2020, the JSON_EXTRACT_ARRAY function was added to BigQuery Standard SQL.

It converts a JSON array string directly into an ARRAY<STRING>, with no UDF or string tricks:

with k as (select JSON_EXTRACT_ARRAY('["a","b","c"]', '$') as x)
select unnested_x from k, unnest(x) unnested_x

This returns:

╔══════════════╗
║  unnested_x  ║
╠══════════════╣
║     "a"      ║
║     "b"      ║
║     "c"      ║
╚══════════════╝

See the JSON_EXTRACT_ARRAY documentation for details.
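The values in the result above still carry their double quotes, because JSON_EXTRACT_ARRAY returns each element as a JSON-formatted string. The JavaScript below is only a local model of that behavior to make the point concrete; the helper names are hypothetical and are not BigQuery functions. In BigQuery itself, selecting JSON_EXTRACT_SCALAR(unnested_x, '$') instead of unnested_x should strip the quotes, and newer BigQuery releases also offer JSON_EXTRACT_STRING_ARRAY, which returns unquoted strings directly (check the documentation for your environment).

```javascript
// Hypothetical local model of BigQuery's JSON_EXTRACT_ARRAY(json, '$'), for illustration only:
// every element is returned as JSON text, so string elements keep their double quotes.
function jsonExtractArrayModel(json) {
  return JSON.parse(json).map(el => JSON.stringify(el));
}

// Simplified model of JSON_EXTRACT_SCALAR(element, '$'): scalars come back as plain text,
// while objects, arrays and JSON null come back as null.
function jsonExtractScalarModel(element) {
  const value = JSON.parse(element);
  if (value === null || typeof value === "object") return null;
  return String(value);
}

const rows = jsonExtractArrayModel('["a","b","c"]');
console.log(rows);                             // ['"a"', '"b"', '"c"'] (quotes included)
console.log(rows.map(jsonExtractScalarModel)); // ['a', 'b', 'c']
```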


The following works in BigQuery Standard SQL:

#standardSQL
WITH k AS (
  SELECT 1 AS id, '["a","b","c"]' AS x UNION ALL
  SELECT 2, '["x","y"]'
)
SELECT
  id,
  -- strip the surrounding [ ] and split on commas; each element keeps its double quotes, e.g. '"a"'
  ARRAY(SELECT * FROM UNNEST(SPLIT(SUBSTR(x, 2, LENGTH(x) - 2)))) AS x
FROM k

It transforms your string column into an ARRAY<STRING> column.
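To see what the SUBSTR/SPLIT approach actually produces, here is a JavaScript sketch that mirrors it step by step. The helper name is hypothetical, and note that BigQuery's SUBSTR is 1-based while JavaScript's slice is 0-based. The sketch shows that each element keeps its double quotes, and that a comma inside a value splits that value in two, which is why this approach only suits simple arrays.

```javascript
// Hypothetical helper mirroring SPLIT(SUBSTR(x, 2, LENGTH(x) - 2)) from the query above.
function substrSplit(x) {
  // SUBSTR(x, 2, LENGTH(x) - 2) drops the first and last characters, i.e. the [ and ].
  const inner = x.slice(1, x.length - 1);
  // SPLIT with no delimiter argument splits on ",".
  return inner.split(",");
}

// Three elements, each still wrapped in double quotes.
console.log(substrSplit('["a","b","c"]'));

// A comma inside a value breaks it into two pieces: three pieces instead of two values.
console.log(substrSplit('["a,b","c"]'));
```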

5 Comments

This works for a simple array (which the OP asked about), but it doesn't handle more complex JSON unnesting.
@RyanTuck - obviously the provided answer is for the specific question! If you need a more generic solution, please post your question with the respective details, and I (or someone else here on SO) will be happy to help you :o)
Definitely! I did find a more generic solution using a UDF and added my own answer here :) Do you know if this is possible to accomplish without resorting to UDFs?
@RyanTuck - I just don't see any need for a more generic/expensive solution for the case in the given question. If you feel you have a use case that does require one, post your question and we will answer it :o)
I agree for the given question a more generic solution is unnecessary. I've asked the more generic question here: stackoverflow.com/questions/57117805/…

This solution builds on @northtree's answer and returns each member of the array as a stringified JSON value instead of an "[object Object]" string:

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """
// Re-serialize each element so objects come back as JSON text, not "[object Object]"
return JSON.parse(input).map(x => JSON.stringify(x));
""";

with

raw as (
  select
    1 as id,
    '[{"a": 5, "b": 6}, {"a": 7}, 456]' as body
)

select
  id,
  entry,
  json_extract(entry, '$'),
  json_extract(entry, '$.a'),
  json_extract(entry, '$.b')
from
  raw,
  unnest(json_extract_array(body)) as entry
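The difference between the two UDF bodies can be reproduced in plain JavaScript. This sketch assumes that coercing each element with String() approximates what happens when a UDF declared as RETURNS ARRAY<STRING> returns raw objects, which matches the "[object Object]" strings described above; it is an illustration, not BigQuery's documented conversion rule.

```javascript
// The array body from the example query above.
const body = '[{"a": 5, "b": 6}, {"a": 7}, 456]';

// Assumption: String() coercion approximates returning raw objects from an ARRAY<STRING> UDF.
const coerced = JSON.parse(body).map(x => String(x));
// → ['[object Object]', '[object Object]', '456']

// What this answer's UDF body returns: every element re-serialized as JSON text.
const stringified = JSON.parse(body).map(x => JSON.stringify(x));
// → ['{"a":5,"b":6}', '{"a":7}', '456']

console.log(coerced);
console.log(stringified);
```

Because each entry is valid JSON text, json_extract(entry, '$.a') can then read fields from it in SQL.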


I want to offer an alternative. Since the array is a string, you can simply extract the values with REGEXP_EXTRACT_ALL:

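REGEXP_EXTRACT_ALL(your_string, r'[0-9a-zA-Z][^"]*') as arr

The `*` (rather than `+`) lets single-character values such as "a" match. You may find it too restrictive that the regex requires each value to start with an alphanumeric character; just tweak it to your liking.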
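For simple patterns like these, JavaScript regular expressions behave the same way as BigQuery's RE2 engine, so a pattern can be checked locally against the question's string before putting it in a query. The sketch below shows that a `+` quantifier after the first character requires at least two characters per value and so matches nothing in '["a","b","c"]', while `*` matches each value. It also shows a capture-group variant; in BigQuery that would be REGEXP_EXTRACT_ALL(your_string, r'"([^"]*)"'), since REGEXP_EXTRACT_ALL returns the capturing group's contents when the pattern has one. None of these patterns handle escaped quotes (\") inside values.

```javascript
const s = '["a","b","c"]';

// With "+", every match needs at least two characters, so single-character values never match.
const plus = s.match(/[0-9a-zA-Z][^"]+/g); // null: no matches at all

// With "*", a lone alphanumeric character is enough.
const star = s.match(/[0-9a-zA-Z][^"]*/g); // ['a', 'b', 'c']

// Alternative: capture whatever sits between each pair of double quotes.
const quoted = [...s.matchAll(/"([^"]*)"/g)].map(m => m[1]); // ['a', 'b', 'c']

console.log(plus, star, quoted);
```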


It is much easier with a JavaScript UDF:

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """
// Returns the parsed elements as-is; intended for arrays of strings
return JSON.parse(input);
""";
WITH
  k AS (
  SELECT
    '["a","b","c"]' AS x)
SELECT
  JSON_EXTRACT_ARRAY(x) AS x
FROM
  k
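One possible variation, sketched here rather than taken from any answer in this thread, combines the two UDF approaches: string elements are returned unquoted (as this UDF does), and everything else is serialized as JSON text (as the "[object Object]" fix does). The function name extractArray is hypothetical; its body is meant to go between the """ delimiters of a CREATE TEMP FUNCTION statement whose parameter is named input. The NULL and non-array guards are assumptions about what you want, not BigQuery requirements.

```javascript
// Hypothetical UDF body wrapped in a function so it can be run locally.
function extractArray(input) {
  if (input === null) return null;            // propagate a SQL NULL input
  const parsed = JSON.parse(input);           // throws on invalid JSON
  if (!Array.isArray(parsed)) return null;    // assumption: non-arrays yield NULL
  // Strings come back unquoted; objects, numbers, booleans and null come back as JSON text.
  return parsed.map(x => (typeof x === "string" ? x : JSON.stringify(x)));
}

console.log(extractArray('["a","b","c"]'));
console.log(extractArray('[{"a": 5}, 456, "x"]'));
```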
