I am ingesting JSON data into Google BigQuery, and on ingest both the arrays and the objects from the JSON are being cast into string columns. The data in BigQuery then looks like this:

select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" as stringCol2

The data is a bit messy.

  • In stringCol1, there are both null and '[]' values for missing data. I'd like to create the 2 columns game and year from this stringified object.
  • For stringCol2, this is always an array with 2 objects, with identical keys (teamName and teamAge, in this case). This then needs to be cast into the 4 columns teamName1, teamAge1, teamName2, teamAge2.

A similar post addressed converting a basic stringified array into a real (non-stringified) array, but this example is a bit more complex. In particular, the solution from that post does not work in this case.

1 Answer

The query below is for BigQuery Standard SQL:

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1, 
  JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
)])   

To apply it to the sample data from your question, put this CTE in front of the SELECT statement above:

WITH `project.dataset.table` AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
  SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
  SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
  SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
  SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
) 

The output is:

Row  id  game  year       teamName1  teamAge1  teamName2  teamAge2
1    1   null  null       null       null      null       null
2    2   null  null       null       null      null       null
3    3   22    sophomore  teamA      37        teamB      32
4    4   17    freshman   teamA      32        teamB      33
5    5   9     senior     teamC      31        teamD      17
6    6   234   junior     teamC      42        teamD      25
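
Rows 1 and 2 come out as all nulls because of how the pieces compose, as far as I understand these functions: '[]' parses to an empty array, SAFE_OFFSET returns NULL instead of raising an error when the index is out of range, and a NULL input stays NULL. A minimal sketch on literals to check this (the column aliases are made up for illustration):

```sql
#standardSQL
SELECT
  -- '[]' parses to an empty array, so its length is 0
  ARRAY_LENGTH(JSON_EXTRACT_ARRAY('[]')) AS empty_array_length,
  -- SAFE_OFFSET yields NULL for a missing element; plain OFFSET would raise an error
  JSON_EXTRACT_ARRAY('[]')[SAFE_OFFSET(0)] AS first_of_empty,
  -- a NULL string in gives a NULL array out
  JSON_EXTRACT_ARRAY(CAST(NULL AS STRING)) IS NULL AS null_propagates,
  -- '[]' has no $.game key, so the scalar extraction is NULL as well
  JSON_EXTRACT_SCALAR('[]', '$.game') AS game_of_empty
```

If SAFE_OFFSET were swapped for OFFSET in the main query, I would expect it to fail on row 1 with an out-of-bounds error, which is why the SAFE_ variant is used.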

There are many possible variations of the above that improve readability, for example:

#standardSQL
SELECT id,
  JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
FROM `project.dataset.table`,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
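
One thing to keep in mind: JSON_EXTRACT_SCALAR returns STRING, so game, teamAge1 and teamAge2 are strings in the queries above. If numeric columns are wanted, one possible sketch wraps them in SAFE_CAST, which returns NULL rather than failing when a value is not a valid integer. That these fields should be INT64 is my assumption, not something stated in the question, and the two-row `sample` CTE is only a stand-in for the real table:

```sql
#standardSQL
-- Hypothetical two-row sample standing in for `project.dataset.table`
WITH sample AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 3, "{'game': '22', 'year': 'sophomore'}",
            "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]"
)
SELECT id,
  -- assumption: game and teamAge are always integers when present
  SAFE_CAST(JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS INT64) AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS INT64) AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS INT64) AS teamAge2
FROM sample,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
```

The missing-data rows still come through as NULL, since SAFE_CAST of NULL is NULL.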

1 Comment

Very helpful, thank you. json_extract_* seems like a powerful function in BigQuery
