1

I want to extract some values for particular keys from a table with json string as below.

raw_data ...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]} ...

I am expecting an outcome like

label amount category
XXX [1000, 100, 10] ['A', 'B', 'C']

I am using the following sql query to achieve that

select 
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table

I can get a specific element of the list with [0] , [1] etc. But the sql code doesn't work with [*]. I am getting the following error Invalid JSON path: '$.lines[*].amount'

Edit

I am using Presto

4
  • Mysql/Maria please add the platform. Commented Aug 30, 2022 at 0:07
  • What system are you using? Mariadb/Mysql or Presto? They have very different behavior for that function. Commented Aug 30, 2022 at 1:31
  • I am using Presto Commented Aug 30, 2022 at 1:43
  • Presto can extract scalar value only. prestodb.io/docs/current/functions/json.html Iterate over your JSON, extract separate values then combine them to arrays back. Commented Aug 30, 2022 at 6:21

3 Answers 3

1

Json path support in Presto is very limited, so you need to do some processing manually for example with casts and array functions:

-- sample data
with dataset (raw_data) as (
    values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)

-- query
select label,
    transform(lines, l -> l['amount']) amount,
    transform(lines, l -> l['category']) category
from (
    select JSON_EXTRACT(raw_data, '$.label') AS label,
        cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
    from dataset
);

Output:

label amount category
XXX [1000, 100, 10] ["A", "B", "C"]

In Trino json path support was vastly improved, so you can do next:

-- query
select JSON_EXTRACT(raw_data, '$.label') label,
       JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
       JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
from dataset;
Sign up to request clarification or add additional context in comments.

Comments

0

You can use json_table and json_arrayagg:

select json_extract(t.raw_data, '$.label'),
  (select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v int path '$.amount')) t1),
  (select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v text path '$.category')) t1)
from tbl t

Comments

0

I was able to get the expected output using unnest to flatten and array_agg to aggregate in Presto. Below is the SQL used and output generated:

WITH dataset AS (
  SELECT
    * from sf_73535794
)
SELECT raw_data.label,array_agg(t.lines.amount) as amount,array_agg(t.lines.category) as category FROM dataset
CROSS JOIN UNNEST(raw_data.lines) as t(lines) group by 1

Output:

enter image description here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.