1

In Postgres 11.x I am trying to aggregate elements in a nested jsonb object which has an array field into a single row per device_id. Here's example data for a table called configurations.

id device_id data
1 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}], ""other_data"": {}}"
2 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 1}, {""other_data"": {}, ""sensor_type"": 2}], ""other_data"": {}}"
3 1 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 3}], ""other_data"": {}}"
4 2 "{""sensors"": [{""other_data"": {}, ""sensor_type"": 4}], ""other_data"": {}}"
5 2 "{""sensors"": null, ""other_data"": {}}"
6 3 "{""sensors"": [], ""other_data"": {}}"

My goal output would have a single row per device_id with an array of distinct sensor_types, example:

device_id sensor_types
1 [1,2,3]
2 [4]
3 [ ] null would also be fine here

Tried a bunch of things but running into various problems, here's some SQL to set up a test environment:

CREATE TEMPORARY TABLE configurations(
   id SERIAL PRIMARY KEY,
   device_id SERIAL,
   data JSONB
);

INSERT INTO configurations(device_id, data) VALUES
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} } ] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} }, { "sensor_type": 2, "other_data": {} }] }'),
    (1, '{ "other_data": {}, "sensors": [ { "sensor_type": 3, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": [ { "sensor_type": 4, "other_data": {} }] }'),
    (2, '{ "other_data": {}, "sensors": null }'),
    (3, '{ "other_data": {}, "sensors": [] }');

Quick note, my real table has about 100,000 rows and the jsonb data is much more complicated but follows this general structure.

1 Answer 1

1

The JSONB null causes some problems in Postgres and should rather be avoided when possible. You can convert the value to an empty array with the expression

coalesce(nullif(data->'sensors', 'null'), '[]')

The first attempt:

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id;

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4,NULL}
         3 | {NULL}
(3 rows)

may be unsatisfactory because of nulls in the result. When trying to remove them

select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id;

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4}
(2 rows)

device_id = 3 disappears. Well, we can get all device_ids from the table:

select distinct device_id, sensor_types
from configurations
left join (
    select device_id, array_agg(distinct value->'sensor_type') as sensor_types
    from configurations
    left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
    where value is not null
    group by device_id
    ) s
using(device_id);

 device_id | sensor_types
-----------+--------------
         1 | {1,2,3}
         2 | {4}
         3 |
(3 rows)
Sign up to request clarification or add additional context in comments.

2 Comments

This worked great, thanks! A few little tricks in there I wasn't aware of. Out of curiosity if I was looking to make this perform well are there any easy index wins or would it make sense to look at a materialized view?
The jsonb_array_elements() function is indeed expensive, unfortunately there is no index to speed it up.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.