I have a Postgres table, mytable, where one of the fields is defined as follows:

myField JSONB[] NOT NULL

Let's assume each JSON object has this form:

{ "letter":"A", "digit":30}

What queries should I use to:

  • extract an array of the digit values?
  • extract a json array containing the digit values?
  • extract an array of the digit values where digit > 20?
  • extract a json array of the digit values where digit > 20?

How would the above queries change if I stored the data as a single json value containing a list instead?

  • Can I still make all the above queries?
  • What would be the performance difference?
  • When should I choose one over the other?

1 Answer

Let's create a table with two columns: pg_array, of type jsonb[], which stores a Postgres array of JSON objects, and json_array, of type jsonb, which stores a single JSON array of objects:

CREATE TABLE mytable (id int, pg_array jsonb[], json_array jsonb);
INSERT INTO mytable VALUES
    (1, ARRAY['{"letter":"A", "digit":30}', '{"letter":"B", "digit":31}']::jsonb[], '[{"letter":"A", "digit":30},{"letter":"B", "digit":31}]'),
    (2, ARRAY['{"letter":"X", "digit":40}', '{"letter":"Y", "digit":41}']::jsonb[], '[{"letter":"X", "digit":40},{"letter":"Y", "digit":41}]');

The queries for both approaches will look very similar because we'll be working on the individual array elements, meaning we'll have to unnest and then aggregate again.

To unnest pg_array and get each jsonb object:

SELECT unnest(pg_array);

To unnest json_array and get each jsonb object:

SELECT jsonb_array_elements(json_array);

That's the only difference. Thus, the queries below will look almost identical.

On to your first set of questions:

extract an array of the digit values?

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x GROUP BY id;
 array_agg
-----------
 {40,41}
 {30,31}
(2 rows)
db=# SELECT array_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x GROUP BY id;
 array_agg
-----------
 {40,41}
 {30,31}
(2 rows)
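
The queries above rely on array_agg receiving the elements in their original order, which is not spelled out in the query itself. As a minimal sketch, assuming the mytable definition and sample rows from above, WITH ORDINALITY can number the elements so that the ordering becomes explicit. The aliases t, x and n are just names chosen for this illustration.

```sql
-- Number each unnested element (n) and aggregate in that order.
SELECT id,
       array_agg((x->>'digit')::int ORDER BY n) AS digits
FROM mytable,
     unnest(pg_array) WITH ORDINALITY AS t(x, n)
GROUP BY id
ORDER BY id;

-- The same pattern for the jsonb column.
SELECT id,
       array_agg((x->>'digit')::int ORDER BY n) AS digits
FROM mytable,
     jsonb_array_elements(json_array) WITH ORDINALITY AS t(x, n)
GROUP BY id
ORDER BY id;
```

The trailing ORDER BY id only fixes the order of the result rows, which the GROUP BY versions above leave unspecified.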

extract a json array containing the digit values?

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [30, 31]
(2 rows)
db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [30, 31]
(2 rows)

extract an array of the digit values where digit > 20?

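(I've used 30 instead of 20 here, since every digit in the sample data is greater than 20 and the filter would otherwise remove nothing.)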

db=# SELECT array_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 array_agg
-----------
 {40,41}
 {31}
(2 rows)
db=# SELECT array_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 array_agg
-----------
 {40,41}
 {31}
(2 rows)
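
One thing to be aware of with the WHERE ... GROUP BY form: a row whose elements are all filtered out disappears from the result entirely. If you would rather get an empty array for such rows, a correlated ARRAY(...) subquery is one option. This is a sketch against the sample rows above, and the threshold 35 is picked only so that the row with id 1 has no matching digits.

```sql
-- One result row per table row; non-matching rows yield an empty array.
SELECT id,
       ARRAY(
           SELECT (x->>'digit')::int
           FROM unnest(pg_array) AS x
           WHERE (x->>'digit')::int > 35
       ) AS digits
FROM mytable
ORDER BY id;
-- With the sample rows: id 1 gives {}, id 2 gives {40,41}.
```

Swapping unnest(pg_array) for jsonb_array_elements(json_array) gives the equivalent query for the jsonb column.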

extract a json array of the digit values where digit > 20?

(Again, I've used 30 instead of 20 so that the filter actually removes something from the sample data.)

db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, unnest(pg_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [31]
(2 rows)
db=# SELECT jsonb_agg((x->>'digit')::int) FROM mytable, jsonb_array_elements(json_array) x WHERE (x->>'digit')::int > 30 GROUP BY id;
 jsonb_agg
-----------
 [40, 41]
 [31]
(2 rows)
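
For the json_array column only, and assuming PostgreSQL 12 or later, the SQL/JSON path functions can do the extraction and the filtering without unnesting and re-aggregating. This is a sketch of an alternative, not something benchmarked here, and it has no direct counterpart for the jsonb[] column.

```sql
-- Collect every "digit" value greater than 30 into a jsonb array.
SELECT id,
       jsonb_path_query_array(json_array, '$[*].digit ? (@ > 30)') AS digits
FROM mytable
ORDER BY id;
-- With the sample rows: id 1 gives [31], id 2 gives [40, 41].
```

Like the ARRAY(...) form, this keeps one result row per table row and returns an empty JSON array when nothing matches.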

For your second set of questions:

Can I still make all the above queries?

As seen above, yes.

What would be the performance difference?

That boils down to the performance difference of unnest and jsonb_array_elements. Let's compare that with a single row that contains an array with 1,000,000 JSON objects:

TRUNCATE mytable;
INSERT INTO mytable
SELECT 1, array_agg(o), jsonb_agg(o)
FROM (SELECT jsonb_build_object('letter', 'A', 'digit', i) o FROM generate_series(1, 1000000) i) x;

phil=# EXPLAIN ANALYZE SELECT unnest(pg_array) FROM mytable;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..35.88 rows=5000 width=32) (actual time=33.357..120.393 rows=1000000 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..10.50 rows=50 width=626) (actual time=0.010..0.013 rows=1 loops=1)
 Planning time: 0.050 ms
 Execution time: 175.670 ms
(4 rows)

phil=# EXPLAIN ANALYZE SELECT jsonb_array_elements(json_array) FROM mytable;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..35.88 rows=5000 width=32) (actual time=257.313..399.883 rows=1000000 loops=1)
   ->  Seq Scan on mytable  (cost=0.00..10.50 rows=50 width=721) (actual time=0.010..0.014 rows=1 loops=1)
 Planning time: 0.047 ms
 Execution time: 455.275 ms
(4 rows)

From this single run, unnest (about 176 ms) looks roughly 2.6 times faster than jsonb_array_elements (about 455 ms).

When should I choose one over the other?

I assume that your dataset isn't big enough for the performance difference between unnest and jsonb_array_elements to matter. Thus, I'd just choose whichever makes more sense in terms of the data. I'd tend to go with jsonb[], as it communicates more clearly that the column holds an array of JSON objects.
