
I tried to search for a solution but didn't find anything for my case...

Here is the database declaration (simplified):

CREATE TABLE documents (
    document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    data_block jsonb NULL
);

And here are two example inserts:

INSERT INTO documents (document_id, data_block)
VALUES(878979,
    '{"COMMONS": {"DATE": {"value": "2017-03-11"}},
      "PAYABLE_INVOICE_LINES": [
          {"AMOUNT": {"value": 52408.53}},
          {"AMOUNT": {"value": 654.23}}
      ]}');
INSERT INTO documents (document_id, data_block)
VALUES(977656,
    '{"COMMONS": {"DATE": {"value": "2018-03-11"}},
      "PAYABLE_INVOICE_LINES": [
          {"AMOUNT": {"value": 555.10}}
      ]}');

I want to search for all documents where one of the PAYABLE_INVOICE_LINES has a line with a value greater than 1000.00.

My query is:

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal >= 1000

But, as I want to limit the result to 50 documents, I have to group by document_id and apply the limit.
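Something like this (a sketch of the grouped version; essentially the same query with a group by and limit added):

select d.document_id
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal >= 1000
group by d.document_id
limit 50;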

With millions of documents, this query is very expensive: around 10 seconds for 1 million documents.

Do you have any ideas for better performance?

Thanks

1 Comment
  • I'm stuck on PG 9.3 at the moment so don't have that data type yet, but I briefly worked on a PG 9.6 project where we stored data blobs in jsonb fields, and you could create an index on values in that field which had pretty ok performance. Maybe that's what you should look into, if you have to keep the structure as it is. Commented Mar 31, 2018 at 10:55
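For reference, the kind of expression index the comment describes could look like this (illustrative; it indexes a single scalar key, not the array elements the question filters on):

create index idx_documents_commons_date_value
  on documents ((data_block -> 'COMMONS' -> 'DATE' ->> 'value'));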

2 Answers


Instead of cross join lateral, use where exists:

select *
from documents d
where exists (
  select 1
  from jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
  where (pil->'AMOUNT'->>'value')::decimal >= 1000)
limit 50;

Update

And yet another method, more complex but also much more efficient.

Create a function that returns the max value from your JSONB data; it must be declared immutable so that it can be used in an index expression:

create function fn_get_max_PAYABLE_INVOICE_LINES_value(JSONB) returns decimal language sql immutable as $$
  select max((pil->'AMOUNT'->>'value')::decimal)
  from jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') as pil $$;

Create an index on this function:

create index idx_max_PAYABLE_INVOICE_LINES_value
  on documents(fn_get_max_PAYABLE_INVOICE_LINES_value(data_block));

Use the function in your query:

select *
from documents d
where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000
limit 50;

In this case the index will be used and the query will be much faster on a large amount of data.
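To check that the index is actually picked up, you can look at the plan (a sketch; the exact plan depends on your data and statistics):

explain (analyze, buffers)
select *
from documents d
where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000
limit 50;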

PS: Usually limit only makes sense in combination with order by.


4 Comments

No reason to believe rewriting the join to exists would do anything. But creating an index on the function should make all the difference.
@Andomar You are probably right, the execution time is almost the same in my quick test: dbfiddle.uk/… However, IMO using exists makes the query clearer.
Hum, I just tried it on my database, and the exists solution seems to be very efficient. Using a specific function could be a better solution, but I would have to do this work for many other operators like >=, <, <=, !=, ... because it is in fact a search engine... Thank you very much!!!
@Ryu "but I have to do this work for many others operators like" There are three functions and indexes on them cold make your engine faster: fn_get_max() returns numeric..., fn_get_min() returns numeric... and fn_get_array() returns numeric[]...; index on documents using gin(fn_get_array(data_block)); Thus, >= operator could be realized like where fn_get_max(data_block) > 1000 or array[1000] <@ fn_get_array(data_block)

Grouping and limiting is easy enough:

select  document_id
from    documents d
cross join lateral 
        jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where   (pil->'AMOUNT'->>'value')::decimal >= 1000
group by
        document_id
limit   50

If you query this more often, you could store a list of documents and invoice lines in a separate table. When you're adding, modifying or deleting documents, you'd have to keep the separate table up to date too. But querying a regular table is much faster than querying JSON columns.
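A sketch of what such a side table could look like (names are illustrative, and it has to be kept in sync by the application or by triggers):

create table document_invoice_lines (
    document_id int4 not null,
    amount numeric not null
);

create index idx_document_invoice_lines_amount
    on document_invoice_lines (amount);

-- fill it once from the existing JSONB, then maintain it on every insert/update/delete
insert into document_invoice_lines (document_id, amount)
select d.document_id, (pil->'AMOUNT'->>'value')::numeric
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil;

-- the search then becomes a plain indexed query
select distinct document_id
from document_invoice_lines
where amount >= 1000
limit 50;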

2 Comments

Of course, I can group like this. But grouping over millions of lines is very slow, because PAYABLE_INVOICE_LINES may have hundreds of lines.
And changing the structure is not an option because we don't know it. It can change from one document type to another. I am really looking for an optimization of the query in this context.
