
I cannot make Postgres use a GIN index when I use jsonb_* functions in my queries. The problem occurs with both the jsonb_ops and jsonb_path_ops operator classes.

First, some preparation. Create the table:

CREATE TABLE applications
(
    id          TEXT PRIMARY KEY,
    application JSONB
);
CREATE INDEX ON applications USING gin (application jsonb_path_ops);

Fill the table with enough data that Postgres will consider using the GIN index:

INSERT INTO applications(id, application)
VALUES ('1', '{
  "type_code": 1,
  "persons": [
    {
      "type_code": 4,
      "firstname": "John",
      "lastname": "Doe"
    }
  ]
}');
INSERT INTO applications (SELECT i, a.application FROM applications a, generate_series(2, 100000) i);

Then I select the data like this:

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE applications.application @? '$.persons[*] ? (@.type_code == 3)';

Note that the GIN index was used:

-- Bitmap Heap Scan on applications  (cost=64.00..68.01 rows=1 width=130) (actual time=0.410..0.419 rows=0 loops=1)
--   Recheck Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
--   ->  Bitmap Index Scan on applications_application_idx  (cost=0.00..64.00 rows=1 width=0) (actual time=0.095..0.096 rows=0 loops=1)
--         Index Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
-- Planning Time: 1.493 ms
-- Execution Time: 0.861 ms

Now I select the same data using the function instead of the operator:

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == 3)'
);

You can see that the GIN index was not used in this case:

-- Aggregate  (cost=3374.33..3374.34 rows=1 width=8) (actual time=114.048..114.055 rows=1 loops=1)
--   ->  Seq Scan on applications  (cost=0.00..3291.00 rows=33333 width=0) (actual time=0.388..109.580 rows=100000 loops=1)
--         Filter: jsonb_path_exists(application, '$."persons"[*]?(@."type_code" == 3)'::jsonpath, '{}'::jsonb, false)
-- Planning Time: 1.514 ms
-- Execution Time: 114.674 ms

Is it possible to make Postgres use the GIN index in the second query?

I prefer the jsonb_* functions because they let me use positional parameters to build the query:

SELECT * FROM applications
WHERE jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == $person_type_code)',
      jsonb_build_object('person_type_code', $1)
);
  • You can't. PostgreSQL indexes are tied to operators in specific operator classes. GIN will help you if you use @@ and @? with JSONPath expressions, but even though there are equivalent jsonb_path_X() functions to what those operators do, the index will only kick in if you use the operator and not the function. There are cases like PostGIS where functions do in fact use the index but that's because they wrap one or add an operator-based condition that's using the index, then use the actual function to just re-check pre-filtered rows. Commented Mar 28, 2024 at 12:07

1 Answer


How to make Postgres GIN index work with jsonb_* functions?

You can't*. PostgreSQL indexes are tied to operators in specific operator classes:

In general, PostgreSQL indexes can be used to optimize queries that contain one or more WHERE or JOIN clauses of the form

indexed-column indexable-operator comparison-value

Here, the indexed-column is whatever column or expression the index has been defined on. The indexable-operator is an operator that is a member of the index's operator class for the indexed column. And the comparison-value can be any expression that is not volatile and does not reference the index's table.

GIN will help you only if you use the operators in the opclass you used when you defined the index (jsonb_ops by default):

The default GIN operator class for jsonb supports queries with the key-exists operators ?, ?| and ?&, the containment operator @>, and the jsonpath match operators @? and @@.

Even though there are equivalent jsonb_path_X() functions that do the exact same thing those operators do, the index will only kick in if you use the operator and not the function.
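
To see which operators a given operator class actually exposes to the index, you can ask the system catalogs. This is a minimal sketch that assumes a stock installation where both jsonb_ops and jsonb_path_ops exist for the gin access method; the column aliases are my own. I would expect jsonb_path_ops, the class used in the question, to list fewer operators than jsonb_ops, but check the output on your own server.

```sql
-- List the operators that each jsonb GIN operator class makes indexable.
-- pg_opclass -> pg_amop are joined through the operator family.
SELECT opc.opcname                  AS operator_class,
       amop.amopstrategy            AS strategy,
       amop.amopopr::regoperator    AS indexable_operator
FROM pg_opclass AS opc
JOIN pg_am      AS am   ON am.oid = opc.opcmethod
JOIN pg_amop    AS amop ON amop.amopfamily = opc.opcfamily
WHERE am.amname = 'gin'
  AND opc.opcname IN ('jsonb_ops', 'jsonb_path_ops')
ORDER BY opc.opcname, amop.amopstrategy;
```

Only operators appear in this list; jsonb_path_exists() and the other functions do not, which is why the planner does not match them against the index.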


*Except you kind of can

There are cases like PostGIS where functions do in fact use the index, but that's because they wrap an operator or add an operator-based condition that uses the index, then use the actual function just to re-check the pre-filtered rows. You can mimic that if you want: demo

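-- Thin SQL wrapper around @?; being a single-SELECT SQL function, it can be inlined by the planner.
CREATE OR REPLACE FUNCTION my_jsonb_path_exists(arg1 jsonb, arg2 jsonpath)
RETURNS boolean AS 'SELECT $1 @? $2' LANGUAGE sql IMMUTABLE;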

EXPLAIN ANALYZE
SELECT * FROM applications
WHERE my_jsonb_path_exists(
      applications.application,
      '$.persons[*] ? (@.type_code == 3)'
);
QUERY PLAN
Bitmap Heap Scan on applications (cost=165.51..5277.31 rows=21984 width=163) (actual time=15.650..83.960 rows=22219 loops=1)
  Recheck Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
  Heap Blocks: exact=4798
  -> Bitmap Index Scan on gin_idx (cost=0.00..160.01 rows=21984 width=0) (actual time=14.891..14.892 rows=22219 loops=1)
        Index Cond: (application @? '$."persons"[*]?(@."type_code" == 3)'::jsonpath)
Planning Time: 0.231 ms
Execution Time: 85.092 ms

You can see that it now uses the index: the planner inlined the SQL function, so the condition was rewritten as the operator it wraps. It finds 22219 matches because I increased the sample set to 200k rows and randomised them.
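
For the parameterised case from the question, one option that stays in operator form is to express the equality test as containment with @>, which both jsonb GIN operator classes support. This is only a sketch, and it assumes the filter is a plain equality on type_code; it is not a general replacement for jsonpath filters such as range comparisons. The statement name find_by_person_type and the int parameter type are my own assumptions.

```sql
-- Hypothetical prepared statement: the right-hand side of @> is built from
-- the bind parameter, so no jsonpath string has to be assembled by hand.
PREPARE find_by_person_type(int) AS
SELECT *
FROM applications
WHERE application @> jsonb_build_object(
          'persons',
          jsonb_build_array(jsonb_build_object('type_code', $1))
      );

-- Inspect the plan for one parameter value to check whether the index is used.
EXPLAIN EXECUTE find_by_person_type(3);
```

The comparison value here is a non-volatile expression that does not reference the table, so it fits the "indexed-column indexable-operator comparison-value" form quoted above.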


9 Comments

Hi and thanks for this thorough answer! I understand that the operator classes supported by an index (e.g. jsonb_ops) only support operators (e.g. @?) but not functions (e.g. jsonb_path_exists()) to use that index. Do you know what the technical reason, if any, behind it is? To me, it seems uncharacteristic that the PostgreSQL team did not extend these operator classes to also include functions, where appropriate.
@Feuermurmel Technically, an operator is just a function like any other and every function can be made into an operator. There are operators that are only that and nothing more. What lets specific operator-based expressions benefit from an index is that those certain operators are equipped with additional functions that the index requires to reason about, structure and navigate the set. That, and the parser grammar. The two examples you refer to are one and the same but the assumption is, if you wanted the operator behaviour, you'd use the equivalent operator.
You can look up access methods, operator classes and operator families to see what more goes into those, on top of just the function you'd like to offer to an index, or go through 36.16. Interfacing Extensions to Indexes and then Chapter 62. Index Access Method Interface Definition
This nicely sums it up: "To be useful, an index access method must also have one or more operator families and operator classes defined in pg_opfamily, pg_opclass, pg_amop, and pg_amproc. These entries allow the planner to determine what kinds of query qualifications can be used with indexes of this access method."
I'm not even sure how you'd expose the 3-arg function as an operator that just accepts 2 operands, left and right. Vars would have to get embedded in the jsonpath-type value, so you'd have to have a dedicated jsonpath constructor there on the right, feeding it those vars. I don't think jsonpath currently has a way to accommodate this type of thing. There are years of discussion to go through; it might've been explored already, and my guesswork here might very well be completely misguided and way off.