3

Let's say we have this table with a million rows:

CREATE TABLE stuff(
    foo integer,
    bar integer
);

INSERT INTO stuff(foo, bar) SELECT
    (random()*1000000) :: integer,
    (random()*1000000) :: integer
FROM generate_series(1, 1000000);

CREATE INDEX index1 ON stuff(foo ASC, bar ASC);
CREATE INDEX index2 ON stuff(foo ASC, bar DESC);

VACUUM ANALYZE stuff;

This table backs an API endpoint where the user can query the data specifying a custom sort order, possibly on multiple columns, and fetch the data in pages of 100. When fetching a page, the user gets a cursor that encodes the necessary data to get the next page.

When fetching rows with ORDER BY foo ASC, bar ASC, the first query would be:

postgres=# EXPLAIN ANALYZE SELECT * FROM stuff ORDER BY foo ASC, bar ASC LIMIT 100;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..4.79 rows=100 width=8) (actual time=0.104..3.611 rows=100 loops=1)
   ->  Index Only Scan using index1 on stuff  (cost=0.42..43679.92 rows=1000000 width=8) (actual time=0.068..1.297 rows=100 loops=1)
         Heap Fetches: 100
 Planning Time: 0.150 ms
 Execution Time: 4.915 ms
(5 rows)

and a query with a cursor pointing to the middle of the table would be:

postgres=# EXPLAIN ANALYZE SELECT * FROM stuff WHERE (foo, bar) > (543210, 1234) ORDER BY foo ASC, bar ASC LIMIT 100;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..7.14 rows=100 width=8) (actual time=0.142..3.625 rows=100 loops=1)
   ->  Index Only Scan using index1 on stuff  (cost=0.42..30730.05 rows=457498 width=8) (actual time=0.115..1.279 rows=100 loops=1)
         Index Cond: (ROW(foo, bar) > ROW(543210, 1234))
         Heap Fetches: 100
 Planning Time: 0.325 ms
 Execution Time: 4.857 ms
(6 rows)

This query is very fast because the optimizer knows how to use the WHERE clause to "jump" into the middle of the index, and only fetch 100 rows from there.

Now, if the user wants to sort by foo ASC, bar DESC, I can't think of any equivalent for WHERE (foo, bar) > (543210, 1234). We have to write the query like this:

postgres=# EXPLAIN ANALYZE SELECT * FROM stuff WHERE foo > 543210 OR (foo = 543210 AND bar < 1234) ORDER BY foo ASC, bar DESC LIMIT 100;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..11.61 rows=100 width=8) (actual time=226.836..230.294 rows=100 loops=1)
   ->  Index Only Scan using index2 on stuff  (cost=0.42..51179.92 rows=457498 width=8) (actual time=226.813..227.959 rows=100 loops=1)
         Filter: ((foo > 543210) OR ((foo = 543210) AND (bar < 1234)))
         Rows Removed by Filter: 543630
         Heap Fetches: 543730
 Planning Time: 0.275 ms
 Execution Time: 231.521 ms
(7 rows)

Now the query is too slow! Apparently, postgres isn't smart enough to figure out that it can "jump" into the middle of the index, and does a full index scan instead.

Is there any way to make the second query fast, like the first one? It seems to me that it should be theoretically possible.

3 Answers 3

3

This is a generic (and slightly academic) answer that shows how this problem can be solved in general by defining your own type and operator class.

I'll use strings (type text) as an example.

First, we have to define our own type invtext exactly like text:

/* create a shell type */
CREATE TYPE invtext;

/* copy type functions from "text" */
CREATE FUNCTION invtextin(cstring) RETURNS invtext
   LANGUAGE internal IMMUTABLE PARALLEL SAFE
   AS 'textin';

CREATE FUNCTION invtextout(invtext) RETURNS cstring
   LANGUAGE internal IMMUTABLE PARALLEL SAFE
   AS 'textout';

CREATE FUNCTION invtextrecv(internal) RETURNS invtext
   LANGUAGE internal STABLE PARALLEL SAFE
   AS 'textrecv';

CREATE FUNCTION invtextsend(invtext) RETURNS bytea
   LANGUAGE internal STABLE PARALLEL SAFE
   AS 'textsend';

/* create type like "text" */
CREATE TYPE invtext (
    INPUT = invtextin,
    OUTPUT = invtextout,
    RECEIVE = invtextrecv,
    SEND = invtextsend,
    LIKE = text,
    CATEGORY = 'S',
    DELIMITER = ',',
    COLLATABLE = TRUE
);

We need casts to and from text:

CREATE CAST (text AS invtext) WITHOUT FUNCTION;
CREATE CAST (invtext AS text) WITHOUT FUNCTION AS IMPLICIT;

We use the same comparison functions as text, except we turn around the order:

CREATE FUNCTION invtexteq(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE LEAKPROOF STRICT
   AS 'texteq';

CREATE FUNCTION invtextne(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE LEAKPROOF STRICT
   AS 'textne';

CREATE FUNCTION invtext_lt(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT
   AS 'text_gt';

CREATE FUNCTION invtext_gt(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT
   AS 'text_lt';

CREATE FUNCTION invtext_le(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT
   AS 'text_ge';

CREATE FUNCTION invtext_ge(invtext, invtext) RETURNS boolean
   LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT
   AS 'text_le';

Now we can define the corresponding operators:

CREATE OPERATOR = (
   FUNCTION = invtexteq,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = "=",
   NEGATOR = "<>",
   RESTRICT = eqsel,
   JOIN = eqjoinsel,
   HASHES,
   MERGES
);

CREATE OPERATOR <> (
   FUNCTION = invtextne,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = "<>",
   NEGATOR = "=",
   RESTRICT = neqsel,
   JOIN = neqjoinsel
);

CREATE OPERATOR < (
   FUNCTION = invtext_lt,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = ">",
   NEGATOR = ">=",
   RESTRICT = scalargtsel,
   JOIN = scalargtjoinsel
);

CREATE OPERATOR > (
   FUNCTION = invtext_gt,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = "<",
   NEGATOR = "<=",
   RESTRICT = scalarltsel,
   JOIN = scalarltjoinsel
);

CREATE OPERATOR <= (
   FUNCTION = invtext_le,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = ">=",
   NEGATOR = ">",
   RESTRICT = scalargesel,
   JOIN = scalargejoinsel
);

CREATE OPERATOR >= (
   FUNCTION = invtext_ge,
   LEFTARG = invtext,
   RIGHTARG = invtext,
   COMMUTATOR = "<=",
   NEGATOR = "<",
   RESTRICT = scalarlesel,
   JOIN = scalarlejoinsel
);

The required B-tree support function is just the opposite of the one from text:

CREATE FUNCTION btinvtextcmp(invtext, invtext) RETURNS integer
   LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
   AS 'SELECT bttextcmp($2, $1)';

Now we can define a default operator class for invtext:

CREATE OPERATOR CLASS invtext_ops
   DEFAULT FOR TYPE invtext USING btree FAMILY text_ops AS
      OPERATOR 1 <,
      OPERATOR 2 <=,
      OPERATOR 3 =,
      OPERATOR 4 >=,
      OPERATOR 5 >,
      FUNCTION 1 btinvtextcmp(invtext, invtext);

Done! Let's test how it works.

Let's create a test table:

CREATE TABLE stuff(
    foo text,
    bar text
);

INSERT INTO stuff(foo, bar) SELECT
    substr(md5((random()*1000000)::text), 4, 4),
    substr(md5((random()*1000000)::text), 4, 4)
FROM generate_series(1, 1000000);

CREATE INDEX index1 ON stuff(foo ASC, (bar::invtext) ASC);

VACUUM ANALYZE stuff;

Sort foo in ascending order and bar in descending order (which is ascending order for invtext):

SELECT * FROM stuff
WHERE (foo, (bar::invtext)) > ('f000', '8000')
ORDER BY foo ASC, (bar::invtext) ASC
LIMIT 100;

 foo  | bar  
------+------
 f000 | 75ab
 f000 | 6d17
 f000 | 6c78
 f000 | 51ed
 f000 | 2d84
 f000 | 2227
 f000 | 0f72
 f001 | f0bf
 f001 | eae8
 f001 | d56a
 f001 | c6dd
 f001 | ba4e
 f001 | 9606
 f001 | 8e2a
 f001 | 7d58
 f001 | 7aed
 f001 | 719c
 f001 | 5ce9
 f001 | 354a
 f001 | 137d
 f001 | 107d
 f002 | ee18
 f002 | e7a8
...

Test if the index is used:

EXPLAIN (COSTS off)
SELECT * FROM stuff
WHERE (foo, (bar::invtext)) > ('f000', '8000')
ORDER BY foo ASC, (bar::invtext) ASC
LIMIT 100;

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Limit
   ->  Index Scan using index1 on stuff
         Index Cond: (ROW(foo, (bar)::invtext) > ROW('f'::text, '1'::invtext))
(3 rows)
2

Correct, PostgreSQL does not make every optimization which is theoretically possible.

You can try to force it into doing what you want with a UNION ALL query:

select * from (
    (SELECT * FROM stuff 
         WHERE foo = 543210 AND bar < 1234 ORDER BY foo ASC, bar desc LIMIT 100) 
    union all 
    (SELECT * FROM stuff 
         WHERE foo > 543210 ORDER BY foo ASC, bar DESC LIMIT 100)
) as sdfasf ORDER BY foo ASC, bar DESC LIMIT 100;

This form of the query does seem to do an unnecessary sort, but since it is over at most 100 rows, it is probably of no consequence.

0

You could

CREATE INDEX ON stuff (foo, (-bar));

and query like

SELECT * FROM stuff
WHERE (foo, (-bar)) > (543210, -1234)
ORDER BY foo, -bar;
2
  • Thanks for the answer! Unfortunately that trick only works for numbers. Is there any way that works for arbitrary orderable types? Commented Aug 20, 2019 at 15:29
  • See my other answer. Commented Aug 20, 2019 at 21:18

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.