Strange PostgreSQL index using while using LIMIT..OFFSET

Question

PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Table and indices:

create table if not exists orders
(
    id bigserial not null constraint orders_pkey primary key,
    partner_id integer,
    order_id varchar,
    date_created date,
    state_code integer,
    state_date timestamp,
    recipient varchar,
    phone varchar,
);

create index if not exists orders_partner_id_index on orders (partner_id);
create index if not exists orders_order_id_index on orders (order_id);
create index if not exists orders_partner_id_date_created_index on orders (partner_id, date_created);

The task is to create paging/sorting/filtering data.

The query for the first page:

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 0;

The query plan:

QUERY PLAN
"Limit  (cost=19495.48..38990.41 rows=10 width=91)"
"  ->  Index Scan using orders_order_id_index on orders  (cost=0.56..**41186925.66** rows=21127 width=91)"
"        Filter: ((date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date) AND (partner_id = 1))"

Index orders_partner_id_date_created_index is not used, so the cost is extremely high!

But starting from some offset values (the exact value differs from time to time, looks like it depends on total row count) the index starts to be used:

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 40;

Plan:

QUERY PLAN
"Limit  (cost=81449.76..81449.79 rows=10 width=91)"
"  ->  Sort  (cost=81449.66..81502.48 rows=21127 width=91)"
"        Sort Key: order_id"
"        ->  Bitmap Heap Scan on orders  (cost=4241.93..80747.84 rows=21127 width=91)"
"              Recheck Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"
"              ->  Bitmap Index Scan on orders_partner_id_date_created_index  (cost=0.00..4236.65 rows=21127 width=0)"
"                    Index Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"

What's happening? Is this a way to force the server to use the index?

jbet · Accepted Answer · 2019-06-06 12:53:08Z

General answer:

Postgres stores some information about your tables
Before executing the query, planner prepares execution plan based on those informations
In your case, planner thinks that for certain offset value this sub-optimal plan will be better. Note that your desired plan requires sorting all selected rows by order_id, while this "worse" plan does not. I'd guess that Postgres bets there will be quite many such rows for various orders and just tests one order after another, starting from lowest.

I can think of two solutions:

A) provide more data to the planer, by running

ANALYZE orders;

(https://www.postgresql.org/docs/9.6/sql-analyze.html)

or bo changing gathered statistics

ALTER TABLE orders SET STATISTCS (...);

(https://www.postgresql.org/docs/9.6/planner-stats.html)

B) Rewrite query in a way that hints desired index usage, like this:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
)
SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
FROM   orders o
JOIN   partner_date pd
    ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
ORDER BY order_id ASC LIMIT 10 OFFSET 0;

Or maybe even better:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
), 
all_data AS (
    SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
    FROM   orders o
    JOIN   partner_date pd
        ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
)
SELECT *
FROM   all_data
ORDER BY order_id ASC LIMIT 10 OFFSET 0;

Disclaimer - I can't explain why the first query should be interpreted in other way by Postgres planner, just think it could. On the other hand, second query separates offsets/limits from joins and I'd be very surprised if Postgres still did it the "bad" (according to you benchmarks) way.

Collectives™ on Stack Overflow

Strange PostgreSQL index using while using LIMIT..OFFSET

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related