1

PostgreSQL 9.6.3 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Table and indices:

create table if not exists orders
(
    id bigserial not null constraint orders_pkey primary key,
    partner_id integer,
    order_id varchar,
    date_created date,
    state_code integer,
    state_date timestamp,
    recipient varchar,
    phone varchar,
);

create index if not exists orders_partner_id_index on orders (partner_id);
create index if not exists orders_order_id_index on orders (order_id);
create index if not exists orders_partner_id_date_created_index on orders (partner_id, date_created);

The task is to create paging/sorting/filtering data.

The query for the first page:

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 0;

The query plan:

QUERY PLAN
"Limit  (cost=19495.48..38990.41 rows=10 width=91)"
"  ->  Index Scan using orders_order_id_index on orders  (cost=0.56..**41186925.66** rows=21127 width=91)"
"        Filter: ((date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date) AND (partner_id = 1))"

Index orders_partner_id_date_created_index is not used, so the cost is extremely high!

But starting from some offset values (the exact value differs from time to time, looks like it depends on total row count) the index starts to be used:

select order_id, date_created, recipient, phone, state_code, state_date
from orders
where partner_id=1 and date_created between '2019-04-01' and '2019-04-30'
order by order_id asc limit 10 offset 40;

Plan:

QUERY PLAN
"Limit  (cost=81449.76..81449.79 rows=10 width=91)"
"  ->  Sort  (cost=81449.66..81502.48 rows=21127 width=91)"
"        Sort Key: order_id"
"        ->  Bitmap Heap Scan on orders  (cost=4241.93..80747.84 rows=21127 width=91)"
"              Recheck Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"
"              ->  Bitmap Index Scan on orders_partner_id_date_created_index  (cost=0.00..4236.65 rows=21127 width=0)"
"                    Index Cond: ((partner_id = 1) AND (date_created >= '2019-04-01'::date) AND (date_created <= '2019-04-30'::date))"

What's happening? Is this a way to force the server to use the index?

1 Answer 1

3

General answer:

  • Postgres stores some information about your tables
  • Before executing the query, planner prepares execution plan based on those informations
  • In your case, planner thinks that for certain offset value this sub-optimal plan will be better. Note that your desired plan requires sorting all selected rows by order_id, while this "worse" plan does not. I'd guess that Postgres bets there will be quite many such rows for various orders and just tests one order after another, starting from lowest.

I can think of two solutions:

A) provide more data to the planer, by running

ANALYZE orders;

(https://www.postgresql.org/docs/9.6/sql-analyze.html)

or bo changing gathered statistics

ALTER TABLE orders SET STATISTCS (...);

(https://www.postgresql.org/docs/9.6/planner-stats.html)

B) Rewrite query in a way that hints desired index usage, like this:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
)
SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
FROM   orders o
JOIN   partner_date pd
    ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
ORDER BY order_id ASC LIMIT 10 OFFSET 0;

Or maybe even better:

WITH
partner_date (partner_id, date_created) AS (
    SELECT  1,
            generate_series('2019-04-01'::date, '2019-04-30'::date, '1 day'::interval)::date
), 
all_data AS (
    SELECT o.order_id, o.date_created, o.recipient, o.phone, o.state_code, o.state_date
    FROM   orders o
    JOIN   partner_date pd
        ON (o.partner_id, o.date_created) = (pd.partner_id, pd.date_created)
)
SELECT *
FROM   all_data
ORDER BY order_id ASC LIMIT 10 OFFSET 0;

Disclaimer - I can't explain why the first query should be interpreted in other way by Postgres planner, just think it could. On the other hand, second query separates offsets/limits from joins and I'd be very surprised if Postgres still did it the "bad" (according to you benchmarks) way.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.