I have a pretty simple table
CREATE TABLE approved_posts (
project_id INTEGER,
feed_id INTEGER,
post_id INTEGER,
approved_time TIMESTAMP NOT NULL,
post_time TIMESTAMP NOT NULL,
PRIMARY KEY (project_id, feed_id, post_id)
)
And I'm trying to optimize this query:
SELECT *
FROM approved_posts
WHERE feed_id IN (?, ?, ?)
AND project_id = ?
ORDER BY approved_time DESC, post_time DESC
LIMIT 1;
The query optimizer is fetching every single approved_post that matches the predicate, sorting all 100k results, and returning the top one it finds.
I do have an index on project_id, feed_id, approved_time, post_time, which it will use if I either:
A. remove the sort by post_time, or
B. replace the IN (?, ?, ?) with a single = ?.
Then it simply does a reverse index scan to get the first result and its blazingly fast.
Option A:
Limit (cost=0.43..6.57 rows=1 width=24) (actual time=0.101..0.101 rows=1 loops=1)
-> Index Scan Backward using approved_posts_approved_time_idx on approved_posts p (cost=0.43..840483.02 rows=136940 width=24) (actual time=0.100..0.100 rows=1 loops=1)
Filter: (feed_id = ANY ('{73321,73771,73772,73773,73774}'::integer[]))
Rows Removed by Filter: 37
Total runtime: 0.129 ms
Option B:
Limit (cost=0.43..3.31 rows=1 width=24) (actual time=0.065..0.065 rows=1 loops=1)
-> Index Scan Backward using approved_posts_full_pagination_index on approved_posts p (cost=0.43..126884.70 rows=44049 width=24) (actual time=0.063..0.063 rows=1 loops=1)
Index Cond: ((project_id = 148772) AND (feed_id = 73321))
Total runtime: 0.092 ms
But without these tweaks it is not so performant ...
Limit (cost=169792.16..169792.17 rows=1 width=24) (actual time=510.225..510.225 rows=1 loops=1)
-> Sort (cost=169792.16..170118.06 rows=130357 width=24) (actual time=510.224..510.224 rows=1 loops=1)
Sort Key: approved_time, post_time
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on approved_posts p (cost=12324.41..169140.38 rows=130357 width=24) (actual time=362.210..469.387 rows=126260 loops=1)
Recheck Cond: (feed_id = ANY ('{73321,73771,73772,73773,73774}'::integer[]))
-> Bitmap Index Scan on approved_posts_feed_id_idx (cost=0.00..12291.82 rows=130357 width=0) (actual time=354.496..354.496 rows=126260 loops=1)
Index Cond: (feed_id = ANY ('{73321,73771,73772,73773,73774}'::integer[]))
Total runtime: 510.265 ms
I can even add a conditional index on these 5 feed ids and it will once again do the right thing.
My current best solution is to put every feed_id in its own query and do a massive UNION between them all. But this doesn't scale very well as I might want to select the top 500 from 30 feeds, pulling in 15k rows and sorting them for no good reason. Also managing offsets with this strategy is somewhat complex.
Does anybody know how I can do this IN clause with two sorts on my well-indexed data and get Postgres to do the right thing?
I'm using Postgres 9.3.3. Here are my indexes:
"approved_posts_project_id_feed_id_post_id_key" UNIQUE CONSTRAINT, btree (project_id, feed_id, post_id)
"approved_posts_approved_time_idx" btree (approved_time)
"approved_posts_feed_id_idx" btree (feed_id)
"approved_posts_full_pagination_index" btree (project_id, feed_id, approved_time, post_time)
"approved_posts_post_id_idx" btree (post_id)
"approved_posts_post_time_idx" btree (post_time)
"approved_posts_project_id_idx" btree (project_id)
None of the columns are nullable.
This table has 2m rows, split among 200 feed IDs and 19 project IDs.
These are the most common feed IDs:
feed_id | count
---------+--------
73607 | 558860
73837 | 354018
73832 | 220285
73836 | 172664
73321 | 118695
73819 | 95999
73821 | 75871
73056 | 65779
73070 | 54655
73827 | 43710
73079 | 36700
73574 | 36111
73055 | 25682
73072 | 22596
73589 | 19856
73953 | 15286
73159 | 13059
73839 | 8925
In terms of min/max/avg cardinality per feedid/projectid pairing, we have:
min | max | avg
-----+--------+-----------------------
1 | 559021 | 9427.9140271493212670
9.3.3begs the question: Why not at least 9.3.9 (if 9.4 is not an option)?. We always recommend that all users run the latest available minor release for whatever major version is in use.