So I have asked about this query before and I got a really insightful answer. However I'd like to be able to further segment this query on Postgresql 9.6.3 and it starts to slow down again. I'm not sure that a partial index will help here because it's not from a boolean value.
So this is the base query that is performing very well:
EXPLAIN ANALYZE
SELECT posts.*
FROM unnest('{17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305}'::int []) s(source_id),
LATERAL
(SELECT "posts".*
FROM "posts"
WHERE (source_id = s.source_id)
AND ("posts"."deleted_at" IS NOT NULL)
AND "posts"."rejected_at" IS NULL
ORDER BY posts.external_created_at DESC
LIMIT 100) posts
ORDER BY posts.external_created_at DESC
LIMIT 100
OFFSET 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=30895.79..30896.04 rows=100 width=1043) (actual time=5.299..5.337 rows=100 loops=1)
-> Sort (cost=30895.78..30920.78 rows=10000 width=1043) (actual time=5.297..5.325 rows=101 loops=1)
Sort Key: posts.external_created_at DESC
Sort Method: top-N heapsort Memory: 110kB
-> Nested Loop (cost=0.56..30512.87 rows=10000 width=1043) (actual time=0.085..4.077 rows=738 loops=1)
-> Function Scan on unnest s (cost=0.00..1.00 rows=100 width=4) (actual time=0.011..0.016 rows=12 loops=1)
-> Limit (cost=0.56..303.12 rows=100 width=1043) (actual time=0.018..0.298 rows=62 loops=12)
-> Index Scan using index_posts_for_moderation_queue on posts (cost=0.56..7628.00 rows=2521 width=1043) (actual time=0.017..0.285 rows=62 loops=12)
Index Cond: (source_id = s.source_id)
Planning time: 0.443 ms
Execution time: 5.433 ms
(11 rows)
And this is the modified one, with filter, that is much slower:
EXPLAIN ANALYZE
SELECT posts.*
FROM unnest('{17858,50909,52659,50914,50916,51696,52661,52035,17860,53315,54027,53305}'::int []) s(source_id),
LATERAL
(SELECT "posts".*
FROM "posts"
WHERE (source_id = s.source_id)
AND ("posts"."deleted_at" IS NOT NULL)
AND "posts"."deleted_by" = 'User'
AND "posts"."rejected_at" IS NULL
ORDER BY posts.external_created_at DESC
LIMIT 100) posts
ORDER BY posts.external_created_at DESC
LIMIT 100
OFFSET 0;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=551390.03..551390.28 rows=100 width=1043) (actual time=769.522..769.522 rows=0 loops=1)
-> Sort (cost=551390.03..551391.78 rows=700 width=1043) (actual time=769.521..769.521 rows=0 loops=1)
Sort Key: posts.external_created_at DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=5513.47..551363.28 rows=700 width=1043) (actual time=769.508..769.508 rows=0 loops=1)
-> Function Scan on unnest s (cost=0.00..1.00 rows=100 width=4) (actual time=0.012..0.022 rows=12 loops=1)
-> Limit (cost=5513.47..5513.48 rows=7 width=1043) (actual time=64.122..64.122 rows=0 loops=12)
-> Sort (cost=5513.47..5513.48 rows=7 width=1043) (actual time=64.120..64.120 rows=0 loops=12)
Sort Key: posts.external_created_at DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on posts (cost=5485.28..5513.37 rows=7 width=1043) (actual time=64.104..64.104 rows=0 loops=12)
Recheck Cond: ((source_id = s.source_id) AND (deleted_at IS NOT NULL) AND (rejected_at IS NULL) AND ((deleted_by)::text = 'User'::text))
Rows Removed by Index Recheck: 1
Heap Blocks: exact=9
-> BitmapAnd (cost=5485.28..5485.28 rows=7 width=0) (actual time=64.098..64.098 rows=0 loops=12)
-> Bitmap Index Scan on index_posts_for_moderation_queue (cost=0.00..59.47 rows=2521 width=0) (actual time=0.028..0.028 rows=168 loops=12)
Index Cond: (source_id = s.source_id)
-> Bitmap Index Scan on index_posts_on_deleted_by (cost=0.00..5425.55 rows=291865 width=0) (actual time=76.855..76.855 rows=334200 loops=10)
Index Cond: ((deleted_by)::text = 'User'::text)
Planning time: 0.348 ms
Execution time: 769.660 ms
(21 rows)
The only difference between the two is that the second as the AND "posts"."deleted_by" = 'User' part additionally added to the lateral query.
The problem is where the value 'User' is, that is not a boolean value, and can be anything.
Is there a way to further optimize this query so that it will be speedier, even with the deleted_by query set?
Here's the db structure and indexes and settings:
CREATE TABLE posts (
id integer NOT NULL,
source_id integer,
message text,
image text,
external_id text,
created_at timestamp without time zone,
updated_at timestamp without time zone,
external text,
like_count integer DEFAULT 0 NOT NULL,
comment_count integer DEFAULT 0 NOT NULL,
external_created_at timestamp without time zone,
deleted_at timestamp without time zone,
poster_name character varying(255),
poster_image text,
poster_url character varying(255),
poster_id text,
position integer,
location character varying(255),
description text,
video text,
rejected_at timestamp without time zone,
deleted_by character varying(255),
height integer,
width integer
);
CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts USING btree (source_id, external_created_at DESC) WHERE deleted_at IS NOT NULL AND rejected_at IS NULL;
CREATE INDEX index_posts_on_deleted_at ON posts USING btree (deleted_at);
CREATE INDEX index_posts_on_deleted_by ON posts USING btree (deleted_by);
CREATE INDEX index_posts_on_source_id ON posts USING btree (source_id);
The first of the indexes above being a result of the answer I got to my last question.
Postgres memory settings:
name, setting, unit
'default_statistics_target','100',''
'effective_cache_size','16384','8kB'
'maintenance_work_mem','16384','kB'
'max_connections','100',''
'random_page_cost','4',NULL
'seq_page_cost','1',NULL
'shared_buffers','16384','8kB'
'work_mem','1024','kB'
Database stats:
Total Posts: 20,997,027
Posts where deleted_at is null: 15,665,487
Distinct source_id's: 22,245
Max number of rows per single source_id: 1,543,950
Min number of rows per single source_id: 1
Most source_ids in a single query: 21
Distinct external_created_at: 11,146,151
EDIT
I tried out the simplified answer I got from Evan with different source ids and it's pretty slow:
EXPLAIN ANALYZE
SELECT *
FROM posts AS p
WHERE source_id IN (159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810)
AND deleted_at IS NOT NULL
AND deleted_by = 'Filter'
AND rejected_at IS NULL
ORDER BY external_created_at DESC
LIMIT 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=74114.14..74114.19 rows=100 width=1060) (actual time=2794.981..2794.981 rows=0 loops=1)
-> Sort (cost=74114.14..74115.48 rows=2678 width=1060) (actual time=2794.981..2794.981 rows=0 loops=1)
Sort Key: external_created_at DESC
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on posts p (cost=68759.42..74093.67 rows=2678 width=1060) (actual time=2794.977..2794.977 rows=0 loops=1)
Recheck Cond: ((source_id = ANY ('{159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810}'::integer[])) AND (deleted_at IS NOT NULL) AND (rejected_at IS NULL) AND ((deleted_by)::text = 'Filter'::text))
Rows Removed by Index Recheck: 32326
Heap Blocks: exact=16019
-> BitmapAnd (cost=68759.42..68759.42 rows=2678 width=0) (actual time=2745.376..2745.376 rows=0 loops=1)
-> Bitmap Index Scan on index_posts_for_moderation_queue (cost=0.00..830.64 rows=52637 width=0) (actual time=42.319..42.319 rows=272192 loops=1)
Index Cond: (source_id = ANY ('{159469,120669,120668,120670,120671,120674,120662,120661,120664,109450,109448,109447,108039,159468,157810}'::integer[]))
-> Bitmap Index Scan on index_posts_on_deleted_by (cost=0.00..67928.46 rows=6942897 width=0) (actual time=2651.123..2651.123 rows=7863994 loops=1)
Index Cond: ((deleted_by)::text = 'Filter'::text)
Planning time: 0.856 ms
Execution time: 2795.033 ms
(15 rows)
The reason I am using LATERAL can be explained by another earlier question I made optimizing this query.
CREATE INDEX index_posts_on_source_id_and_external_created_at ON posts USING btree (source_id, external_created_at DESC, deleted_by) WHERE deleted_at IS NOT NULL AND rejected_at IS NULL;deleted_by?