Optimize subquery with max postgresql

Question

I'm using PostgreSQL 12.8 and I have two tables:

CREATE TABLE s (
    id     BIGINT PRIMARY KEY,
    type   VARCHAR,
    active BOOLEAN
);

With more than 230k rows, and

CREATE TABLE s_aud (
    id            BIGINT NOT NULL,
    type          VARCHAR,
    revision_id   INT4 NOT NULL,
    revision_type SMALLINT NOT NULL,
    CONSTRAINT s_aud_pk PRIMARY KEY (id, revision_id)
);

that contains more than 4M rows and is an apprend-only table where we store every add, update or delete operation done in s table, s_aud table does not contain any FK to s table. The problem is that I would like to execute the following query:

SELECT s.*, a.revision_id 
FROM s
JOIN (
  SELECT id, MAX(revision_id) AS revision_id 
  FROM s_aud 
  WHERE revision_type <> 2 AND type = 'X_TYPE'
  GROUP BY id
) a ON s.id = a.id 
WHERE s.type = 'X_TYPE' AND s.active = true;

Which is something like, get the latest revision_id of revision_type 2 of every id in the table.

If I execute the query, it takes more than 10 minutes to execute which is not acceptable, how can I improve it? I tried to add an index to:

CREATE INDEX s_aud_id_idx ON s_aud (id);
CREATE INDEX s_aud_revision_type_idx ON s_aud (revision_type);

But didn't affected to the performance of the query, any ideas?

EDIT, EXPLAIN WITH (ANALYZE, VERBOSE, BUFFERS, format text)

QUERY PLAN                                                                                                                                                                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=138478.70..149103.60 rows=1 width=238) (actual time=432.466..327417.023 rows=744 loops=1)                                                                                                                             
  Output: s.id, s.type, s.active, (max(s_aud.revision_id))
  Inner Unique: true                                                                                                                                                                                                                     
  Join Filter: (s.id = s_aud.id)                                                                                                                                                                                                   
  Rows Removed by Join Filter: 276396                                                                                                                                                                                                    
  Buffers: shared hit=77360594 read=7600                                                                                                                                                                                                 
  I/O Timings: read=22.744                                                                                                                                                                                                               
  ->  Gather  (cost=1000.00..10021.67 rows=1 width=234) (actual time=0.296..1.279 rows=744 loops=1)                                                                                                                                      
        Output: s.id, s.type, s.active           
        Workers Planned: 2                                                                                                                                                                                                               
        Workers Launched: 2                                                                                                                                                                                                              
        Buffers: shared hit=218 read=7600                                                                                                                                                                                                
        I/O Timings: read=22.744                                                                                                                                                                                                         
        ->  Parallel Seq Scan on db_schema.s (cost=0.00..9021.57 rows=1 width=234) (actual time=0.077..23.769 rows=248 loops=3)                                                                                      
              Output: s.id, s.type, s.active     
              Filter: (s.active AND ((s.type)::text = 'X_TYPE'::text))                                                                                                                                                            
              Rows Removed by Filter: 76643                                                                                                                                                                                              
              Buffers: shared hit=218 read=7600                                                                                                                                                                                          
              I/O Timings: read=22.744                                                                                                                                                                                                   
              Worker 0: actual time=0.132..36.318 rows=463 loops=1                                                                                                                                                                       
                Buffers: shared hit=217 read=3811                                                                                                                                                                                        
                I/O Timings: read=11.326                                                                                                                                                                                                 
              Worker 1: actual time=0.016..34.903 rows=280 loops=1                                                                                                                                                                       
                Buffers: shared read=3785                                                                                                                                                                                                
                I/O Timings: read=11.401                                                                                                                                                                                                 
  ->  Finalize GroupAggregate  (cost=137478.70..138951.16 rows=5812 width=12) (actual time=439.679..440.021 rows=372 loops=744)                                                                                                          
        Output: s_aud.id, max(s_aud.revision_id)                                                                                                                                                                             
        Group Key: s_aud.id                                                                                                                                                                                                        
        Buffers: shared hit=26138272                                                                                                                                                                                                     
        ->  Gather Merge  (cost=137478.70..138834.92 rows=11624 width=12) (actual time=439.672..439.859 rows=1111 loops=744)                                                                                                             
              Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                             
              Workers Planned: 2                                                                                                                                                                                                         
              Workers Launched: 2                                                                                                                                                                                                        
              Buffers: shared hit=26138272                                                                                                                                                                                               
              ->  Sort  (cost=136478.67..136493.20 rows=5812 width=12) (actual time=435.151..435.183 rows=581 loops=2232)                                                                                                                
                    Output: s_aud.id, (PARTIAL max(s_aud.revision_id))                                                                                                                                                       
                    Sort Key: s_aud.id                                                                                                                                                                                             
                    Sort Method: quicksort  Memory: 59kB                                                                                                                                                                                 
                    Worker 0:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Worker 1:  Sort Method: quicksort  Memory: 59kB                                                                                                                                                                      
                    Buffers: shared hit=77360376                                                                                                                                                                                         
                    Worker 0: actual time=433.575..433.613 rows=689 loops=744                                                                                                                                                            
                      Buffers: shared hit=25663619                                                                                                                                                                                       
                    Worker 1: actual time=434.099..434.136 rows=682 loops=744                                                                                                                                                            
                      Buffers: shared hit=25627461                                                                                                                                                                                       
                    ->  Partial HashAggregate  (cost=136057.16..136115.28 rows=5812 width=12) (actual time=434.849..434.962 rows=741 loops=2232)                                                                                         
                          Output: s_aud.id, PARTIAL max(s_aud.revision_id)                                                                                                                                                   
                          Group Key: s_aud.id                                                                                                                                                                                      
                          Buffers: shared hit=77348472                                                                                                                                                                                   
                          Worker 0: actual time=433.259..433.372 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25657667                                                                                                                                                                                 
                          Worker 1: actual time=433.781..433.894 rows=740 loops=744                                                                                                                                                      
                            Buffers: shared hit=25621509                                                                                                                                                                                 
                          ->  Parallel Seq Scan on db_schema.s_aud (cost=0.00..129536.26 rows=1304180 width=12) (actual time=0.017..285.222 rows=1039458 loops=2232)                                                   
                                Output: s_aud.id, s_aud.revision_id                                                                                                                                                          
                                Filter: ((s_aud.revision_type <> 2) AND ((s_aud.type)::text = 'X_TYPE'::text))                                                                                                        
                                Rows Removed by Filter: 324437                                                                                                                                                                           
                                Buffers: shared hit=77348472                                                                                                                                                                             
                                Worker 0: actual time=0.007..283.757 rows=1034585 loops=744                                                                                                                                              
                                  Buffers: shared hit=25657667                                                                                                                                                                           
                                Worker 1: actual time=0.007..284.260 rows=1033185 loops=744                                                                                                                                              
                                  Buffers: shared hit=25621509                                                                                                                                                                           
Planning Time: 0.187 ms

Step 1: Get the results from EXPLAIN(ANALYZE, VERBOSE, BUFFERS). Without that information, it's gonna be hard to help you. — Frank Heikens
– Frank Heikens, Commented Mar 11, 2022 at 14:19
It will depend on your explain plan, but you can consider an index on s_aud.type. The best performance will be achieved by judicious partitioning of s_aud. — FlexYourData
– FlexYourData, Commented Mar 11, 2022 at 14:19
Please edit your question and add the execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text and make sure you preserve the indention of the plan. Paste the text, then put ``` on the line before the plan and on a line after the plan. — user330315
– user330315, Commented Mar 11, 2022 at 14:21

Thorsten Kettner · Accepted Answer · 2022-03-11 14:40:15Z

1

Your indexes are not appropriate. In table s you look for s.type = 'X_TYPE' AND s.active = true, but there is no index on the columns it seems. In table s_aud you want revision_type <> 2 AND type = 'X_TYPE', and there is only an index on revision_type.

Have composite indexes or even partial indexes instead. Use the latter if you aways look at the same values, e.g. always at type = 'X_TYPE';

Composite indexes:

create index idx1 on s (type, active, id);
create index idx1 on s_aud (type, revision_type, id, revision_id);

Partial indexes:

create index idx3 on s (id) where type = 'X_TYPE' AND active = true;
create index idx4 on s_aud (id, revision_id) where revision_type <> 2 AND type = 'X_TYPE';

Aa a last option you could even partition your tables by type or active status.

answered Mar 11, 2022 at 14:40

Thorsten Kettner

96.6k8 gold badges56 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Nico Over a year ago

I will try this and let you know, thanks

Nico Over a year ago

It works like a charm, I have a couple of questions, how can I know, in queries like this one, create the necessary indexes? You added id to the indexes, but I don't know why.

Thorsten Kettner Over a year ago

The where clause happens first. You are looking for s rows with a certain type and active value, so these come first in your index (or form the where clause of the partial index). Then you want to s.id, because this is used to join with the s_aud data. Hence the id comes next in the index.

Thorsten Kettner Over a year ago

As to s_aud: same thing: You want a particular type (first thing in the index, because you want an exact match) and there are certain revision_types you don't want (second column in the index). Once the matching rows are found they must get grouped by id, so the id must come next. At last you want the maximum revision_id. You could read it from the table, but why not get it right from the index? So we add this column, too.

Collectives™ on Stack Overflow

Optimize subquery with max postgresql

1 Answer 1

Composite indexes:

Partial indexes:

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Composite indexes:

Partial indexes:

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related