0

I have two tables master and a child table. So for the question I will refer master as m and child table as c.

Master table properties:

Records: 50 million
primary key (index): m_id 
btree index: modified_id

Child table properties (1-n relationship):

Records: 400 million
primary key (index): c_id
foreign key (btree index): m_id

Query plan

Gather  (cost=9159.80..6768939.59 rows=18940 width=107) (actual time=137160.885..297009.782 rows=25 loops=1)
  Output: m.<date_column>, m.m_id, m.other_unique_id_1, m.modified_id, c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=11266 read=4680890
  ->  Hash Join  (cost=8159.80..6766045.59 rows=7892 width=107) (actual time=87154.268..297003.331 rows=8 loops=3)
        Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id, c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
        Inner Unique: true
        Hash Cond: (c.m_id = c.m_id)
        Buffers: shared hit=11266 read=4680890
        Worker 0: actual time=82162.730..297002.014 rows=13 loops=1
          Buffers: shared hit=3789 read=1555017
        Worker 1: actual time=42139.478..297002.514 rows=8 loops=1
          Buffers: shared hit=3634 read=1569261
        ->  Parallel Seq Scan on child c  (cost=0.00..6328357.20 rows=163629920 width=87) (actual time=0.638..279084.058 rows=130858840 loops=3)
              Output: c.<date_column>, c.c_id, c.m_id, c.<other_column_1>, c.<other_column_2>, c.<other_column_3>, c.<other_column_4>, c.<other_column_5>, c.<other_column_6>, c.<other_column_7>, c.<other_column_8>
              Buffers: shared hit=11171 read=4680887
              Worker 0: actual time=0.814..279170.920 rows=130423418 loops=1
                Buffers: shared hit=3746 read=1555017
              Worker 1: actual time=1.095..278955.836 rows=131603000 loops=1
                Buffers: shared hit=3592 read=1569260
        ->  Hash  (cost=8132.89..8132.89 rows=2153 width=20) (actual time=0.046..0.047 rows=8 loops=3)
              Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id
              Buckets: 4096  Batches: 1  Memory Usage: 33kB
              Buffers: shared hit=35 read=3
              Worker 0: actual time=0.026..0.027 rows=8 loops=1
                Buffers: shared hit=13
              Worker 1: actual time=0.059..0.060 rows=8 loops=1
                Buffers: shared hit=12 read=1
              ->  Bitmap Heap Scan on master c  (cost=41.25..8132.89 rows=2153 width=20) (actual time=0.031..0.041 rows=8 loops=3)
                    Output: c.<date_column>, c.m_id, c.other_unique_id_1, c.modified_id
                    Recheck Cond: (c.modified_id = 561869)
                    Heap Blocks: exact=8
                    Buffers: shared hit=35 read=3
                    Worker 0: actual time=0.018..0.024 rows=8 loops=1
                      Buffers: shared hit=13
                    Worker 1: actual time=0.043..0.055 rows=8 loops=1
                      Buffers: shared hit=12 read=1
                    ->  Bitmap Index Scan on ix_master_modified_id  (cost=0.00..40.71 rows=2153 width=0) (actual time=0.027..0.027 rows=8 loops=3)
                          Index Cond: (c.modified_id = 561869)
                          Buffers: shared hit=11 read=3
                          Worker 0: actual time=0.015..0.015 rows=8 loops=1
                            Buffers: shared hit=5
                          Worker 1: actual time=0.038..0.038 rows=8 loops=1
                            Buffers: shared hit=4 read=1
Planning time: 0.354 ms
Execution time: 297009.825 ms

Query

select
    *
from
    master m
inner join child c on
    m.m_id = c.m_id
where
    m.modified_id = <xyz>

DDL statement for Master table

CREATE TABLE master (
    <date_column> timestamp NULL DEFAULT CURRENT_DATE,
    m_id serial4 NOT NULL,
    <other_unique_id_1> int4 NULL,
    modified_id int4 NULL,
    CONSTRAINT master_pkey PRIMARY KEY (m_id),
    CONSTRAINT <foreign key> FOREIGN KEY (<other_unique_id_1>) REFERENCES <other table>(<other_unique_id_1>)
);
CREATE INDEX ix_master_modified_id ON master USING btree (modified_id);
CREATE UNIQUE INDEX ix_master_other_unique_id_1 ON master USING btree (other_unique_id_1);

DDL for child table

CREATE TABLE child (
    <date column> timestamp NULL DEFAULT now(),
    c_id serial4 NOT NULL,
    m_id int4 NULL,
    <other_column_1> varchar(50) NULL,
    <other_column_2> varchar(50) NULL,
    <other_column_3> bool NULL,
    <other_column_4> varchar(50) NULL,
    <other_column_5> varchar(2) NULL,
    <other_column_6> varchar(50) NULL,
    <other_column_7> varchar(10) NULL,
    <other_column_8> text NULL,
    CONSTRAINT child_pkey PRIMARY KEY (c_id),
    CONSTRAINT child_master_fkey FOREIGN KEY (m_id) REFERENCES master(m_id)
);
CREATE INDEX ix_child_m_id ON child USING btree (m_id);
18
  • 1
    Can you also add the query and the create table / index statements? Commented Dec 10, 2021 at 11:53
  • Please edit your question and add the query you are using. The output of explain (analyze, verbose, buffers, timing) would also be interesting. Commented Dec 10, 2021 at 12:10
  • let me do that too. And I cannot add analyze in explain cause it will take hell lot of time @a_horse_with_no_name if you want it anyways then I can fire it Commented Dec 10, 2021 at 13:08
  • 1
    @AnandTripathi you're right... I haven't noticed it was a PK :) Commented Dec 10, 2021 at 14:22
  • 1
    @AnandTripathi the plan suggests that the child node returns over 130m rows, which is a significant part of the table.. probably that's why the planner decided to skip the index. Commented Dec 10, 2021 at 14:46

1 Answer 1

3

I think the root of the problem is this gross misestimate.

->  Bitmap Index Scan on ix_master_modified_id  (cost=0.00..40.71 rows=2153 width=0) (actual time=0.027..0.027 rows=8 loops=3)

You said you already did VACUUM ANALYZE the table. In that case you might need to increase the stats size, and then ANALYZE again, to get a better estimate. You could either change the default_statistics_target globally, or target this one column with:

alter table master alter modified_id set statistics 10000;
analyze master;

You might not need to increase it all the way to 10000 (the max allowed), but if you only do it for one column I see no reason trying to fine tune it. Just break out the big guns right away. If it works, then you can worry about fine tuning it.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks I increased the child m_id statistics that helped me in getting the index in explain plan

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.