I'm just starting studying Postgres and I'm in a situation where, depending how I do the JOIN on my tables, the performance and plan output seems really strange.
Those are the table used with their indexes:
create table escola
(
pk_codigo integer not null
constraint pk_escola
primary key,
nome varchar(100),
municipio varchar(150),
uf char(2),
cod_municipio integer,
uf_id integer default 0 not null
constraint fk_escola_uf_id
references tb_uf
)
;
create index idx_escola_uf
on escola (uf)
;
create index idx_escola_uf_id
on escola (uf_id)
;
create index idx_multi_escola_uf_pk
on escola (uf, pk_codigo)
;
create table if not exists candidato
(
pk_numero_inscricao bigint not null
constraint candidato_pk
primary key,
cod_municipio_residencia integer,
municipio_residencia varchar(150),
uf_residencia char(2),
uf_nascimento char(2),
situacao_conclusao numeric(1),
ano_concluiu smallint,
idade smallint,
sexo char,
fk_codigo_escola integer
constraint fk_candidato_codigo_escola
references escola,
uf_prova char(2)
)
;
create index if not exists idx_candidato_codigo_escola
on candidato (fk_codigo_escola)
;
create table tb_uf
(
uf varchar(2),
pk_id serial not null
constraint tb_uf_pkey
primary key
)
;
create unique index tb_uf_uf_uindex
on tb_uf (uf)
;
create unique index tb_uf_pk_id_uindex
on tb_uf (pk_id)
;
And the queries (with plans):
EXPLAIN ANALYZE
SELECT pk_numero_inscricao, pk_codigo
FROM escola e
JOIN candidato c
ON c.fk_codigo_escola = e.pk_codigo
WHERE e.uf = 'RJ'
;
Time without EXPLAIN ANALYZE: 916ms
Plan: https://explain.depesz.com/s/M6B
EXPLAIN ANALYZE
SELECT pk_numero_inscricao, pk_codigo
FROM escola AS e
JOIN candidato AS c
ON c.fk_codigo_escola = e.pk_codigo
JOIN tb_uf AS u
ON e.uf_id = u.pk_id
WHERE u.uf = 'RJ'
;
Time without EXPLAIN ANALYZE: 72ms
Plan: https://explain.depesz.com/s/E3MR
EXPLAIN ANALYZE
SELECT pk_numero_inscricao, pk_codigo
FROM escola AS e
JOIN candidato AS c
ON c.fk_codigo_escola = e.pk_codigo
WHERE e.uf_id = 19
;
Time without EXPLAIN ANALYZE: 961ms
Plan: https://explain.depesz.com/s/v67V
The weird thing happening for me is that queries 1 and 3 are slower than query 2, although query 2 has an extra join. Does anybody know what might be causing this?
I noticed that Index Scan on table candidato is a lot slower on queries 1 and 3 also, and that makes no sense for me, since the final result is the same.
Another point is that EXPLAIN ANALYZE is adding a lot of overhead into the queries.
Thanks in advance! If I need to provide any more information, I can edit this post if needed!