I have two tables in my database (address and person_address). Address has a PK in address_id. person_address has a PK on (address_id, person_id, usage_code)
When joining this two tables through the address_id, my expectation is that the PK index is used on both cases. However, Postgres is adding sort and materialize steps to the plan, which slows down the execution of the query. I have tried dropping indexes (person_address had an index on address_id), analyzing stats, without success.
I will appreciate any help on how to isolate this situation since those queries run slower than expected on our production environment
This is the query:
select *
from person_addresses pa
join address a
on pa.address_id = a.address_id
This is the plan :
Merge Join (cost=1506935.96..2416648.39 rows=16033774 width=338)
Merge Cond: (pa.address_id = ((a.address_id)::numeric))
-> Index Scan using person_addresses_pkey on person_addresses pa (cost=0.43..592822.76 rows=5256374 width=104)
-> Materialize (cost=1506935.53..1526969.90 rows=4006874 width=234)
-> Sort (cost=1506935.53..1516952.71 rows=4006874 width=234)
Sort Key: ((a.address_id)::numeric)
-> Seq Scan on address a (cost=0.00..163604.74 rows=4006874 width=234)
Thanks.
Edit 1. After the comment checked the data types and found a discrepancy. Fixing the data type changed the plan to the following
Hash Join (cost=343467.18..881125.47 rows=5256374 width=348)
Hash Cond: (pa.address_id = a.address_id)
-> Seq Scan on person_addresses pa (cost=0.00..147477.74 rows=5256374 width=104)
-> Hash (cost=159113.97..159113.97 rows=4033697 width=244)
-> Seq Scan on address_normalization a (cost=0.00..159113.97 rows=4033697 width=244)
Performance improvement is evident on the plan, but am wondering if the sequential scans are expected without any filters
Hash Join (cost=343467.18..881125.47 rows=5256374 width=348) Hash Cond: (pa.address_id = a.address_id) -> Seq Scan on person_addresses pa (cost=0.00..147477.74 rows=5256374 width=104) -> Hash (cost=159113.97..159113.97 rows=4033697 width=244) -> Seq Scan on address_normalization a (cost=0.00..159113.97 rows=4033697 width=244)Is a seq scan normal on these cases? Plan is more efficient it seemsMerge Joinwas chose, was because the two data types wouldn't result in the same hash values and thus a Hash Join wasn't possible.