
I am trying to figure out why a specific SQL query is running slowly on Postgres. Without too much explanation, here is the EXPLAIN ANALYZE output for the query:

explain analyze SELECT "addressbooks_address"."id", "addressbooks_address"."name" FROM "addressbooks_recipientaddress" INNER JOIN "addressbooks_address" ON ("addressbooks_recipientaddress"."address_ptr_id" = "addressbooks_address"."id") ORDER BY "addressbooks_recipientaddress"."address_ptr_id" ASC LIMIT 1000 OFFSET 378000;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=40139.55..40245.42 rows=1000 width=21) (actual time=720.444..721.958 rows=1000 loops=1)
   ->  Merge Join  (cost=121.88..67152.43 rows=633159 width=21) (actual time=0.028..698.069 rows=379000 loops=1)
         Merge Cond: (addressbooks_recipientaddress.address_ptr_id = addressbooks_address.id)
         ->  Index Scan using addressbooks_recipientaddress_pkey on addressbooks_recipientaddress  (cost=0.00..19258.72 rows=633159 width=4) (actual time=0.012..189.480 rows=379000 loops=1)
         ->  Index Scan using addressbooks_address_pkey on addressbooks_address  (cost=0.00..38291.65 rows=675743 width=17) (actual time=0.011..227.094 rows=388264 loops=1)
Total runtime: 722.092 ms

The query is generated by Django, but I have simplified it a bit before posting it here; the issue remains either way. I have indexes on both addressbooks_address.id and addressbooks_recipientaddress.address_ptr_id, as shown in the plan.

Any ideas?

  • Well, your query has to join 379,000 rows; I don't think this is very slow, since it completes in less than 1 second Commented Sep 6, 2013 at 5:51
  • Is your index on addressbooks_address (id) unique? Commented Sep 6, 2013 at 5:52
  • Define "slow". As Roman Pekar mentioned, you are joining lots of rows, plus you are doing sorting. Commented Sep 6, 2013 at 7:01

1 Answer


LIMIT 1000 OFFSET 378000

This looks fast for what you're doing: you're generating a fairly large join, then throwing the vast majority of it away.

Instead of using an OFFSET, try doing your pagination by the primary key of the rows of interest, if possible. Remember the addressbooks_address.id and whatever the key of addressbooks_recipientaddress is from the last tuple in the prior result, and use a WHERE clause like:

WHERE "addressbooks_recipientaddress"."id" > $1
  AND "addressbooks_address"."id" > $2

instead of the OFFSET. That way your index scan can just skip to those records, instead of wasting all that time generating results to throw away.
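To make the difference concrete, here is a minimal sketch of keyset pagination versus OFFSET, using SQLite via Python as a stand-in for Postgres (the table and column names are illustrative, not taken from the question):

```python
# Sketch: keyset ("seek") pagination vs. OFFSET pagination.
# Assumes an indexed integer primary key; table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO address (id, name) VALUES (?, ?)",
    [(i, f"name-{i}") for i in range(1, 10001)],
)

# OFFSET pagination: the database must produce and then discard 9000 rows
# before returning the 10 rows we actually want.
offset_page = conn.execute(
    "SELECT id, name FROM address ORDER BY id LIMIT 10 OFFSET 9000"
).fetchall()

# Keyset pagination: remember the last id from the previous page (here 9000)
# and seek straight to it through the index -- no rows are generated and
# thrown away, so the cost no longer grows with page depth.
last_seen_id = 9000
keyset_page = conn.execute(
    "SELECT id, name FROM address WHERE id > ? ORDER BY id LIMIT 10",
    (last_seen_id,),
).fetchall()

# Both approaches return the same page of rows (ids 9001..9010).
assert keyset_page == offset_page
```

The same idea applies directly to the Postgres query in the question: carry the last-seen key forward from page to page instead of computing an ever-larger offset.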

If your framework can't do that, well, that's why I don't like query generator frameworks.


1 Comment

Doing the join along with the offset is the issue here, as you said. I realized that addressbooks_recipientaddress is really just a table containing a foreign key to the address table. So I have tried to optimize the model the query is generated from, so that the join is completely gone and the offset is applied directly in the original query. I'll get back when I see whether it's working as I expect.
