
I am trying to figure out why a specific SQL query is running slowly on Postgres. Without too much explanation, here is the EXPLAIN ANALYZE output for the query:

explain analyze SELECT "addressbooks_address"."id", "addressbooks_address"."name" FROM "addressbooks_recipientaddress" INNER JOIN "addressbooks_address" ON ("addressbooks_recipientaddress"."address_ptr_id" = "addressbooks_address"."id") ORDER BY "addressbooks_recipientaddress"."address_ptr_id" ASC LIMIT 1000 OFFSET 378000;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=40139.55..40245.42 rows=1000 width=21) (actual time=720.444..721.958 rows=1000 loops=1)
   ->  Merge Join  (cost=121.88..67152.43 rows=633159 width=21) (actual time=0.028..698.069 rows=379000 loops=1)
         Merge Cond: (addressbooks_recipientaddress.address_ptr_id = addressbooks_address.id)
         ->  Index Scan using addressbooks_recipientaddress_pkey on addressbooks_recipientaddress  (cost=0.00..19258.72 rows=633159 width=4) (actual time=0.012..189.480 rows=379000 loops=1)
         ->  Index Scan using addressbooks_address_pkey on addressbooks_address  (cost=0.00..38291.65 rows=675743 width=17) (actual time=0.011..227.094 rows=388264 loops=1)
Total runtime: 722.092 ms

The query is generated by Django, but I have simplified it a bit before posting it here; the issue remains either way. I have indexes on both addressbooks_address.id and addressbooks_recipientaddress.address_ptr_id, as shown in the plan.

Any ideas?

  • Well, your query has to join 379,000 rows; I don't think this is very slow, since it completes in less than 1 second Commented Sep 6, 2013 at 5:51
  • Is your index on addressbooks_address (id) unique? Commented Sep 6, 2013 at 5:52
  • Define "slow". As Roman Pekar mentioned, you are joining lots of rows, plus you are doing sorting. Commented Sep 6, 2013 at 7:01

1 Answer


LIMIT 1000 OFFSET 378000

This looks fast for what you're doing: you're generating a fairly large join, then throwing the vast majority of it away.

Instead of using an OFFSET, try doing your pagination by the primary key of the rows of interest, if possible. Remember the addressbooks_address.id and whatever the key of addressbooks_recipientaddress is from the last tuple in the prior result, and use a WHERE clause like:

WHERE "addressbooks_recipientaddress"."id" > $1
  AND "addressbooks_address"."id" > $2

instead of the OFFSET. That way your index scan can just skip to those records, instead of wasting all that time generating results to throw away.
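To make the difference concrete, here is a minimal sketch of keyset pagination versus OFFSET, using SQLite via Python as a stand-in for Postgres (the table and column names are illustrative, not taken from the question):

```python
# Sketch: keyset ("seek") pagination vs. OFFSET pagination.
# Assumes an indexed integer primary key; table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO address (id, name) VALUES (?, ?)",
    [(i, f"name-{i}") for i in range(1, 10001)],
)

# OFFSET pagination: the database must produce and then discard 9000 rows
# before returning the 10 rows we actually want.
offset_page = conn.execute(
    "SELECT id, name FROM address ORDER BY id LIMIT 10 OFFSET 9000"
).fetchall()

# Keyset pagination: remember the last id from the previous page (here 9000)
# and seek straight to it through the index -- no rows are generated and
# thrown away, so the cost no longer grows with page depth.
last_seen_id = 9000
keyset_page = conn.execute(
    "SELECT id, name FROM address WHERE id > ? ORDER BY id LIMIT 10",
    (last_seen_id,),
).fetchall()

# Both approaches return the same page of rows (ids 9001..9010).
assert keyset_page == offset_page
```

The same idea applies directly to the Postgres query in the question: carry the last-seen key forward from page to page instead of computing an ever-larger offset.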

If your framework can't do that, well, that's why I don't like query generator frameworks.


1 Comment

Doing the join along with the offset is the issue here, as you said. I realized that addressbooks_recipientaddress is really just a table containing a foreign key to the address table. So I have tried to optimize the model the query is generated from, so that the join is completely gone and the offset is applied directly in the original query. I'll get back when I see whether it's working as I expect.
