postgres - slow outer query when using a primary key

Question

Using Postgres v. 8.4.2.

I have a standard three-table setup for a many-to-many relationship. A query, written against two of these tables, is supposed to return a single record for which there is no existing mapping:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.branch_id is NULL
   LIMIT 1

(scm_branchgroup_branches is the junction table that holds the mapping records.)

This query has the following top-level cost:

Limit (cost=0.00..0.22 rows=1 width=4)

However, if I change the field used in the WHERE condition to the primary key of the bgb table, the cost goes up drastically:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.id IS NULL 
   LIMIT 1

(note: this query uses "bgb.id IS NULL" vs "bgb.branch_id is NULL" used in the previous example)

Top level cost:

Limit (cost=236366.74..4644900.75 rows=1 width=4)

Why such a drastic difference? I mean I see the difference in execution plans, but I don't understand the underlying reasons for the difference.

I'll be happy to provide any additional info needed to answer this question.

Thanks D.

ruakh · Accepted Answer · 2011-11-19 00:14:14Z

In the version with branch_id IS NULL, PostgreSQL recognizes that you're doing an anti-join and optimizes accordingly, using a hash-join-style algorithm. (Search for "PostgreSQL hash anti join" for more on this optimization.) In the version with id IS NULL, however, it doesn't recognize the anti-join, so it plans an ordinary left outer join and filters the result afterwards, which is far more expensive. What's more, the join basically forces it to use the index on branch_id, so it can't take advantage of the primary-key index on id and must consult the actual table data to determine whether a given record satisfies the WHERE clause.

(There may be other factors at play, since I'm not an expert on PostgreSQL, but I believe those are the main ones.)
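If you control the SQL, one way to make the anti-join explicit is to write the lookup with NOT EXISTS instead of relying on the IS NULL test. The following is only a sketch, assuming the table and column names from the question; it has not been run against the asker's schema, and the planner's choice still depends on statistics and version. Prefixing it with EXPLAIN lets you check whether the plan contains an anti-join node (for example "Hash Anti Join").

```sql
-- Sketch: find one branch that has no row in the mapping table.
-- Table and column names are taken from the question; nothing else is assumed.
EXPLAIN
SELECT b.id
   FROM scm_branch b
   WHERE NOT EXISTS (
         -- correlated subquery: is there any mapping row for this branch?
         SELECT 1
            FROM scm_branchgroup_branches bgb
            WHERE bgb.branch_id = b.id
      )
   LIMIT 1;
```

This form does not depend on which column of bgb is tested for NULL, so it avoids the difference between the two queries in the question.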

1 Comment

dMb Over a year ago

Ah, makes sense. Thanks for pointing me in the right direction. The slow query is issued by Django ORM. I don't think there is built-in anti-join support in it, but at least now I can ask the right questions.
