1

Using Postgres v. 8.4.2.

I have a standard three-table setup for a many-to-many relationship. A query, written against two of these tables, is supposed to return a single record for which there is no existing mapping:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.branch_id is NULL
   LIMIT 1

(scm_branchgroup_branches is the ternary table keeping the mapping records)

this query has the following top level cost:

Limit (cost=0.00..0.22 rows=1 width=4)

However if I change the field used in the WHERE condition to be the primary key of the bgb table, the cost goes drastically up:

SELECT b.id
   FROM scm_branch b
      LEFT OUTER JOIN "scm_branchgroup_branches" bgb
         ON b.id = bgb.branch_id
   WHERE
         bgb.id IS NULL 
   LIMIT 1

(note: this query uses "bgb.id IS NULL" vs "bgb.branch_id is NULL" used in the previous example)

Top level cost:

Limit (cost=236366.74..4644900.75 rows=1 width=4)

Why such a drastic difference? I mean I see the difference in execution plans, but I don't understand the underlying reasons for the difference.

I'll be happy to provide any additional info needed to answer this question.

Thanks D.

1 Answer 1

1

In the version with branch_id IS NULL, PostgreSQL recognizes that you're doing an anti-join, and it optimizes accordingly, using a "hash-join"-type algorithm. (Google PostgreSQL hash left anti-join for lots of information on this optimization.) In the version with id IS NULL, however, it doesn't recognize this fact, and its optimizations are not nearly so helpful. What's more, the join basically forces it to use the index on branch_id, so it can't take advantage of the primary-key index on id, and must consult the actual table data to determine if a given record satisfies the WHERE-clause.

(There may well be other factors at play as well — I'm not an expert on PostgreSQL — but I believe those are the main factors.)

Sign up to request clarification or add additional context in comments.

1 Comment

Ah, makes sense. Thanks for pointing me in the right direction. The slow query is issued by Django ORM. I don't think there is built-in anti-join support in it, but at least now I can ask the right questions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.