I'm trying to run a query to select a customer audience, but it should select only the customers who haven't been sent an email before. The email tracking is stored in another table. This is the original query:

SELECT 
  c.customers_firstname, 
  c.customers_lastname, 
  o.orders_id, 
  o.customers_id, 
  c.customers_email_address
FROM 
  orders o, 
  customers c,  
  order_status s
WHERE 
  o.customers_id = c.customers_id
  AND o.orders_id = s.orders_id
  AND o.orders_status = s.orders_status_id 
ORDER BY 
  o.orders_id ASC

Now, I need to check another table called tracking and see if the customer already exists in that table and if so, skip it.

This is what I've tried, but it doesn't seem to work:

SELECT 
   c.customers_firstname, 
   c.customers_lastname, 
   o.orders_id, 
   o.customers_id, 
   c.customers_email_address
FROM 
   orders o, 
   customers c 
INNER JOIN 
   tracking t 
ON 
   c.customers_id = t.customers_id,  
   order_status s
WHERE 
   o.customers_id = c.customers_id
   AND o.orders_id = s.orders_id
   AND o.orders_status = s.orders_status_id
   AND c.customers_id NOT LIKE t.customers_id
ORDER BY 
   o.orders_id ASC

What am I doing wrong? Or is there any way to do this better?

ADDED: I totally forgot one more important factor: the tracking table has a "module" column, and I need results only for the "contact" module. In other words, I need to filter out customers who already exist in the tracking table, but only if they are associated with the contact module, not any other module.

    Don't mix implicit and explicit joins. Commented Dec 18, 2012 at 23:13

2 Answers

This is equivalent to your original query:

SELECT c.customers_firstname
     , c.customers_lastname
     , o.orders_id
     , o.customers_id
     , c.customers_email_address
  FROM orders o
  JOIN customers c 
    ON c.customers_id = o.customers_id
  JOIN order_status s 
    ON s.orders_id = o.orders_id
   AND s.orders_status_id = o.orders_status
 ORDER
    BY o.orders_id ASC

Add an anti-join

To meet your specification, you can use an "anti-join" pattern. We can add this to the query, before the ORDER BY clause:

  LEFT
  JOIN tracking t
    ON t.customers_id = o.customers_id
 WHERE t.customers_id IS NULL   

That finds all matching rows in the tracking table, based on customers_id. For any row where the query doesn't find a match in tracking, it generates a dummy tracking row consisting of all NULL values. (That's one way of describing what an OUTER JOIN does.)

The "trick" now is to throw out all the rows that matched. We do that by checking, in the WHERE clause, for a NULL value of customers_id from the tracking table. For a match, that column won't be NULL. (The equals comparison in the join predicate guarantees that.) So if we get a NULL value for t.customers_id, we know there wasn't a match.
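To see the dummy NULL row for yourself, one option is to run just the outer join, without the WHERE filter, and look at the column coming from tracking. This is only a sketch against the tables from the question; the tracked_id alias is a name made up for illustration:

```sql
-- Outer join only, no filter yet: every row from orders is kept.
-- tracked_id is NULL for customers that have no row in tracking.
SELECT o.orders_id
     , o.customers_id
     , t.customers_id AS tracked_id
  FROM orders o
  LEFT
  JOIN tracking t
    ON t.customers_id = o.customers_id
 ORDER
    BY o.orders_id ASC
```

The rows where tracked_id comes back NULL are exactly the ones the IS NULL test in the WHERE clause keeps.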

So, this query returns the specified result set:

SELECT c.customers_firstname
     , c.customers_lastname
     , o.orders_id
     , o.customers_id
     , c.customers_email_address
  FROM orders o
  JOIN customers c 
    ON c.customers_id = o.customers_id
  JOIN order_status s 
    ON s.orders_id = o.orders_id
   AND s.orders_status_id = o.orders_status
  LEFT
  JOIN tracking t
    ON t.customers_id = o.customers_id
 WHERE t.customers_id IS NULL   
 ORDER
    BY o.orders_id ASC

Other approaches

There are other approaches, but the anti-join is frequently the best performer.

Some other options are a NOT EXISTS predicate and a NOT IN predicate. I can add those, though I expect those solutions will be provided in other answers before I get around to it.


Starting with that first query (equivalent to the query in your question), we could also use a NOT EXISTS predicate. We'd add this before the ORDER BY clause:

 WHERE NOT EXISTS 
       ( SELECT 1
           FROM tracking t
          WHERE t.customers_id = o.customers_id
       )

To use a NOT IN predicate, again, add this before the ORDER BY clause:

 WHERE o.customers_id NOT IN
       ( SELECT t.customers_id
           FROM tracking t
          WHERE t.customers_id IS NOT NULL
       )

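(You may have some guarantee that tracking.customers_id is not null, but in the more general case it's important that the subquery NOT return a NULL value. If it returns even one NULL, the NOT IN comparison evaluates to NULL rather than TRUE for every customer that isn't in the list, and the query returns no rows. So we include a WHERE clause to guarantee that.)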
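The effect of a NULL in the list can be checked without touching any tables. This is a small sketch, assuming MySQL, where true displays as 1; the column aliases are just illustrative names:

```sql
-- A NULL in the list turns "not found" into NULL (unknown) instead of true,
-- and a WHERE clause only keeps rows whose predicate is true.
SELECT 20 NOT IN (10, NULL) AS with_null      -- NULL
     , 20 NOT IN (10)       AS without_null   -- 1
```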

With appropriate indexes, the anti-join pattern usually performs better than either the NOT EXISTS or the NOT IN, but not always.
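What counts as an "appropriate" index depends on the schema, which the question doesn't show. As a sketch, assuming MySQL and assuming tracking has no index on customers_id yet (the index name is made up), something like this lets each approach probe tracking by customer, and EXPLAIN shows the plan the optimizer picks:

```sql
-- Hypothetical index name; skip this if tracking.customers_id is already indexed
CREATE INDEX tracking_customers_id_ix ON tracking (customers_id);

-- Prefix a query with EXPLAIN to see its plan instead of its rows
EXPLAIN
SELECT o.orders_id
  FROM orders o
  LEFT
  JOIN tracking t
    ON t.customers_id = o.customers_id
 WHERE t.customers_id IS NULL;
```

Running the same EXPLAIN on the NOT EXISTS and NOT IN versions makes the comparison concrete for your own data.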


5 Comments

Well you went the extra mile so I removed my answer since it covered the same ground. You could include this link to explain extended which discusses how the query plans differ. You could also discuss why the original query didn't work
@Conrad Frix: I didn't mean to override your answer, I apologize about that. I just wanted to explain the anti-join pattern, as well as provide some alternatives. That link provides a pretty good write up about the differences.
no worries :) Your answer made mine superfluous and just became noise. Just because mine was a few minutes older doesn't mean it should stay.
@spencer7593 wow, impressive answer! Thank you for taking the time to write it! I'm pretty sure it's the correct answer, but it still doesn't work for me, obviously for the reason I just added in my original question. Can I please ask you to check what I've added and update your answer? I've tried adding {where t.module LIKE 'contact' and t.customers_id IS NULL} but this didn't work, it filters ALL customers because their IDs exist associated with other modules... THANK YOU!
@user1078494: understand that when you add a t.somecol IS NOT NULL predicate to the WHERE clause (which is what your t.module LIKE 'contact' does), that will negate the outer join. The generated dummy row will contain a NULL value for all columns from the t table, so your addition to the WHERE clause will cause the query to never return any rows. What you want is to add your AND t.module LIKE 'contact' predicate to the ON clause of the LEFT JOIN (add it on a line immediately before the WHERE clause.)
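
Putting that last comment together with the anti-join query from the answer, the version for the "contact" module would look roughly like this. It's a sketch: it assumes the module column holds the literal value 'contact', and it uses = rather than LIKE since the pattern has no wildcard:

```sql
SELECT c.customers_firstname
     , c.customers_lastname
     , o.orders_id
     , o.customers_id
     , c.customers_email_address
  FROM orders o
  JOIN customers c
    ON c.customers_id = o.customers_id
  JOIN order_status s
    ON s.orders_id = o.orders_id
   AND s.orders_status_id = o.orders_status
  LEFT
  JOIN tracking t
    ON t.customers_id = o.customers_id
   AND t.module = 'contact'   -- only 'contact' rows count as a match
 WHERE t.customers_id IS NULL -- keep customers with no 'contact' match
 ORDER
    BY o.orders_id ASC
```

Because the module test lives in the ON clause, tracking rows for other modules simply fail to match, so those customers still get the all-NULL dummy row and pass the WHERE filter.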

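As spencer7593 suggested, you can use the anti-join:

LEFT JOIN tracking t ON t.customers_id = o.customers_id
WHERE t.customers_id IS NULL   

Note that this cannot be shortened to a plain inner join:

JOIN tracking t ON t.customers_id = o.customers_id

The inner join keeps only the customers that do have a row in tracking, which is the opposite of what the question asks for.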
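
A quick way to compare the two forms is to run both against a tiny dataset. The demo_orders and demo_tracking tables below are hypothetical stand-ins, not the real schema:

```sql
-- Hypothetical demo tables: customer 10 has a tracking row, customer 20 does not
CREATE TABLE demo_orders   (orders_id INT, customers_id INT);
CREATE TABLE demo_tracking (customers_id INT);

INSERT INTO demo_orders   VALUES (1, 10), (2, 20);
INSERT INTO demo_tracking VALUES (10);

-- Anti-join: keeps rows that have NO match in demo_tracking
SELECT o.orders_id, o.customers_id
  FROM demo_orders o
  LEFT JOIN demo_tracking t ON t.customers_id = o.customers_id
 WHERE t.customers_id IS NULL;

-- Plain inner join: keeps rows that DO have a match in demo_tracking
SELECT o.orders_id, o.customers_id
  FROM demo_orders o
  JOIN demo_tracking t ON t.customers_id = o.customers_id;
```

The anti-join returns only order 2 (customer 20, who has no tracking row), while the plain JOIN returns only order 1 (customer 10, who does), so the two are not interchangeable.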
