
I have an ownership relation between two tables, users(int user_id) and user_books(int user_book_id, int user_id, int book_id), plus two additional tables: books(int book_id, varchar book_title, int author_id) and authors(int author_id, varchar author_name).

Given a specific user_id, I want to get the books that the user DOES NOT HAVE but that were written by authors of books the user does have.

So if the user has BOOK1 (i.e. there exists a row for it in user_books) and does not have BOOK2 and BOOK3, which were written by the same author as BOOK1, I want to get the ids for BOOK2 and BOOK3.

I guess I can do this using a SELECT WHERE NOT IN (), but for performance reasons I am looking for a join-based solution.

  • Have you tried using an outer join? Commented Mar 13, 2012 at 11:55
  • As I said, I can write this using a "select where not in", but I want to use joins. I am aware that it should somehow be done using a left join, but I am not sure exactly how. Commented Mar 13, 2012 at 12:34
  • I am also having a problem with the fact that some users might already have more than one user_book by the same author, so a join would return the author ID more than once. Commented Mar 13, 2012 at 12:39
  • Should I start by finding all the books by the "related" authors? Commented Mar 13, 2012 at 12:42

1 Answer

I'd check the performance versus a "not in" or other solution but I believe the following would work:

-- Table and column names follow the schema given in the question.
select exist.user_id, b.book_id, b.book_title, a.author_name
from (select distinct ub.user_id, b.author_id   -- one row per author the user owns
         from user_books ub
           inner join books b on b.book_id = ub.book_id
         where ub.user_id = @userId) exist
  inner join authors a on a.author_id = exist.author_id
  inner join books b on b.author_id = a.author_id
  left outer join user_books ub on ub.book_id = b.book_id and ub.user_id = exist.user_id
where ub.user_id is null   -- keep only books with no matching ownership row

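The derived table finds all the authors of books the user already has (the distinct keeps each author once, even if the user owns several of their books). The rest of the query then finds the other books by those authors, and the left outer join with the "is null" check drops the ones the user already owns.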
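
As a quick sanity check, here is a minimal sketch that runs the same join pattern against an in-memory SQLite database from Python. The sample rows, the `owned` alias, and the `missing_books` helper are made up for illustration; the table and column names follow the question's schema, and SQLite is only a stand-in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, author_name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT, author_id INTEGER);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_books (user_book_id INTEGER PRIMARY KEY, user_id INTEGER, book_id INTEGER);

    INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
    INSERT INTO books VALUES (1, 'BOOK1', 1), (2, 'BOOK2', 1), (3, 'BOOK3', 1),
                             (4, 'BOOK4', 1), (5, 'BOOK5', 2);
    INSERT INTO users VALUES (10), (20);
    -- User 10 owns two books by Author A; user 20 owns BOOK2 and BOOK5.
    INSERT INTO user_books VALUES (1, 10, 1), (2, 10, 4), (3, 20, 2), (4, 20, 5);
""")

# Same shape as the query above: a derived table of the user's authors,
# joined to all books by those authors, then anti-joined against ownership.
JOIN_QUERY = """
    SELECT b.book_id
    FROM (SELECT DISTINCT ub.user_id, b.author_id
          FROM user_books ub
          INNER JOIN books b ON b.book_id = ub.book_id
          WHERE ub.user_id = :uid) AS owned
    INNER JOIN books b ON b.author_id = owned.author_id
    LEFT OUTER JOIN user_books ub
           ON ub.book_id = b.book_id AND ub.user_id = owned.user_id
    WHERE ub.user_id IS NULL
    ORDER BY b.book_id
"""

def missing_books(conn, user_id):
    """Return ids of books the user lacks, by authors the user already owns."""
    rows = conn.execute(JOIN_QUERY, {"uid": user_id}).fetchall()
    return [row[0] for row in rows]

print(missing_books(conn, 10))
```

User 10 owns two books by the same author, so without the distinct each missing book would come back twice; with it, each missing book appears once.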


6 Comments

You are right, that works, but it is far more complicated (as you expected) than using IN and NOT IN. If you run EXPLAIN on your query, you see there are 4 PRIMARY queries and 2 DERIVED queries (one using a temporary table). Whereas "select * from books where author_id in (select author_id from user_books as ub join books as b on b.book_id = ub.book_id) and book_id not in (select book_id from user_books where user_id = @uid);" has four queries, all using where (and therefore eligible for acceleration with indexes). I agree that measuring both approaches is necessary to pick the best.
@D Mac - exactly. You can't take a wholesale "don't use NOT IN" approach. Just because it's possible to do it without them doesn't mean you should ... Far better, as you've suggested, to look at the way the query is processed. I just wanted to demonstrate what is possible, but I'd recommend choosing based on analysis.
Thank you both, I will try to work things out from here. @D Mac As far as I can tell, the syntax you provided is missing a "where user_id = @uid" in the first subquery. And even though my books table has an index on author_id and on book_id, neither is used.
@KAJ Where does the specific user id enter this query?
In the derived table bit - as per updated answer. Please take note of the performance comments from @D Mac and myself
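
For comparison, the IN / NOT IN variant from the comments, with the user filter added to the first subquery, might look like the sketch below. It uses Python's sqlite3 module with made-up sample rows; the `missing_books_subquery` helper is hypothetical, and SQLite's EXPLAIN QUERY PLAN output is not the same as MySQL's EXPLAIN, so treat the plan printout only as a way to start the kind of analysis recommended above. One caveat that is independent of the engine: NOT IN returns no rows if the subquery yields a NULL, so this form assumes user_books.book_id is never NULL.

```python
import sqlite3

# Hypothetical sample data; only the two tables the query touches are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_title TEXT, author_id INTEGER);
    CREATE TABLE user_books (user_book_id INTEGER PRIMARY KEY,
                             user_id INTEGER NOT NULL, book_id INTEGER NOT NULL);

    INSERT INTO books VALUES (1, 'BOOK1', 1), (2, 'BOOK2', 1), (3, 'BOOK3', 1),
                             (4, 'BOOK4', 1), (5, 'BOOK5', 2);
    -- User 10 owns BOOK1 and BOOK4, both by author 1.
    INSERT INTO user_books VALUES (1, 10, 1), (2, 10, 4);
""")

# Both subqueries are restricted to the same user.
SUBQUERY_VERSION = """
    SELECT book_id
    FROM books
    WHERE author_id IN (SELECT b.author_id
                        FROM user_books AS ub
                        JOIN books AS b ON b.book_id = ub.book_id
                        WHERE ub.user_id = :uid)
      AND book_id NOT IN (SELECT book_id FROM user_books WHERE user_id = :uid)
    ORDER BY book_id
"""

def missing_books_subquery(conn, user_id):
    """Return ids of books the user lacks, by authors the user already owns."""
    rows = conn.execute(SUBQUERY_VERSION, {"uid": user_id}).fetchall()
    return [row[0] for row in rows]

print(missing_books_subquery(conn, 10))

# Print SQLite's plan for the query; the exact text varies by SQLite version.
for row in conn.execute("EXPLAIN QUERY PLAN " + SUBQUERY_VERSION, {"uid": 10}):
    print(row)
```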