
I have an SQL query that looks simple but runs very slowly (~4 s):

SELECT tblbooks.*
FROM tblbooks LEFT JOIN
    tblauthorships ON tblbooks.book_id = tblauthorships.book_id
WHERE (tblbooks.added_by=3 OR tblauthorships.author_id=3)
GROUP BY tblbooks.book_id
ORDER BY tblbooks.book_id DESC
LIMIT 10

EXPLAIN result:

| id   | select_type | table          | type  | possible_keys     | key     | key_len | ref                    | rows | Extra       |
+------+-------------+----------------+-------+-------------------+---------+---------+------------------------+------+-------------+
|    1 | SIMPLE      | tblbooks       | index | fk_books__users_1 | PRIMARY | 62      | NULL                   |   10 | Using where |
|    1 | SIMPLE      | tblauthorships | ref   | book_id           | book_id | 62      | tblbooks.book_id       |    1 | Using where |
+------+-------------+----------------+-------+-------------------+---------+---------+------------------------+------+-------------+
2 rows in set (0.000 sec)

If I run the query with only one half of the OR condition in the WHERE clause at a time, each version returns results in less than 0.01 s.

Simplified schema:

  • tblbooks (~1 million rows):
| Field         | Type                  | Null | Key | Default             | Extra          |
+---------------+-----------------------+------+-----+---------------------+----------------+
| id            | int(10) unsigned      | NO   | MUL | NULL                | auto_increment |
| book_id       | varchar(20)           | NO   | PRI | NULL                |                |
| added_by      | int(11) unsigned      | NO   | MUL | NULL                |                |
+---------------+-----------------------+------+-----+---------------------+----------------+
  • tblauthorships (< 100 rows):
| Field         | Type             | Null | Key | Default             | Extra          |
+---------------+------------------+------+-----+---------------------+----------------+
| authorship_id | int(11) unsigned | NO   | PRI | NULL                | auto_increment |
| book_id       | varchar(20)      | NO   | MUL | NULL                |                |
| author_id     | int(11) unsigned | NO   | MUL | NULL                |                |
+---------------+------------------+------+-----+---------------------+----------------+

Both the book_id and author_id columns in tblauthorships are indexed.

Can anyone point me to the right direction?

Note: I'm aware of the book_id varchar issue.

2 Answers


My usual analogy for indexing is a telephone book. It's sorted by last name then by first name. If you look up a person by last name, you can find them efficiently. If you look up a person by last name AND first name, it's also efficient. But if you look up a person by first name only, the sort order of the book doesn't help, and you have to search every page the hard way.

Now what happens if you need to search a telephone book for a person by last name OR first name?

SELECT * FROM TelephoneBook WHERE last_name = 'Thomas' OR first_name = 'Thomas';

This is just as bad as searching by first name only. Every entry matching the first name must be included in the result, and the sort order cannot help you find those entries, so you have to search every page anyway.

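Conclusion: OR in an SQL search is hard to optimize, because MySQL generally uses only one index per table in a given query. (The index merge optimization is an exception for some OR conditions on a single table, but it cannot help when the OR spans two joined tables, as in your query.)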

Solution: Use two queries and UNION them:

SELECT * FROM TelephoneBook WHERE last_name = 'Thomas'
UNION
SELECT * FROM TelephoneBook WHERE first_name = 'Thomas';

Each of the two queries can use the index on its own column; the results of both are then combined (by default, UNION eliminates duplicates).
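The UNION only pays off if each branch has an index it can use. A minimal sketch, assuming a hypothetical TelephoneBook table (the column types, the entry_id and phone columns, and the index names are illustrative, not from the original): the composite index matches the "sorted by last name, then first name" book, and a separate index lets the first_name branch avoid a full scan too.

```sql
-- Hypothetical table for the telephone-book analogy; types and index names are illustrative.
CREATE TABLE TelephoneBook (
  entry_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  last_name  VARCHAR(50) NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  phone      VARCHAR(20) NOT NULL,
  -- "Sorted by last name, then first name": serves lookups by last_name alone
  -- or by last_name AND first_name, but not by first_name alone.
  INDEX idx_last_first (last_name, first_name),
  -- Separate index so the first_name branch of the UNION can also use an index.
  INDEX idx_first (first_name)
);

-- Each branch can now use its own index; UNION (DISTINCT by default) merges the results.
SELECT * FROM TelephoneBook WHERE last_name = 'Thomas'
UNION
SELECT * FROM TelephoneBook WHERE first_name = 'Thomas';
```

Without idx_first, the second branch would still scan the whole table, and the rewrite would gain little over the OR.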

In your case you don't even need to do the join for one of the queries:

(SELECT b.*
 FROM tblbooks AS b
 WHERE b.added_by=3)
UNION
(SELECT b.*
 FROM tblbooks AS b
 INNER JOIN tblauthorships AS a USING (book_id)
 WHERE a.author_id=3)
ORDER BY book_id DESC
LIMIT 10

3 Comments

I'm reading your book right now (Pitfalls of Database Programming) and just from the first sentence I thought "I know who this is"
I like your analogy. It's very simple to understand. UNION did cross my mind. What I also want, though, is pagination. When using UNION like this, wouldn't it select all records from tblbooks with added_by=3 and from tblauthorships with author_id=3 into memory before applying LIMIT 10?
Yes, UNION often does use a temporary table to hold result sets as they accumulate, and then applies the LIMIT to the temporary table. Thanks for reading my book!

The two answers so far are not optimal. Since they use both UNION and LIMIT, let me optimize them further:

( SELECT ...
    ORDER BY ...
    LIMIT 10
) UNION DISTINCT
( SELECT ...
    ORDER BY ...
    LIMIT 10
)
ORDER BY ...
LIMIT 10

This gives each SELECT a chance to use its own ORDER BY and LIMIT, so each one can stop early and run faster. Then UNION DISTINCT removes duplicates. Finally, the outer ORDER BY and LIMIT take the first 10 rows to form the result set.
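Applied to the tables in the question, the template might look like the sketch below. It reuses only the table and column names from the question; whether each branch really stops after 10 rows depends on the optimizer and the available indexes. (With InnoDB, a secondary index on added_by implicitly ends with the primary key book_id, which can supply the ORDER BY for the first branch.) The DISTINCT in the second branch is an addition, not part of the original answer: it guards against a book being listed twice for the same author, which the schema shown does not prevent.

```sql
( SELECT b.*
    FROM tblbooks AS b
    WHERE b.added_by = 3
    ORDER BY b.book_id DESC
    LIMIT 10                      -- at most 10 candidates from this branch
) UNION DISTINCT
( SELECT DISTINCT b.*             -- DISTINCT: guard against duplicate authorship rows
    FROM tblbooks AS b
    INNER JOIN tblauthorships AS a ON a.book_id = b.book_id
    WHERE a.author_id = 3
    ORDER BY b.book_id DESC
    LIMIT 10                      -- at most 10 candidates from this branch
)
ORDER BY book_id DESC             -- re-sort the (at most 20) merged candidates
LIMIT 10;
```

The outer ORDER BY is still required: UNION does not guarantee any order for the combined rows.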

If there will be pagination via OFFSET, this optimization gets trickier. See http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or
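One common way to handle OFFSET, sketched here for page 3 with 10 rows per page (the page numbers are illustrative): each branch must return enough rows to cover every page up to the requested one, i.e. offset + page size = 20 + 10 = 30 rows, and only the outer query applies the OFFSET. The cost grows with the page number, because each branch fetches more rows for deeper pages.

```sql
-- Page 3, 10 rows per page: inner LIMIT = OFFSET + page size = 20 + 10 = 30.
( SELECT b.*
    FROM tblbooks AS b
    WHERE b.added_by = 3
    ORDER BY b.book_id DESC
    LIMIT 30
) UNION DISTINCT
( SELECT DISTINCT b.*             -- DISTINCT: guard against duplicate authorship rows
    FROM tblbooks AS b
    INNER JOIN tblauthorships AS a ON a.book_id = b.book_id
    WHERE a.author_id = 3
    ORDER BY b.book_id DESC
    LIMIT 30
)
ORDER BY book_id DESC
LIMIT 20, 10;                     -- skip 20 rows, return the next 10 (same as LIMIT 10 OFFSET 20)
```

Applying the OFFSET inside the branches instead would be wrong: the rows to skip come from both branches combined, not 20 from each.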

Also, your tables need two indexes:

INDEX(added_by)
INDEX(author_id)

(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
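For example, in the mysql command-line client (the \G terminator is a feature of that client; it prints each row vertically, which suits the long CREATE TABLE text):

```sql
-- Full definitions: every index with its name and column order,
-- plus the storage engine and character set that DESCRIBE does not show.
SHOW CREATE TABLE tblbooks\G
SHOW CREATE TABLE tblauthorships\G

-- A per-index listing, if you only need to confirm which indexes exist.
SHOW INDEX FROM tblauthorships;
```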

1 Comment

Nice solution, Rick. Let me do some test and I'll get back to you.
