I need help optimizing this query so that EXPLAIN no longer reports "Using filesort". The query is:

SELECT name
FROM actor
WHERE actor_id IN (3333,1452,2587,3003,3044,3524,3700,7087,7088)
ORDER BY name ASC

The EXPLAIN results:

id  select_type  table  type   possible_keys  key      key_len  ref   rows  Extra
1   SIMPLE       actor  range  PRIMARY        PRIMARY  2        NULL  9     Using where; Using filesort

============================================================

SQL Fiddle http://sqlfiddle.com/#!2/50c4d/1/0

Table:

CREATE TABLE `actor` (
`actor_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(45) NOT NULL,  
PRIMARY KEY (`actor_id`),
UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB;

Sample data:

INSERT INTO `actor` VALUES (7087, 'Brill');
INSERT INTO `actor` VALUES (3333, 'Rey');
INSERT INTO `actor` VALUES (7088, 'Graves');
INSERT INTO `actor` VALUES (1452, 'Phoenix');
INSERT INTO `actor` VALUES (2587, 'Segal');
INSERT INTO `actor` VALUES (3003, 'Taylor');
INSERT INTO `actor` VALUES (3044, 'Daniels');
INSERT INTO `actor` VALUES (3524, 'Michaels');
INSERT INTO `actor` VALUES (3700, 'Tryme');

Index:

ADD INDEX idx_test (actor_id, name) -> EXTRA: Using where; Using index; Using filesort
  • Do you have a lot more data in that table? When I set up the table with only the data you provided, I only get "Using where; Using index". Commented May 28, 2014 at 15:27
  • Yes, I have more columns such as url, thumb, status... but Extra always shows "Using filesort" (with or without the index). I do not know why. Commented May 28, 2014 at 19:01
  • @Marcus-Adams sqlfiddle.com/#!2/50c4d/1 Commented May 30, 2014 at 21:15

2 Answers

2

You can use an index for the range predicate IN(...). Or you can use an index to eliminate the filesort.

You can't get both operations to use an index, at least not when the column in the predicate is different from the column in the sort order.
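
A rough way to picture the two options, as a Python sketch rather than anything MySQL actually runs. The dictionary stands in for the primary key and the name-ordered list stands in for name_UNIQUE (assuming, as with InnoDB secondary indexes, that it also carries actor_id). The rows are the sample data from the question; the function names are made up for illustration.

```python
# Sample rows from the question: (actor_id, name).
ROWS = [
    (7087, "Brill"), (3333, "Rey"), (7088, "Graves"),
    (1452, "Phoenix"), (2587, "Segal"), (3003, "Taylor"),
    (3044, "Daniels"), (3524, "Michaels"), (3700, "Tryme"),
]

# Stand-in for the primary key: direct lookup by actor_id.
BY_ID = dict(ROWS)
# Stand-in for an index ordered by name (hypothetical simplification).
BY_NAME = sorted(ROWS, key=lambda row: row[1])


def lookup_then_sort(wanted_ids):
    """Option 1: use the id index for the IN list, then sort by name."""
    matches = [BY_ID[i] for i in wanted_ids if i in BY_ID]
    return sorted(matches)  # this explicit sort plays the role of the filesort


def scan_in_name_order(wanted_ids):
    """Option 2: walk the name-ordered index and filter on actor_id."""
    wanted = set(wanted_ids)
    # Already in name order, so no sort, but every index entry is visited.
    return [name for actor_id, name in BY_NAME if actor_id in wanted]


ids = [3333, 1452, 2587]
print(lookup_then_sort(ids))
print(scan_in_name_order(ids))
```

Both functions return the same names in the same order. The difference is where the work goes: option 1 touches only the requested rows and then sorts them, while option 2 skips the sort but has to read the whole name-ordered index.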

You created this index:

ADD INDEX idx_test (actor_id, name) -> EXTRA: Using where; Using index; Using filesort

This helped to find the matching actor_id values, and the composite index included the name column you wanted. But then you want the result sorted by name. The index is not sorted by name; it is sorted by actor_id, then by name.

Here's an analogy to explain why this doesn't work:

Suppose I ask you to look in the telephone book and find people whose last names are Franklin, Hamilton, Jefferson, Washington. Then sort them by first name. The phone book is ordered by last name, then by first name. So you can find the last names quickly, but the first names are returned Benjamin, Alexander, Thomas, George -- not in any sensible order.

To sort them by first name, you'd have to collect the results and then sort them manually. The fact that they were sorted in the telephone book doesn't help.
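
The analogy can be tried out in a few lines of Python. This is only an illustrative sketch: the phone book is a small hypothetical list, and the "Adams" entry is added just so that there is something to skip over.

```python
# Hypothetical phone book, kept ordered by (last_name, first_name).
PHONE_BOOK = sorted([
    ("Washington", "George"), ("Franklin", "Benjamin"),
    ("Jefferson", "Thomas"), ("Hamilton", "Alexander"),
    ("Adams", "John"),
])


def first_names_in_book_order(last_names):
    """Return first names of matching entries in the order the book lists them."""
    wanted = set(last_names)
    return [first for last, first in PHONE_BOOK if last in wanted]


wanted = ["Franklin", "Hamilton", "Jefferson", "Washington"]
in_book_order = first_names_in_book_order(wanted)
by_first_name = sorted(in_book_order)  # the extra, manual sorting step

print(in_book_order)  # ['Benjamin', 'Alexander', 'Thomas', 'George']
print(by_first_name)  # ['Alexander', 'Benjamin', 'George', 'Thomas']
```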

1

I'm always confused why people are so hell-bent on avoiding FILESORT !?!

You are asking for a sub-set of a table based on the actor_id. The server sees there is an index (PK or idx_test) on the actor_id field and will zip through the index to find the relevant records and return them. Now, additionally, you also want the output to be in a given order. Had the order been ORDER BY actor_id then it would have been possible to make use of the fact that the fetched records were already pre-sorted on this field in the index (or PK). Thus, no re-sorting would be required and the output could be returned 'as-is' (**).

BUT, you don't want them in order of actor_id, you want them in order of name. So, the machine does what you ask and sorts the records by name before returning them to you. I doubt sorting such a small list is going to take up a lot of resources or time.
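
For a rough feel of how cheap sorting a few thousand short strings is, here is a small Python timing sketch. It measures Python's in-memory sort, not MySQL's filesort, and the randomly generated names and the count of 5000 are assumptions chosen only for illustration, so treat the result as intuition and not as a benchmark of the server.

```python
import random
import string
import timeit


def make_names(count, seed=0):
    """Generate `count` pseudo-random 10-letter names (illustrative data only)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=10)) for _ in range(count)]


names = make_names(5000)
# Average time of one full sort over 100 runs.
seconds = timeit.timeit(lambda: sorted(names), number=100) / 100
print(f"sorting {len(names)} names took about {seconds * 1000:.3f} ms per run")
```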

PS: I don't think the index helps you much here; in fact it (badly!) duplicates the (clustered) PK. The only (potential) benefit I can see for such an index is that IF the actual table is much wider THEN it would work as a covering index for this query, reducing the I/O (++). Mind you, the index only stays covering as long as the query asks for no other fields.

(**: I'm sure it's all a bit more complex internally)

(++: Less I/O for SELECT; INSERT/UPDATE/DELETE will require more I/O, because the extra index has to be maintained)

3 Comments

+1, especially for the first sentence. People are hell-bent on avoiding it due to having absolutely no clue what goes on and they come to wrong conclusions.
@deroby Thank you for your explanation. I thought "Using filesort" or "Using temporary" was always a bad thing that slows the query down, so I was looking for ways to avoid it (the actor table has thousands of records and more columns).
No worries, "Thousands of rows" should be a breeze for any self-respecting RDBMS.
