
I have a table of books:

CREATE TABLE `books` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `nameOfBook` VARCHAR(32),
    `releaseDate` DATETIME NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    INDEX `Index 2` (`releaseDate`, `id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=33029692;

I compared two SQL queries that paginate with a sort on releaseDate. Both of these queries return the same result.

(simple one)

select SQL_NO_CACHE id, nameOfBook, releaseDate
from books
where releaseDate <= '2016-11-07'
AND (releaseDate < '2016-11-07' OR id < 3338191)
ORDER BY releaseDate DESC, id DESC LIMIT 50;

and

(tuple comparison, or row comparison)

select SQL_NO_CACHE id, nameOfBook, releaseDate
from books
where (releaseDate, id) < ('2016-11-07', 3338191)
ORDER BY releaseDate DESC, id DESC LIMIT 50;
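Why both queries return the same page can be checked outside MySQL. Below is a minimal Python sketch, assuming non-NULL values and using made-up `(releaseDate, id)` pairs; the names `ROWS`, `simple_predicate`, `tuple_predicate`, `next_page` and `paginate` are hypothetical. It relies on Python comparing tuples lexicographically, which matches how SQL defines `(a, b) < (x, y)` for non-NULL operands. It models only the result set, not how MySQL executes either query.

```python
from datetime import date

# Hypothetical sample rows as (releaseDate, id) pairs. Several share a date,
# which is why the pagination cursor needs id as a tie-breaker.
ROWS = [
    (date(2016, 11, 6), 10),
    (date(2016, 11, 7), 11),
    (date(2016, 11, 7), 12),
    (date(2016, 11, 7), 13),
    (date(2016, 11, 8), 14),
    (date(2016, 11, 5), 15),
]


def simple_predicate(row, cursor):
    """releaseDate <= d AND (releaseDate < d OR id < i), the "simple one"."""
    (d, i), (cd, ci) = row, cursor
    return d <= cd and (d < cd or i < ci)


def tuple_predicate(row, cursor):
    """(releaseDate, id) < (d, i); Python tuple comparison is lexicographic."""
    return row < cursor


def next_page(rows, cursor, limit, predicate):
    """Rows after the cursor, ORDER BY releaseDate DESC, id DESC LIMIT limit."""
    matching = [r for r in rows if predicate(r, cursor)]
    return sorted(matching, reverse=True)[:limit]


def paginate(rows, limit, predicate):
    """Yield successive pages, using the last row of each page as the cursor."""
    page = sorted(rows, reverse=True)[:limit]  # the first page needs no cursor
    while page:
        yield page
        page = next_page(rows, page[-1], limit, predicate)
```

Walking all pages with either predicate, for example `list(paginate(ROWS, 2, tuple_predicate))`, yields the same three pages in the same order.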

When I run EXPLAIN on each query, I get this:

Simple one:

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"SIMPLE";"books";"range";"PRIMARY,Index 2";"Index 2";"9";"";"1015876";"Using where; Using index"

We can see it estimates 1015876 rows to examine.

The EXPLAIN for the tuple comparison:

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"SIMPLE";"books";"index";"";"Index 2";"13";"";"50";"Using where; Using index"

We can see it estimates only 50 rows.

But when I check the execution time, the simple one gives:

/* Affected rows: 0  Rows found: 50  Warnings: 0  Duration for 1 query: 0.031 sec. */

and the tuple one:

/* Affected rows: 0  Rows found: 50  Warnings: 0  Duration for 1 query: 3.682 sec. */

I don't understand why, according to EXPLAIN, the tuple comparison is better, yet its execution time is much worse.

  • The execution plan is the "path" the optimizer will choose; the number of rows it estimates doesn't necessarily determine the outcome. Commented Nov 8, 2016 at 14:33
  • According to my test results, the tuple comparison performs worse than the simple one. Is it bad practice to use tuple comparison, then? Commented Nov 8, 2016 at 15:57
  • I'm looking for an authoritative description of how tuple comparison uses matching compound indexes, as well as its practical performance. Commented Apr 16, 2017 at 11:50
  • @qarma For what it's worth, an index on (releaseDate, id) is effectively the same as an index on just releaseDate in this case. See Docs: "In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index." Commented Apr 21, 2017 at 11:22
  • @Hannes - Alas, it is one of those half-implemented features in MySQL: the syntax and functionality are there, but the performance is not. Commented Apr 23, 2017 at 18:45

1 Answer


I've been irritated by this for years. WHERE (a,b) > (1,2) has never been optimized, even though it is easily transformed into the other formulation. Even the other formulation was poorly optimized until a few years ago.

Running EXPLAIN FORMAT=JSON SELECT ... might give you better clues.

Meanwhile, EXPLAIN ignored the LIMIT and suggested 1015876. In many cases, EXPLAIN provides a "decent" row estimate, but neither of these estimates is decent.

Feel free to file a bug report: http://bugs.mysql.com (and post the link here).

Another formulation was recently optimized, even though OR has historically been un-optimizable:

where  releaseDate <  '2016-11-07'  
   OR (releaseDate  = '2016-11-07' AND id < 3338191)  
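A quick way to convince yourself that this rewrite, the question's "simple one" and the tuple comparison are the same predicate is an exhaustive check over a small domain. A minimal Python sketch, assuming integer stand-ins for releaseDate and id and no NULLs (releaseDate is declared NULL-able, so this says nothing about NULL handling); the helper names are hypothetical. Equivalent results say nothing about which form the optimizer handles well, which is the actual problem here.

```python
from itertools import product


def tuple_lt(a, b, x, y):
    """(a, b) < (x, y): lexicographic row comparison."""
    return (a, b) < (x, y)


def or_form(a, b, x, y):
    """a < x OR (a = x AND b < y): the rewrite in this answer."""
    return a < x or (a == x and b < y)


def simple_form(a, b, x, y):
    """a <= x AND (a < x OR b < y): the question's "simple one"."""
    return a <= x and (a < x or b < y)


def all_agree(values):
    """True if all three predicates agree for every combination of values."""
    return all(
        tuple_lt(a, b, x, y) == or_form(a, b, x, y) == simple_form(a, b, x, y)
        for a, b, x, y in product(values, repeat=4)
    )


print(all_agree(range(3)))  # True (81 combinations checked)
```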

For measuring query optimizations, I like to do:

FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';

Small values, such as '50' in your case, indicate good optimization; large values (1M) indicate a scan. The Handler numbers are exact, unlike the estimates in EXPLAIN.
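To see why a small Handler count points to a seek and a huge one to a scan, here is a toy Python model of a sorted index of `(releaseDate, id)` entries. It is an assumption-laden sketch, not how InnoDB works: the hypothetical `seek_then_read` stands in for a range access that positions at the cursor, and `scan_and_filter` for walking the index from the newest entry and testing each one, which is roughly what the "index" access type in the tuple query's EXPLAIN suggests. The `touched` counter is only loosely analogous to the Handler_read_* counters.

```python
import bisect


def seek_then_read(index, cursor, limit):
    """Toy range access: binary-search to the cursor, then read backwards.

    Every entry read already satisfies (releaseDate, id) < cursor.
    """
    start = bisect.bisect_left(index, cursor)  # entries [0, start) are < cursor
    page = index[max(0, start - limit):start][::-1]  # descending order
    return page, len(page)


def scan_and_filter(index, cursor, limit):
    """Toy full-index scan: walk from the newest entry downwards, testing
    each entry against the predicate until `limit` matches are found."""
    page, touched = [], 0
    for entry in reversed(index):
        touched += 1
        if entry < cursor:
            page.append(entry)
            if len(page) == limit:
                break
    return page, touched


if __name__ == "__main__":
    # Hypothetical index: 1,000 (day, id) entries, 10 ids per day, kept sorted.
    index = [(i // 10, i) for i in range(1000)]
    cursor = (50, 505)
    print(seek_then_read(index, cursor, 50)[1])   # 50 entries touched
    print(scan_and_filter(index, cursor, 50)[1])  # 545 entries touched
```

Both functions return the same 50 entries, but in this model the scan's cost grows with the number of entries newer than the cursor, while the seek's cost stays at the LIMIT.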

Update: 5.7.3 has improved handling of tuples, aka "row constructors".

Update: MySQL Bug#104128 covers this.


10 Comments

This is the same as what they recommend here, but the example there is a bit different. Reading it, one might assume that the query in this question should work fine, since the condition here does cover the index.
I read that page as "here's some nifty syntax", not "this will perform well".
Methinks there are two ways to rewrite (a,b)>(1,2): namely a>1 OR (a=1 AND b>2), and also a>=1 AND (a>1 OR b>2), where at least the index on a can always be utilised (under some assumptions about the data density of a and b).
@qarma - I previously did some experiments with 'Handler%'; it turned out that INDEX(a,b) worked well for either of the rewrites. Please try it on your version to confirm. (And state which MySQL/MariaDB version you are running.)
I'm only interested in the newest open-source MySQL version, which is 5.7.x.
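The two rewrites proposed in these comments can be checked against the row comparison the same way. A minimal Python sketch, assuming integers and no NULLs, with hypothetical function names; it also confirms that the leading `a >= x` conjunct in the second rewrite is implied by `(a, b) > (x, y)`, so adding it never drops a matching row. That is what makes it a safe condition for an index on `a` to use as a range start, though whether MySQL actually does so depends on the server version.

```python
from itertools import product


def tuple_gt(a, b, x, y):
    """(a, b) > (x, y): lexicographic row comparison."""
    return (a, b) > (x, y)


def rewrite_or(a, b, x, y):
    """a > x OR (a = x AND b > y)"""
    return a > x or (a == x and b > y)


def rewrite_range(a, b, x, y):
    """a >= x AND (a > x OR b > y); a >= x is usable as a range start on a."""
    return a >= x and (a > x or b > y)


def check(values):
    """Exhaustively compare both rewrites with the row comparison."""
    for a, b, x, y in product(values, repeat=4):
        expected = tuple_gt(a, b, x, y)
        assert rewrite_or(a, b, x, y) == expected
        assert rewrite_range(a, b, x, y) == expected
        # The extra conjunct is implied, so it never removes a real match.
        assert not expected or a >= x
    return True


print(check(range(4)))  # True
```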