MySQL query optimizer ignoring smaller scan on TIMESTAMP column, cardinality?

Question

I've got a complex multi-join query where the optimizer won't pick an index on a TIMESTAMP column even though it results in a smaller index/table scan. If I force the index on the TIMESTAMP column, then the query is much faster; however I can't do this because it's only in a small number of cases when the TIMESTAMP column is the correct index to use (and EXPLAIN seems to confirm this) but MySQL seems to ignore this. The table concerned has a very large number of rows (~300 million, ~50GB with indexes) and the cardinality of the TIMESTAMP column index is very high (SHOW INDEX for the table is below).

For example, an EXPLAIN with no index hinting produces this (the equivalent query never completes):

id   select_type  table               type     possible_keys                             key           key_len  ref            rows      Extra
1    SIMPLE       rf                  ref      si_id,si_campaign,si_source,rf_timestamp  si_campaign   4        const          39150388  Using where; Using filesort
1    SIMPLE       referral_source     eq_ref   PRIMARY                                   PRIMARY       4        rf.rfs_id      1        
1    SIMPLE       referral_campaign   eq_ref   PRIMARY                                   PRIMARY       4        rf.rfc_id      1        
1    SIMPLE       referral_term       eq_ref   PRIMARY                                   PRIMARY       4        rf.rft_id      1        
1    SIMPLE       member              eq_ref   PRIMARY                                   PRIMARY       4        rf.me_id       1        
1    SIMPLE       payment             ref      me_id,pat_id,pas_id                       me_id         4        rf.me_id       3

If you use FORCE INDEX (rf_timestamp), you get a query that scans a third as many rows, and completes in around 10 - 15 seconds:

id   select_type  table               type     possible_keys        key           key_len  ref            rows      Extra
1    SIMPLE       rf                  range    rf_timestamp         rf_timestamp  4        NULL           13092948  Using where; Using filesort
1    SIMPLE       referral_source     eq_ref   PRIMARY              PRIMARY       4        rf.rfs_id      1
1    SIMPLE       referral_campaign   eq_ref   PRIMARY              PRIMARY       4        rf.rfc_id      1
1    SIMPLE       referral_term       eq_ref   PRIMARY              PRIMARY       4        rf.rft_id      1
1    SIMPLE       member              eq_ref   PRIMARY              PRIMARY       4        rf.me_id       1
1    SIMPLE       payment             ref      me_id,pat_id,pas_id  me_id         4        rf.me_id       3

I can't use the FORCE INDEX, because most queries scan around 1500 rows index (as MySQL wants to use for the above queries) that would scan around 30 million rows using the rf_timestamp. The only solution I can think is to parse both the EXPLAINs listed above first, and then make a decision based on the number of rows returned for the rf table.

SHOW INDEXES FROM rf;

Table   N_u  Key_name       Seq  Column_name   Col. Cardinality  S_p     Packed    Null    Index_type
rf      0    PRIMARY        1    rf_id         A    313743334    NULL    NULL              BTREE
rf      1    rfs_id         1    rfs_id        A    18           NULL    NULL              BTREE
rf      1    rfc_id         1    rfc_id        A    18           NULL    NULL              BTREE
rf      1    si_id          1    si_id         A    18           NULL    NULL              BTREE
rf      1    me_id          1    me_id         A    62748666     NULL    NULL      YES     BTREE
rf      1    si_campaign    1    si_id         A    18           NULL    NULL              BTREE
rf      1    si_campaign    2    rfc_id        A    175471       NULL    NULL              BTREE
rf      1    si_source      1    si_id         A    18           NULL    NULL              BTREE
rf      1    si_source      2    rfs_id        A    18           NULL    NULL              BTREE
rf      1    rf_timestamp   1    rf_timestamp  A    104581111    NULL    NULL              BTREE

Table structure:

CREATE TABLE `rf` (
  `rf_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `rf_cost` decimal(5,2) DEFAULT NULL,
  `me_id` int(10) unsigned DEFAULT NULL,
  `mel_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `si_id` int(10) unsigned NOT NULL,
  `rfs_id` int(10) unsigned NOT NULL,
  `rfc_id` int(10) unsigned NOT NULL,
  `rft_id` int(10) unsigned NOT NULL DEFAULT '0',
  `rf_ip_address` int(10) unsigned DEFAULT NULL,
  `rf_invalid` tinyint(3) unsigned DEFAULT '0',
  `rf_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`rf_id`),
  KEY `rfs_id` (`rfs_id`),
  KEY `rfc_id` (`rfc_id`),
  KEY `si_id` (`si_id`),
  KEY `me_id` (`me_id`),
  KEY `si_campaign` (`si_id`,`rfc_id`),
  KEY `si_source` (`si_id`,`rfs_id`),
  KEY `rf_timestamp` (`rf_timestamp`)
) ENGINE=InnoDB;

Here are some example queries on the table that exhibit the issue. So for example, with an si_id that has a lot of rows in the table:

SELECT COUNT(*)
FROM rf
WHERE rf.si_id = 19570
    AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();

1   SIMPLE  rf  ref si_id,si_campaign,si_source,rf_timestamp    si_campaign 4   const   39039810    Using where

SELECT COUNT(*)
FROM rf FORCE INDEX (rf_timestamp)
WHERE rf.si_id = 19570
    AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();

1   SIMPLE  rf  range   rf_timestamp    rf_timestamp    4   NULL    10457136    Using where

COUNT(*): 228890

And for an si_id with only a handful of rows in the table, the optimized query plan is a better choice:

SELECT COUNT(*)
FROM rf
WHERE rf.si_id = 5913
    AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();

1   SIMPLE  rf  ref si_id,si_campaign,si_source,rf_timestamp    si_campaign 4   const   2574    Using where

SELECT COUNT(*)
FROM rf FORCE INDEX (rf_timestamp)
WHERE rf.si_id = 5913
    AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();

1   SIMPLE  rf  range   rf_timestamp    rf_timestamp    4   NULL    10481348    Using where

COUNT(*): 221

In all of these cases, whatever has the lowest row count for the join on to the rf table is always the quickest query. I want MySQL to make that decision; I can't see a way of doing it myself other than parsing the EXPLAIN output.

The SHOW CREATE TABLE outputs would help, too, especially for the rf table. It's giving more info than SHOW INDEXES and DESCRIBE table. And can you clarify if the index scan you want to hint/force is for different queries or for the same query (but different parameters)? — ypercubeᵀᴹ
– ypercubeᵀᴹ, Commented Jun 20, 2013 at 13:57
SHOW CREATE TABLE added. Same query, but different parameters. I can't show the query unfortunately because I'd have to do too much mangling. I have a feeling that there's not much I can do because you can't have that much fine grained control over the optimizer on a value-by-value basis, just wanted a sanity check to make sure I wasn't missing some obvious flag or feature to get MySQL to comply to what I want it to do. — WheresWardy
– WheresWardy, Commented Jun 20, 2013 at 14:20
I've updated with a simple (non-joining) SELECT that exhibits the issue and what I'm trying to get around. — WheresWardy
– WheresWardy, Commented Jun 20, 2013 at 15:12
Good. Can you add the numbers (counts) that these queries return? — ypercubeᵀᴹ
– ypercubeᵀᴹ, Commented Jun 20, 2013 at 15:18

ypercubeᵀᴹ · Accepted Answer · 2013-06-20 15:30:51Z

1

For these queries and if your WHERE is as you have shown and you also have ORDER BY rf_timestamp you can use this index, which should be far better than a single index on si_id or a single index on rf_timestamp:

ALTER TABLE rf
  ADD INDEX si_id__rf_timestamp__IX         -- choose a name for the index
    (si_id, rf_timestamp) ;

With a table of this size, adding this index will take some time and the table will be locked in the mean time, so it would be better if you did this when there is not much traffic and work by others in the database.

answered Jun 20, 2013 at 15:30

ypercubeᵀᴹ

100k13 gold badges217 silver badges306 bronze badges

This was a consideration and may be what we have to do; I was looking at trying to influence the optimizer without having to add/change an index as I know I can get the results I want with the current indexes. Worst comes to worst, we'll do a pt-online-schema-change to add the additional index online.

WheresWardy
– WheresWardy

2013-06-20 15:53:48 +00:00
Commented Jun 20, 2013 at 15:53
If you have a dev environment, you could test the index there, first. Especially if your queries are more complex. For the queries you have shown, this is the best index but with joins on multiple tables, the efficiency gain may not be worth it.

ypercubeᵀᴹ
– ypercubeᵀᴹ

2013-06-20 16:29:35 +00:00
Commented Jun 20, 2013 at 16:29
Tried an overnight online add of that index on a slave, and it seems to be the only thing that's going to work. Thanks for everyone's help.

WheresWardy
– WheresWardy

2013-06-21 08:23:16 +00:00
Commented Jun 21, 2013 at 8:23

Add a comment |

Stack Exchange Network

MySQL query optimizer ignoring smaller scan on TIMESTAMP column, cardinality?

1 Answer 1

Your Answer

Hot Network Questions

MySQL query optimizer ignoring smaller scan on TIMESTAMP column, cardinality?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions