I've got a complex multi-join query where the optimizer won't pick an index on a TIMESTAMP column even though it results in a smaller index/table scan. If I force the index on the TIMESTAMP column, then the query is much faster; however I can't do this because it's only in a small number of cases when the TIMESTAMP column is the correct index to use (and EXPLAIN seems to confirm this) but MySQL seems to ignore this. The table concerned has a very large number of rows (~300 million, ~50GB with indexes) and the cardinality of the TIMESTAMP column index is very high (SHOW INDEX for the table is below).
For example, an EXPLAIN with no index hinting produces this (the equivalent query never completes):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE rf ref si_id,si_campaign,si_source,rf_timestamp si_campaign 4 const 39150388 Using where; Using filesort
1 SIMPLE referral_source eq_ref PRIMARY PRIMARY 4 rf.rfs_id 1
1 SIMPLE referral_campaign eq_ref PRIMARY PRIMARY 4 rf.rfc_id 1
1 SIMPLE referral_term eq_ref PRIMARY PRIMARY 4 rf.rft_id 1
1 SIMPLE member eq_ref PRIMARY PRIMARY 4 rf.me_id 1
1 SIMPLE payment ref me_id,pat_id,pas_id me_id 4 rf.me_id 3
If you use FORCE INDEX (rf_timestamp), you get a query that scans a third as many rows, and completes in around 10 - 15 seconds:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE rf range rf_timestamp rf_timestamp 4 NULL 13092948 Using where; Using filesort
1 SIMPLE referral_source eq_ref PRIMARY PRIMARY 4 rf.rfs_id 1
1 SIMPLE referral_campaign eq_ref PRIMARY PRIMARY 4 rf.rfc_id 1
1 SIMPLE referral_term eq_ref PRIMARY PRIMARY 4 rf.rft_id 1
1 SIMPLE member eq_ref PRIMARY PRIMARY 4 rf.me_id 1
1 SIMPLE payment ref me_id,pat_id,pas_id me_id 4 rf.me_id 3
I can't use the FORCE INDEX, because most queries scan around 1500 rows index (as MySQL wants to use for the above queries) that would scan around 30 million rows using the rf_timestamp. The only solution I can think is to parse both the EXPLAINs listed above first, and then make a decision based on the number of rows returned for the rf table.
SHOW INDEXES FROM rf;
Table N_u Key_name Seq Column_name Col. Cardinality S_p Packed Null Index_type
rf 0 PRIMARY 1 rf_id A 313743334 NULL NULL BTREE
rf 1 rfs_id 1 rfs_id A 18 NULL NULL BTREE
rf 1 rfc_id 1 rfc_id A 18 NULL NULL BTREE
rf 1 si_id 1 si_id A 18 NULL NULL BTREE
rf 1 me_id 1 me_id A 62748666 NULL NULL YES BTREE
rf 1 si_campaign 1 si_id A 18 NULL NULL BTREE
rf 1 si_campaign 2 rfc_id A 175471 NULL NULL BTREE
rf 1 si_source 1 si_id A 18 NULL NULL BTREE
rf 1 si_source 2 rfs_id A 18 NULL NULL BTREE
rf 1 rf_timestamp 1 rf_timestamp A 104581111 NULL NULL BTREE
Table structure:
CREATE TABLE `rf` (
`rf_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`rf_cost` decimal(5,2) DEFAULT NULL,
`me_id` int(10) unsigned DEFAULT NULL,
`mel_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
`si_id` int(10) unsigned NOT NULL,
`rfs_id` int(10) unsigned NOT NULL,
`rfc_id` int(10) unsigned NOT NULL,
`rft_id` int(10) unsigned NOT NULL DEFAULT '0',
`rf_ip_address` int(10) unsigned DEFAULT NULL,
`rf_invalid` tinyint(3) unsigned DEFAULT '0',
`rf_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`rf_id`),
KEY `rfs_id` (`rfs_id`),
KEY `rfc_id` (`rfc_id`),
KEY `si_id` (`si_id`),
KEY `me_id` (`me_id`),
KEY `si_campaign` (`si_id`,`rfc_id`),
KEY `si_source` (`si_id`,`rfs_id`),
KEY `rf_timestamp` (`rf_timestamp`)
) ENGINE=InnoDB;
Here are some example queries on the table that exhibit the issue. So for example, with an si_id that has a lot of rows in the table:
SELECT COUNT(*)
FROM rf
WHERE rf.si_id = 19570
AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();
1 SIMPLE rf ref si_id,si_campaign,si_source,rf_timestamp si_campaign 4 const 39039810 Using where
SELECT COUNT(*)
FROM rf FORCE INDEX (rf_timestamp)
WHERE rf.si_id = 19570
AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();
1 SIMPLE rf range rf_timestamp rf_timestamp 4 NULL 10457136 Using where
COUNT(*): 228890
And for an si_id with only a handful of rows in the table, the optimized query plan is a better choice:
SELECT COUNT(*)
FROM rf
WHERE rf.si_id = 5913
AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();
1 SIMPLE rf ref si_id,si_campaign,si_source,rf_timestamp si_campaign 4 const 2574 Using where
SELECT COUNT(*)
FROM rf FORCE INDEX (rf_timestamp)
WHERE rf.si_id = 5913
AND rf.rf_timestamp BETWEEN '2013-06-15 00:00:00' AND NOW();
1 SIMPLE rf range rf_timestamp rf_timestamp 4 NULL 10481348 Using where
COUNT(*): 221
In all of these cases, whatever has the lowest row count for the join on to the rf table is always the quickest query. I want MySQL to make that decision; I can't see a way of doing it myself other than parsing the EXPLAIN output.
SHOW CREATE TABLEoutputs would help, too, especially for therftable. It's giving more info thanSHOW INDEXESandDESCRIBEtable. And can you clarify if the index scan you want to hint/force is for different queries or for the same query (but different parameters)?SHOW CREATE TABLEadded. Same query, but different parameters. I can't show the query unfortunately because I'd have to do too much mangling. I have a feeling that there's not much I can do because you can't have that much fine grained control over the optimizer on a value-by-value basis, just wanted a sanity check to make sure I wasn't missing some obvious flag or feature to get MySQL to comply to what I want it to do.