MySQL version: 5.7.44

DROP TABLE IF EXISTS `test`.`b`;
DROP TABLE IF EXISTS `test`.`a`;

CREATE TABLE `test`.`a` (
    `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    `type` VARCHAR(45) NOT NULL,
    `created_at` DATETIME NOT NULL,
    PRIMARY KEY (`id`),
    KEY `type_created_at_idx` (`type` , `created_at`)
)  ENGINE=INNODB DEFAULT CHARSET=UTF8MB4;

INSERT INTO `test`.`a` (`id`, `type`, `created_at`) VALUES
(1, 'A', '2024-08-31'),
(2, 'A', '2024-08-30'),
(3, 'B', '2024-08-29'),
(4, 'B', '2024-08-28'),
(5, 'C', '2024-08-27'),
(6, 'C', '2024-08-26'),
(7, 'C', '2024-08-25'),
(8, 'D', '2024-08-24'),
(9, 'D', '2024-08-23'),
(10, 'E', '2024-08-22');

CREATE TABLE `test`.`b` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `type` varchar(45) NOT NULL,
  `name` varchar(45) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name_type_idx` (`name`,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `test`.`b` (`type`, `name`) VALUES
('A', 'admin'),
('B', 'admin'),
('C', 'user'),
('D', 'user'),
('E', 'user');
  • EXPLAIN 1
EXPLAIN SELECT
  `a`.`id`
FROM
    `test`.`a`
WHERE
    `a`.`type` IN 
    ('A', 'B')
    AND `a`.`created_at` >= '2024-08-30';
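+----+-------------+-------+------------+-------+---------------------+---------------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys       | key                 | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------------+---------------------+---------+------+------+----------+--------------------------+
| 1  | SIMPLE      | a     | NULL       | range | type_created_at_idx | type_created_at_idx | 187     | NULL | 3    | 100.00   | Using where; Using index |
+----+-------------+-------+------------+-------+---------------------+---------------------+---------+------+------+----------+--------------------------+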
  • EXPLAIN 2
EXPLAIN SELECT
  `a`.`id`
FROM
    `test`.`a`
WHERE
    `a`.`type` IN 
    (SELECT `b`.`type` FROM `test`.`b` WHERE `b`.`name` = 'admin')
    AND `a`.`created_at` >= '2024-08-30';
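+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys       | key                 | key_len | ref         | rows | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------------+------+----------+--------------------------+
| 1  | SIMPLE      | b     | NULL       | ref  | name_type_idx       | name_type_idx       | 182     | const       | 2    | 100.00   | Using index; LooseScan   |
| 1  | SIMPLE      | a     | NULL       | ref  | type_created_at_idx | type_created_at_idx | 182     | test.b.type | 1    | 33.33    | Using where; Using index |
+----+-------------+-------+------------+------+---------------------+---------------------+---------+-------------+------+----------+--------------------------+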

EXPLAIN 1 works as expected: it uses a range scan, and filtered is 100.00.

In EXPLAIN 2, where I replace the IN list with an equivalent subquery, MySQL no longer uses a range scan, and filtered drops to 33.33.

What causes this? How can I get filtered to be 100 in EXPLAIN 2?

  • A list of literals and the output of a subquery are not the same thing, even if the two sets of values happen to be the same (which is coincidental). Commented Sep 5, 2024 at 10:42
  • Does this mean EXPLAIN 1 is better than EXPLAIN 2? Commented Sep 6, 2024 at 0:10
  • They differ; that is the only thing we can say. Commented Sep 6, 2024 at 4:12

1 Answer

You essentially have two range tests in the WHERE clause -- IN and >=. The optimizer will help with one or the other of them, but not both.

UNION is a workaround for some situations:

SELECT  `a`.`id`
    FROM  `test`.`a`
    WHERE  `a`.`type` = 'A'
      AND  `a`.`created_at` >= '2024-08-30'
UNION ALL
SELECT  `a`.`id`
    FROM  `test`.`a`
    WHERE  `a`.`type` = 'B'
      AND  `a`.`created_at` >= '2024-08-30';

Each SELECT can then use both columns of this index:

INDEX(`type` , `created_at`)   -- in this order (as you have it)

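Note: UNION ALL is faster than UNION (that is, UNION DISTINCT) because it skips de-duplication. It is appropriate for this query, since the two branches match different type values and so cannot return the same row twice, but it may not be appropriate for other, similar queries.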
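
One way to check whether a given plan uses both index columns is to look at key_len in the EXPLAIN output. The sketch below runs EXPLAIN on the UNION ALL rewrite, using the table and index from the question, and shows the byte arithmetic in comments. The arithmetic assumes the column definitions shown in the question (utf8mb4, both columns NOT NULL) and MySQL 5.7 storage sizes. The plan you get may differ with your data and statistics, so treat this as something to verify, not a guaranteed result.

```sql
-- Inspect the plan of the UNION ALL rewrite (names as in the question).
EXPLAIN
SELECT  `a`.`id`
    FROM  `test`.`a`
    WHERE  `a`.`type` = 'A'
      AND  `a`.`created_at` >= '2024-08-30'
UNION ALL
SELECT  `a`.`id`
    FROM  `test`.`a`
    WHERE  `a`.`type` = 'B'
      AND  `a`.`created_at` >= '2024-08-30';

-- Reading key_len for type_created_at_idx (both columns are NOT NULL):
--   `type` VARCHAR(45), utf8mb4:  45 * 4 + 2 = 182 bytes
--   `created_at` DATETIME:        5 bytes
--   both columns used:            182 + 5 = 187 bytes
-- A key_len of 182 means only `type` is used for the index lookup;
-- 187 means `created_at` is used as well.
```

Read this way, the question's EXPLAIN 1 (key_len 187) uses both columns for the lookup, while the lookup on `a` in EXPLAIN 2 (key_len 182) uses only `type` and checks `created_at` afterwards ("Using where").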

  • This looks good, but is it still suitable when the number of types reaches tens of thousands? Commented Sep 25, 2024 at 2:44
  • 10K SELECTs connected by UNION would probably be less efficient than IN with 10K constants. Commented Sep 25, 2024 at 22:48
