I have the following query:

select  *
    from  test_table
    where  app_id = 521
      and  is_deleted=0
      and  category in (7650)
      AND  created_timestamp >= '2020-07-28 18:19:26'
      AND  created_timestamp <= '2020-08-04 18:19:26'
    ORDER BY  created_timestamp desc
    limit  30

All four fields (app_id, is_deleted, category and created_timestamp) are indexed individually. However, the cardinality of app_id and is_deleted is very low (3 each). The category field is fairly evenly distributed, but created_timestamp seems like a very good index choice for this query.

However, MySQL is not using the created_timestamp index and is in turn taking 4 seconds to return. If I force MySQL to use the created_timestamp index using USE INDEX (created_timestamp), it returns in 40ms.

I checked the output of the EXPLAIN command to see why that's happening, and found that MySQL is executing the query with the following plans:

Automatic index decision, takes > 4s

type: index_merge
key: category,app_id,is_deleted
rows: 10250
filtered: 0.36
Using intersect(category,app_id,is_deleted); Using where; Using filesort

Force index usage:

Use index created_timestamp, takes < 50ms
type: range
key: created_timestamp
rows: 47000
filtered: 0.50
Using index condition; Using where; Backward index scan

MySQL probably decides that scanning fewer rows is better, which makes sense, but then why does the query take so long to return in that case? How can I fix this query?

  • Using intersect is like running three queries, each finding a subset of the table, and then keeping only the rows that exist in all three subsets. You should consider defining a multi-column index on (app_id, is_deleted, created_timestamp, category) in that order. Commented Aug 10, 2020 at 14:21
  • @BillKarwin - If the IN had multiple values, I might agree with your ordering. When there is only one id, it will be optimized as =, at which point, it is distinctly better to put category before the date range. Commented Aug 10, 2020 at 22:00
  • @RickJames Putting created_timestamp first eliminates the filesort. The fourth column can't be searched as a SQL-layer lookup either way, but it can at least be filtered by InnoDB index condition pushdown. Commented Aug 10, 2020 at 22:43
  • @BillKarwin - for category IN (7650), which is optimized identically to category = 7650, it will get past category. Commented Aug 10, 2020 at 22:46
  • I'm assuming it will have multiple values in the general case of the query. Commented Aug 10, 2020 at 22:54

2 Answers

Both "Using intersect" and "Using filesort" are costly for performance. It's best if we can eliminate them.

Here's a test. I'm assuming the IN ( ... ) predicate could sometimes have multiple values, so it will be a range type query, and cannot be optimized as an equality.

CREATE TABLE `test_table` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `app_id` int(11) NOT NULL,
  `is_deleted` tinyint(4) NOT NULL DEFAULT '0',
  `category` int(11) NOT NULL,
  `created_timestamp` timestamp NOT NULL,
  `other` text,
  PRIMARY KEY (`id`),
  KEY `a_is_ct_c` (`app_id`,`is_deleted`,`created_timestamp`,`category`),
  KEY `a_is_c_ct` (`app_id`,`is_deleted`,`category`,`created_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

If we use your query and hint the optimizer to use the first index (created_timestamp before category), we get a query that eliminates both:

EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_ct_c) 
WHERE  app_id = 521
  AND  is_deleted=0
  AND  category in (7650,7651,7652)
  AND  created_timestamp >= '2020-07-28 18:19:26' 
  AND  created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G

           id: 1
  select_type: SIMPLE
        table: test_table
   partitions: NULL
         type: range
possible_keys: a_is_ct_c
          key: a_is_ct_c
      key_len: 13
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using index condition

Whereas if we use the second index (category before created_timestamp), then at least the using intersection is gone, but we still have a filesort:

EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_c_ct) 
WHERE  app_id = 521
  AND  is_deleted=0
  AND  category in (7650,7651,7652)
  AND  created_timestamp >= '2020-07-28 18:19:26' 
  AND  created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G

           id: 1
  select_type: SIMPLE
        table: test_table
   partitions: NULL
         type: range
possible_keys: a_is_c_ct
          key: a_is_c_ct
      key_len: 13
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using index condition; Using filesort

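The "Using index condition" note refers to index condition pushdown (ICP): conditions on index columns that cannot be used to narrow the range scan, such as the fourth column here, are evaluated inside the InnoDB storage engine against the index entry, before the full row is read.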
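
One way to see what index condition pushdown contributes is to switch it off for the current session and compare the plans. This is only a sketch against the test table above; it uses MySQL's optimizer_switch variable with its index_condition_pushdown flag, and the exact EXPLAIN output will depend on your data and server version.

```sql
-- Disable index condition pushdown for this session only.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- Same query as above; with ICP off, Extra should no longer
-- report "Using index condition" for the pushed-down column.
EXPLAIN SELECT * FROM test_table FORCE INDEX (a_is_ct_c)
WHERE  app_id = 521
  AND  is_deleted = 0
  AND  category IN (7650, 7651, 7652)
  AND  created_timestamp >= '2020-07-28 18:19:26'
  AND  created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC\G

-- Restore the setting afterwards.
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
```

With ICP off, the remaining condition is applied by the server layer after each full row has been fetched, which generally means more row reads for the same result.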

The optimal index for the query given, plus some others:

INDEX(app_id, is_deleted,  -- put first, in either order
      category,            -- in this position, assuming it might have multiple INs
      created_timestamp)   -- a range; last.

"Index merge intersect" is probably always worse than having an equivalent composite index.

Note that an alternative for the Optimizer is to ignore the WHERE and focus on the ORDER BY, especially because of LIMIT 30. However, this is very risky. It may have to scan the entire table without finding the 30 rows desired. Apparently, it had to look at about 47000 rows to find the 30.

With the index above, it will touch only 30 (or fewer) rows.
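
As a sketch, the index described above could be added and checked like this. The index name idx_app_del_cat_ts is made up for illustration; the table and column names are the ones from the question.

```sql
-- Equality columns first, then category, then the range column last.
-- The index name is hypothetical; pick whatever fits your conventions.
ALTER TABLE test_table
  ADD INDEX idx_app_del_cat_ts (app_id, is_deleted, category, created_timestamp);

-- Re-check the plan for the original query without any index hint.
EXPLAIN SELECT *
FROM   test_table
WHERE  app_id = 521
  AND  is_deleted = 0
  AND  category IN (7650)
  AND  created_timestamp >= '2020-07-28 18:19:26'
  AND  created_timestamp <= '2020-08-04 18:19:26'
ORDER BY created_timestamp DESC
LIMIT  30\G
```

If the optimizer picks the new index, the key column of the EXPLAIN output should show it in place of the index_merge plan. If it does not, comparing against a FORCE INDEX run is a reasonable next step.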

"All four fields, ... are indexed." -- This is a common misconception, especially by newcomers to databases. It is very rare for a query to use more than one index. So, it is better to try for a "composite" index, which is likely to work much better.

How to build the optimal INDEX for a given SELECT: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

11 Comments

I'd put is_deleted first, because this column is most likely present in most queries, where you only want to show "not deleted" entities. Hence that index becomes useful for any query that tests against is_deleted.
@dognose - the "cardinality" of the individual columns in a composite index does not matter. Think of it this way: the 4 columns will be [logically] concatenated into a long string, and the index is ordered based on that long string.
@dognose - Or are you pointing out that some queries don't include is_deleted? I would provide another composite index with it omitted. (Caveat: This is not a universal rule, but is likely to be useful in your table.)
I know how it works; I'm not talking about cardinality here. My point is that in a system where entities are not actually deleted but only marked as deleted, nearly every query against live data contains where deleted = 0. Putting that column first lets the index be used for almost every query, even when no app_id is given (app_id sounds like a more specialized view/filter to me, like a sub-list of all known apps). That's just what I'd do; it doesn't mean it is correct.
@nimbudew - A mixture of an IN and a range -- There is no good answer. One order works in some cases; the other in other. For 1 item in IN, category needs to be first. See also my discussion with Bill. Providing two indexes may be beneficial -- the Optimizer gets to pick, and it may pick the better one.