I'm looking for ways to optimize the following query in MySQL. I created a multi-column index on sales_date, serviceID, and initialStatus, but it does not get used. I've tried to research this, but I'm new to optimization and can't find an answer that fits my case. Below is the query:

SELECT 
COUNT(id) as TotalAccounts,
AVG(sale_value) AS SaleValue,
AVG(credit_card = 1) * 100 AS CreditCard,
SUM(CASE WHEN pre_status = 1 AND bill_status = 'current' THEN 1 
ELSE 0
END) AS Active, 
SUM(CASE WHEN pre_status = 1 AND bill_status = 'past' THEN 1 
ELSE 0
END) AS PastDue, 
SUM(CASE WHEN `status` = 0 AND bill_status = 'past' THEN 1 
ELSE 0
END) AS Canceled
FROM table_x  
WHERE sales_date >= CAST('2015-01-01' AS DATE) 
AND sales_date <= CAST('2016-01-01' AS DATE)
AND serviceID = 1
AND initialStatus = 1 

And the EXPLAIN output:

id:            '1',
select_type:   'SIMPLE',
table:         'table_x',
type:          'ALL', 
possible_keys: 'sales_date,Combo sales_date office_id,salesDate_serviceID_initalStatus', 
key:           NULL,
key_len:       NULL,
ref:           NULL,
rows:          '177585',
Extra:         'Using where'

For context, total records: 204,830. Records in my date range: 65,491.

  • Can you include the code you used to create the multi-column index? Commented Jun 5, 2017 at 4:37
  • Possible bug: sales_date <= CAST('2016-01-01' AS DATE) includes the end date. Change to simply sales_date < '2016-01-01' . Note also that casting is not necessary. Commented Jun 5, 2017 at 18:27
  • 65,491 in the date range, but how many in the resultset? Commented Jun 5, 2017 at 18:28
  • @RickJames Got rid of the casting. Thanks for the tip. Commented Jun 6, 2017 at 15:13

1 Answer

You should do better with an index on columns in a different order:

ALTER TABLE table_x ADD INDEX (serviceID, initialStatus, sales_date);

The order of columns in the index is important. Your condition on sales_date is a range condition, i.e. it may match multiple values, whereas the other two conditions on serviceID and initialStatus are equality conditions that each match one value (or zero if the value is not found).

It's generally true that in an index lookup, all the equality conditions must be on columns that are leftmost in the multi-column index. Once a column of the index is used for a range condition, any further columns to the right in the index are not used.

Suppose an index on columns (A, B, C).

A condition like WHERE A=1 AND B=2 AND C=3 will use all three columns of the index.

A condition like WHERE A=1 AND B>2 AND C=3 will use only columns A and B in the index. Then the condition for column C will be applied, row-by-row, on all the rows that matched the A and B conditions.

A condition like WHERE A>1 AND B=2 AND C=3 will use only the first column, A, for the index lookup. The conditions on B and C are then applied row-by-row to the rows that matched the condition on A.
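
One way to see these three cases for yourself is to compare the key_len column in EXPLAIN. The sketch below assumes a hypothetical table t with NOT NULL INT columns A, B, and C and an index called idx_abc; none of these names come from the question. On a populated table, key_len would typically grow with the number of leading index columns used for the lookup, though the optimizer may still pick a different plan depending on the data.

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TABLE t (
  id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  A       INT NOT NULL,
  B       INT NOT NULL,
  C       INT NOT NULL,
  payload VARCHAR(50),
  INDEX idx_abc (A, B, C)
);

-- Equality on all three columns: the lookup can use A, B, and C.
EXPLAIN SELECT * FROM t WHERE A = 1 AND B = 2 AND C = 3;

-- Range on B: the lookup can use A and B; C is checked afterwards.
EXPLAIN SELECT * FROM t WHERE A = 1 AND B > 2 AND C = 3;

-- Range on A: only A can be used for the lookup.
EXPLAIN SELECT * FROM t WHERE A > 1 AND B = 2 AND C = 3;
```

Each NOT NULL INT column contributes 4 bytes to key_len, so comparing the value across the three plans gives a rough indication of how many columns each lookup used.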

The order of terms in your WHERE clause does not need to be the same as the order of columns in the index definition. MySQL knows how to rearrange the terms to match the column order.
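
To check whether the reordered index is picked up, you can re-run EXPLAIN on the same filter. This is only a sketch: the select list is shortened for readability, the CAST calls are dropped (date string literals are compared to a DATE column directly), and the WHERE terms are deliberately left in date-first order, which does not match the index column order.

```sql
-- Same filter as in the question, with a shortened select list.
-- The date conditions come first here even though sales_date is the
-- last column of the (serviceID, initialStatus, sales_date) index.
EXPLAIN
SELECT COUNT(id) AS TotalAccounts,
       AVG(sale_value) AS SaleValue
FROM table_x
WHERE sales_date >= '2015-01-01'
  AND sales_date <= '2016-01-01'
  AND serviceID = 1
  AND initialStatus = 1;
```

If the index is chosen, the key column should name it instead of showing NULL, and type should no longer be ALL.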

You might like my presentation How to Design Indexes, Really.

13 Comments

Thanks! That index worked. Great slides, they will help me a lot.
@BillKarwin Can you please elaborate on why, once a column of the index is used for a range condition, any further columns to the right in the index are not used?
@BrijeshShah, If you read a telephone book, and you search for all last names that start with 'S', are they sorted by first name? No, they are sorted by first name for a given last name, but not for the whole set of 'S' names. So if you search for last_name LIKE 'S%' AND first_name = 'Brijesh' the search can't depend on first_names being in any sorted order in the set of matching rows. The same principle applies for any other range predicate like BETWEEN, <, IN(...), !=, etc.
@BillKarwin Thanks for explaining it in such simple terms. I'm creating a multi-column index on 3 columns: 2 have equality conditions and are boolean, and the third has a range condition and is a datetime. Going by your explanation, I've concluded that I should put the boolean columns first and the datetime column last, and among the boolean columns, put first the one that filters out the most rows. Sound good?
In my experience, as long as you put the columns used in equality conditions before the one in the range condition, the order of the equality columns doesn't matter as much.