Optimize SQL query

Question

I'm trying to optimize this slow query (it takes more than 2 seconds):

SELECT COUNT(*)
FROM crmentity c, mdcalls_trans_activity_update mtu, mdcalls_trans mt
WHERE (mtu.dept = 'GUN' OR  mtu.dept = 'gun') AND
      mtu.trans_code = mt.trans_code AND
      mt.activityid = c.crmid AND
      MONTH(mtu.ts) = 2 AND
      YEAR(mtu.ts) = YEAR(NOW()) AND
      c.deleted = 0 AND
      c.smownerid = 28

This is the output when I use EXPLAIN:

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | c | index_merge | PRIMARY,crmentity_smownerid_idx,crmentity_deleted_smownerid_idx,crmentity_smownerid_deleted_idx | crmentity_smownerid_idx,crmentity_deleted_smownerid_idx | 4,8 | NULL | 91 | Using intersect(crmentity_smownerid_idx,crmentity_deleted_smownerid_idx); Using where; Using index
1 | SIMPLE | mt | ref | activityid | activityid | 4 | pharex.c.crmid | 60 |
1 | SIMPLE | mtu | ref | dept_idx | dept_idx | 5 | const | 1530 | Using where

It's using the index I created (dept_idx), but the query still takes more than 2 seconds against a dataset of 1,380,384 records. Is there a more efficient way to express this query?

UPDATE: Following David's suggestions, the query now runs in a few milliseconds instead of more than 2 seconds (51 seconds, in fact, on MySQL 5.0).

I would write WHERE lower(mtu.dept) = 'gun' AND ..., but I assume your DB already optimizes it that way.
– initall, Commented Feb 15, 2010 at 8:59
I found, in Oracle at least, that using lower() on the left-hand side of the comparison caused a massive slowdown. Whether it causes more slowdown than an additional string comparison...
– graham.reeds, Commented Feb 15, 2010 at 9:02
Using lower() on a column is a good way to NOT use any indices. That would explain your slowdowns.
– David Schmitt, Commented Feb 15, 2010 at 9:10
Graham, David, you're right, of course. I won't delete my comment, so that the ANTI-pattern stays visible ;-)
– initall, Commented Feb 15, 2010 at 10:17
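To make the point of this exchange concrete, here is a sketch against the question's table. Wrapping the column in LOWER() means MySQL can no longer look rows up through dept_idx, while comparing the bare column against constants keeps that option open. Whether the OR is needed at all depends on the column's collation, which the question does not show: MySQL's default collations are case-insensitive (names ending in _ci), and under such a collation a single mtu.dept = 'gun' already matches 'GUN' as well.

```sql
-- Function on the column: no lookup on dept_idx is possible; at best
-- MySQL scans the whole table or the whole of dept_idx.
EXPLAIN SELECT COUNT(*)
FROM mdcalls_trans_activity_update mtu
WHERE LOWER(mtu.dept) = 'gun';

-- Bare column compared with constants: a ref/range lookup on dept_idx
-- is possible, as in the EXPLAIN output from the question.
EXPLAIN SELECT COUNT(*)
FROM mdcalls_trans_activity_update mtu
WHERE mtu.dept IN ('GUN', 'gun');

-- The Collation column shows whether comparisons on dept are
-- case-insensitive (a collation name ending in _ci).
SHOW FULL COLUMNS FROM mdcalls_trans_activity_update LIKE 'dept';
```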

David Schmitt · Accepted Answer · 2010-02-15 09:03:23Z

What is the most selective part of the WHERE clause? That is, which condition removes the most potential items from the result set?

I'd guess it's the mtu.ts filter. If so, you should also index the mtu.ts column and write the constraint so that the index can be used, for example with the BETWEEN operator.
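A sketch of this suggestion against the question's table. It assumes mtu.ts is a DATETIME or TIMESTAMP column (the question does not show its type), uses a hypothetical index name mtu_ts_idx, and hard-codes February 2010 (the current year when the question was asked) in place of YEAR(NOW()). The half-open range on the bare column covers all of February, including the last day, and leaves MySQL free to use a range scan on the new index; BETWEEN works as well, as long as its upper bound really is the last instant of the month.

```sql
-- Hypothetical index name; the indexed column comes from the question.
CREATE INDEX mtu_ts_idx ON mdcalls_trans_activity_update (ts);

-- February 2010 as a half-open range on the raw column: no function is
-- applied to mtu.ts, so the optimizer can use mtu_ts_idx.
SELECT COUNT(*)
FROM mdcalls_trans_activity_update mtu
WHERE mtu.ts >= '2010-02-01'
  AND mtu.ts <  '2010-03-01';
```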

Other tips:

- Attach join conditions directly to the join with JOIN ... ON (); this makes the query much easier to read, both for humans and for the optimizer.
- Avoid calculating constants in the query, like YEAR(NOW()).
- Avoid applying functions to columns in the WHERE clause, like MONTH(mtu.ts). This massively reduces the possibilities for using indices.
- Normalize your data to avoid casing problems like mtu.dept = 'GUN' OR mtu.dept = 'gun'; a single UPDATE mdcalls_trans_activity_update SET dept = lower(dept) and an appropriate CHECK (dept = lower(dept)) on the table will help avoid such madness.
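The normalization tip might look like the following sketch, using the real table name rather than the alias mtu and a hypothetical constraint name, mtu_dept_lowercase. Two caveats apply: MySQL only enforces CHECK constraints from 8.0.16 on (older versions, including the 5.0 mentioned in the question, do not enforce them), and under a case-insensitive collation dept = lower(dept) is always true, so the sketch compares binary strings instead.

```sql
-- One-off cleanup: store every department code in lower case.
UPDATE mdcalls_trans_activity_update
SET dept = LOWER(dept);

-- Reject mixed-case values from now on (enforced in MySQL 8.0.16+).
-- CAST(... AS BINARY) makes the comparison case-sensitive even if the
-- column uses a case-insensitive (_ci) collation.
ALTER TABLE mdcalls_trans_activity_update
  ADD CONSTRAINT mtu_dept_lowercase
  CHECK (CAST(dept AS BINARY) = CAST(LOWER(dept) AS BINARY));

-- With the data normalized, a single equality is enough.
SELECT COUNT(*)
FROM mdcalls_trans_activity_update mtu
WHERE mtu.dept = 'gun';
```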

burnall · 2010-02-15 09:02:12Z

2

I would rewrite the query using explicit joins. It is clearer and gives the optimizer better chances.
Instead of MONTH(mtu.ts) = 2 AND YEAR(mtu.ts) = YEAR(NOW()), it is better to use mtu.ts between .. and ..

3 Comments

FrancisV Over a year ago

How would you rewrite this? Thanks again.

burnall Over a year ago

select count(*) from crmentity c inner join mdcalls_trans mt on mt.activityid = c.crmid inner join mdcalls_trans_activity_update mtu on mtu.trans_code = mt.trans_code where mtu.ts between '2010-02-01 00:00:00' and '2010-02-28 23:59:59' and mtu.dept in ('GUN', 'gun') and c.deleted = 0 and c.smownerid = 28

FrancisV Over a year ago

Thanks for this example. I created a function in PHP to get the first and last dates of the month and used them in the BETWEEN clause.
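Instead of computing the first and last day of the month in PHP, the bounds can also be computed once in MySQL before the query runs, in line with the tip about not calculating constants inside the query. The sketch below uses illustrative user variables @month_start and @month_end and a half-open range (at or after the start, strictly before the first day of the next month) in place of BETWEEN, so there is no need to work out the last day of the month or the last second of that day.

```sql
-- February 1 of the current year: January 1 plus one month.
SET @month_start = MAKEDATE(YEAR(CURDATE()), 1) + INTERVAL 1 MONTH;
-- March 1 of the current year: the exclusive upper bound.
SET @month_end = MAKEDATE(YEAR(CURDATE()), 1) + INTERVAL 2 MONTH;

-- Same joins and filters as the rewrite above; only the date test changes.
SELECT COUNT(*)
FROM crmentity c
INNER JOIN mdcalls_trans mt ON mt.activityid = c.crmid
INNER JOIN mdcalls_trans_activity_update mtu ON mtu.trans_code = mt.trans_code
WHERE mtu.ts >= @month_start
  AND mtu.ts <  @month_end
  AND mtu.dept IN ('GUN', 'gun')
  AND c.deleted = 0
  AND c.smownerid = 28;
```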

graham.reeds · 2010-02-15 09:03:43Z

0

Could you change the text string to a number?

daz-fuller · 2010-02-15 09:08:05Z

0

The most obvious solution I can see would be to change COUNT(*) to count a single field name; otherwise, your index might be next to useless!

Russ Clarke · 2010-02-15 09:13:09Z

0

As a general principle, a good approach to analysing problems like this is to understand the data you're matching on and appreciate its cardinality.

That is to say, order your query so that the most selective things happen first. What's more likely in your data: that dept = 'GUN', or that the userId is 28?

Lastly, have you considered joining to MT and MTU instead of filtering? It might make your query a lot faster, as you'll be limiting the amount of data that needs the date comparisons.
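One way to act on this advice is to measure how much of each table every condition keeps on its own. The queries below are a sketch against the question's tables; they rely on MySQL evaluating a comparison to 1 or 0, so AVG() gives the share of matching rows, and a smaller share means a more selective condition. The date literals stand in for February of the current year and assume mtu.ts is a date/time column. MySQL's optimizer normally picks the join order itself, so the practical use of these numbers is deciding which column is worth an index. Each query scans its whole table, so treat them as one-off diagnostics.

```sql
-- Share of crmentity rows kept by the owner filter.
SELECT AVG(c.smownerid = 28 AND c.deleted = 0) AS owner_share
FROM crmentity c;

-- Share of activity-update rows kept by the department and date filters.
SELECT AVG(mtu.dept IN ('GUN', 'gun')) AS dept_share,
       AVG(mtu.ts >= '2010-02-01' AND mtu.ts < '2010-03-01') AS month_share
FROM mdcalls_trans_activity_update mtu;
```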

1 Comment

Russ Clarke Over a year ago

Posted too fast, basically what David Schmitt and Burnall are saying!
