I'm currently trying to optimize a MySQL statement that is taking quite some time. The table it runs on has 600k+ rows, and the query takes over 10 seconds.

SELECT DATE_FORMAT(timestamp, '%Y-%m-%d') AS date,
       COUNT(DISTINCT email) AS count
FROM log
WHERE timestamp > '2009-02-23'
  AND timestamp < '2020-01-01'
  AND TYPE = 'play'
GROUP BY date
ORDER BY date DESC

I have indexes on timestamp and on type, and also one on timestamp_type (type_2).

Here are the EXPLAIN results. The problem seems to be the filesort, but I don't know how to get around it.

id: 1
select_type: SIMPLE
table: log
type: ref
possible_keys: type,timestamp,type_2
key: type_2
key_len: 1
ref: const
rows: 226403
Extra: Using where; Using filesort

Thanks

1
  • Could you show your current index definitions as SQL code, so there is no doubt about how they are set up? Also, what data quantities are we talking about? (How many rows, how many "TYPE"s, how many rows per TYPE and timestamp?) Commented Jul 8, 2009 at 11:08

3 Answers

4

Things to try:

  • Have a separate date column (indexed) and use that instead of your timestamp column
  • Add an index across type and date
  • Use BETWEEN (I don't think it will affect the speed, but it's easier to read)

So ideally you would

  1. Create a date column and fill it using UPDATE table SET date = DATE(timestamp)
  2. Index across type and date
  3. Change your select to ... type = ? AND date BETWEEN ? AND ?
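
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the column name log_date and the index name idx_type_log_date are hypothetical (the answer simply calls the column "date"), and new rows would also need log_date filled in on insert. The upper bound becomes '2019-12-31' because BETWEEN is inclusive while the original query used timestamp < '2020-01-01'; rows stamped exactly 2009-02-23 00:00:00 are included here although the original strict bound excluded them.

```sql
-- Sketch only: log_date and idx_type_log_date are hypothetical names,
-- not part of the original schema.

-- 1. Add a plain DATE column and backfill it from the existing timestamp.
ALTER TABLE log ADD COLUMN log_date DATE;
UPDATE log SET log_date = DATE(timestamp);

-- 2. Composite index: the equality column (type) first,
--    then the column used for the range and the grouping.
CREATE INDEX idx_type_log_date ON log (type, log_date);

-- 3. Filter and group on the stored column instead of
--    DATE_FORMAT(timestamp, ...), so no computed value is involved.
SELECT log_date AS date,
       COUNT(DISTINCT email) AS count
FROM log
WHERE type = 'play'
  AND log_date BETWEEN '2009-02-23' AND '2019-12-31'
GROUP BY log_date
ORDER BY log_date DESC;
```

Running EXPLAIN on the final SELECT again would show whether the new index is actually chosen and whether the filesort is gone; that depends on the data and the MySQL version.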

4 Comments

Converting the type column from varchar to int would help a little, I guess.
+1 Exactly, MySQL can't use the index if you're grouping on a computed column.
However, depending on how much of the table this is, it may still be more efficient for it to use a full table scan. I know the percentage of the table that has to be touched before a full scan wins is always surprisingly lower than I would think.
BETWEEN is endpoint-inclusive; be wary of this when rewriting.
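
To make the inclusivity point concrete, here is a small comparison against the original log table and timestamp column. COUNT(*) is used only to keep the statements short.

```sql
-- BETWEEN includes both endpoints ...
SELECT COUNT(*) FROM log
WHERE timestamp BETWEEN '2009-02-23' AND '2020-01-01';

-- ... so it is equivalent to this pair of non-strict comparisons:
SELECT COUNT(*) FROM log
WHERE timestamp >= '2009-02-23' AND timestamp <= '2020-01-01';

-- The query in the question uses strict bounds, which exclude rows
-- stamped exactly at either endpoint:
SELECT COUNT(*) FROM log
WHERE timestamp > '2009-02-23' AND timestamp < '2020-01-01';
```

With a TIMESTAMP or DATETIME column the difference is only rows falling exactly on midnight of an endpoint. With a DATE column an inclusive upper bound pulls in the whole final day.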
0

Try rewriting to filter on TYPE alone first, then apply your date range and aggregates. Basically, create an inline view that filters the rows down by type. The optimizer is likely doing this already, but when trying to improve performance I find it helpful to be certain about which operations happen first.
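
A minimal sketch of that inline-view form, using the table and columns from the question. The alias p is a hypothetical name chosen for illustration. Depending on the MySQL version, the derived table may be materialized into a temporary table, so this is something to measure with EXPLAIN rather than assume faster.

```sql
-- The inner query filters on type only; the outer query applies
-- the date range, the grouping, and the distinct count.
SELECT DATE_FORMAT(p.timestamp, '%Y-%m-%d') AS date,
       COUNT(DISTINCT p.email) AS count
FROM (
    SELECT timestamp, email
    FROM log
    WHERE type = 'play'
) AS p
WHERE p.timestamp > '2009-02-23'
  AND p.timestamp < '2020-01-01'
GROUP BY date
ORDER BY date DESC;
```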

0
  1. DATE_FORMAT will not utilize the indexes.

    1. You can still use the query below to utilize the index on the timestamp column:

      SELECT timestamp AS date, COUNT(DISTINCT email) AS count
      FROM log
      WHERE timestamp > '2009-02-23 00:00:00'
        AND timestamp < '2020-01-01 23:59:59'
        AND TYPE = 'play'
      GROUP BY date
      ORDER BY date DESC

    2. Format the datetime value as a date when printing or using it.
