0

I am trying to optimize the following query, which takes an extremely long time to execute. Can anyone advise on how to optimize it and recommend any indexes that would speed it up? For reference, the edata table contains around 1 million rows and the ddata table around 15 million rows. The following query selects around 5,000 rows from ddata:

SELECT * FROM ddata WHERE DATE(startDate) = DATE(NOW());

The query that I am trying to optimize is:

SELECT e.ID,e.uID,e.sID
FROM edata e
LEFT JOIN ddata d ON e.sID=d.sID
WHERE DATE(d.startDate)=DATE(NOW());

Thanks

4
  • Basic rule of thumb for indexes: any field used in a comparison operation should have an index on it. That's anything used in your WHERE, JOIN, and sometimes ORDER BY clauses. Note that an index on a field isn't of any use if the comparison uses values DERIVED from that field, as you do with your DATE() calls. startDate might be indexed, but something like md5(somefield) will force a table scan. Commented Feb 9, 2014 at 18:39
  • Possibly as an aside, using NOW() means the query won't go into the query cache – if you prefilled that with a string, then repeated runs would be quicker. Commented Feb 9, 2014 at 18:39
  • I'd run an EXPLAIN statement (dev.mysql.com/doc/refman/5.0/en/explain.html) to show information about the query. Can you post the result so we can give a relevant answer? Commented Feb 9, 2014 at 18:40
  • Remove the LEFT JOIN, because the result is the same as an INNER JOIN. What are you trying to achieve? Commented Feb 9, 2014 at 18:43

2 Answers

3

#1: You probably don't want an Outer Join, so replace it with an Inner Join (MySQL's optimizer is known to be weak at determining whether an Outer Join can be rewritten as an Inner Join).

#2: Remove the function on d.startDate.

SELECT e.ID,e.uID,e.sID
FROM edata e
JOIN ddata d ON e.sID=d.sID
WHERE d.startDate >= DATE(NOW())
AND d.startDate < DATE_ADD(DATE(NOW()), INTERVAL 1 DAY);
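
The range predicate only helps if there is an index it can use, and the question does not say which indexes exist. A minimal sketch, assuming neither table has an index on these columns yet (the index names are made up for illustration):

```sql
-- Hypothetical index names; assumes these indexes do not already exist.
-- startDate comes first so the range condition can use the index;
-- sID comes second so the join column can be read from the same index.
CREATE INDEX idx_ddata_startDate_sID ON ddata (startDate, sID);

-- Lets the join look up edata rows by sID instead of scanning the table.
CREATE INDEX idx_edata_sID ON edata (sID);
```

Whether the optimizer actually uses these depends on your data and MySQL version, so check the plan with EXPLAIN after creating them.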

0

Specifically for this query, apply the WHERE clause before the join; this should significantly reduce the execution time. Secondly, why use a LEFT OUTER JOIN when you're only selecting columns from the left table? That defeats the purpose of the LEFT JOIN entirely, so a simple join would do.

SELECT e.ID,e.uID,e.sID
FROM edata e,
    (select * from ddata
          WHERE DATE(startDate)=DATE(NOW())
    ) d
WHERE e.sID=d.sID;

In general, use the EXPLAIN statement to understand and optimize your queries better. It also helps to go through the basics of query optimization in a DBMS so that you can apply other techniques such as indexing.
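
As a rough illustration of the EXPLAIN advice, prefix the original query from the question with EXPLAIN. Nothing here is assumed beyond the table and column names given in the question:

```sql
-- Shows the execution plan instead of running the query.
EXPLAIN
SELECT e.ID, e.uID, e.sID
FROM edata e
LEFT JOIN ddata d ON e.sID = d.sID
WHERE DATE(d.startDate) = DATE(NOW());
```

In the output, type = ALL for a table means it is being scanned in full, key shows which index (if any) was chosen, and rows is the optimizer's estimate of how many rows it expects to examine for that table.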

13 Comments

It does not make sense to join "d" because it is never used in the select list.
But it is used in the join condition. And the nested query needs an alias. So it is needed.
"SELECT e.ID,e.uID,e.sID FROM edata e" should return the same result
And what about ON e.sID=d.sID; ??
It's a LEFT JOIN, which means all rows from the outer table are returned even if there's no match. And if no column from the inner table is used, you can simply remove the join. Most RDBMSes will remove the unnecessary join, but MySQL's optimizer is not that smart.
