1

I've got a really big problem, and it stems from a table with 50k+ records.

This table looks something like this (+15 or so more columns that aren't too important):

table_1
date | name | email | num_x | num_y

I also have another table ON A DIFFERENT DB (same server) that looks something like this (+1 not important column):

table_2
name | comment | status

table_1 is updated daily with new entries (it is a feed table for use on other projects), which means there are a lot of repeat "name" rows. This is intended. table_2 contains comments and status notes about "name"s, but no repeat "name"s.

I need to write a query that will select all "name"s from table_1 where the total of all num_x + num_y > X. So, for example, if this were a few rows...

2010-11-19 | john.smith | [email protected] | 20 | 20  
2010-11-19 | joel.schmo | [email protected] | 10 | 10  
2010-11-18 | john.smith | [email protected] | 20 | 20  
2010-11-18 | joel.schmo | [email protected] | 10 | 10 

...and I needed to find all "name"s with a total num_x + num_y > 50, then I'd return john.smith | [email protected] | 80. I would also return john.smith's status and comment from the other DB.

I wrote a query that I believe works fine, but it's problematic because it takes forever and a day to run. I also successfully retrieve records from the other db (I don't have that listed below).

SELECT
    name,
    email,
    SUM(num_x + num_y) AS total
FROM
    table_1
GROUP BY
    name
HAVING
    SUM(num_x + num_y) > 100
ORDER BY
    total ASC
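As a quick sanity check of the query above, here is a minimal sketch that loads the four example rows into an in-memory SQLite database through Python's sqlite3 module and runs the same aggregation. SQLite only stands in for the real server, the e-mail values are placeholders, and `build_sample_db` and `names_over` are hypothetical helper names, not anything from the original setup.

```python
import sqlite3

# The four example rows from the question; e-mail values are placeholders.
SAMPLE_ROWS = [
    ("2010-11-19", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-19", "joel.schmo", "joel.schmo@example.com", 10, 10),
    ("2010-11-18", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-18", "joel.schmo", "joel.schmo@example.com", 10, 10),
]


def build_sample_db():
    """Create an in-memory stand-in for table_1 and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 "
        "(date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER)"
    )
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def names_over(conn, threshold):
    """Return (name, email, total) for names whose summed total exceeds threshold."""
    query = """
        SELECT name, email, SUM(num_x + num_y) AS total
        FROM table_1
        GROUP BY name
        HAVING SUM(num_x + num_y) > ?
        ORDER BY total ASC
    """
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    print(names_over(build_sample_db(), 50))
```

With a threshold of 50 this returns only john.smith with a total of 80, which matches the expected result described in the question.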

Is there a better way to go about this?

Thanks everyone!

Dylan

2
  • Do you have an index for num_x + num_y? Commented Nov 20, 2010 at 0:05
  • I don't, to be honest I didn't know that was an option. :) Commented Nov 20, 2010 at 1:46

4 Answers

1

Why do you repeat the sum in HAVING rather than reuse total? Unless I'm missing something, there is no difference in the results, and avoiding the second sum would save time.

If you can skip the ORDER BY clause and don't mind the slightly different select, I think you'll get some amount of speed-up by splitting up the sum. I have a small database and have tested that it's a valid query and that the results are correct, but it's not nearly large enough to quantify the performance difference.

SELECT
    name,
    email,
    SUM(num_x) AS sumX,
    SUM(num_y) AS sumY
FROM
    table_1
GROUP BY
    name
HAVING
    sumX + sumY > 100

An index on name is a no-brainer. That's the simplest thing that will speed it up.
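To check that the three variants discussed here agree, a small sketch can run them side by side against the sample rows from the question. This uses an in-memory SQLite database as a stand-in (SQLite accepts a select alias in HAVING; other engines may not), the e-mail values are placeholders, and `run` is a hypothetical helper. It only compares results and says nothing about speed.

```python
import sqlite3

# Sample rows from the question; e-mail values are placeholders.
ROWS = [
    ("2010-11-19", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-19", "joel.schmo", "joel.schmo@example.com", 10, 10),
    ("2010-11-18", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-18", "joel.schmo", "joel.schmo@example.com", 10, 10),
]

# Original form: the SUM expression is written twice.
REPEATED_SUM = """
    SELECT name, email, SUM(num_x + num_y) AS total
    FROM table_1 GROUP BY name
    HAVING SUM(num_x + num_y) > ?
    ORDER BY total ASC
"""

# HAVING refers to the alias instead of repeating the expression.
ALIAS_IN_HAVING = """
    SELECT name, email, SUM(num_x + num_y) AS total
    FROM table_1 GROUP BY name
    HAVING total > ?
    ORDER BY total ASC
"""

# The split-sum variant from this answer, without ORDER BY.
SPLIT_SUMS = """
    SELECT name, email, SUM(num_x) AS sumX, SUM(num_y) AS sumY
    FROM table_1 GROUP BY name
    HAVING sumX + sumY > ?
"""


def run(query, threshold):
    """Run one query variant against a fresh in-memory copy of the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 "
        "(date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER)"
    )
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    for label, query in [("repeated", REPEATED_SUM),
                         ("alias", ALIAS_IN_HAVING),
                         ("split", SPLIT_SUMS)]:
        print(label, run(query, 50))
```

All three select the same names. Whether any of them is actually faster depends on the engine and the data, so it is worth timing them against the real table.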


3 Comments

I took this advice and it did speed up the query by about 7%. Thank you very much for your help. I think I'm going to resign myself to the fact that it's a pretty rough call, and it's never going to be lightning fast. :)
Which part of the advice? I'm confident that avoiding the sum in HAVING would speed it up; I'm not as sure about splitting up the sum.
I ended up basically copying your query and it sped things up. Ultimately though, I reverted back to my old query because the ORDER BY was important. However, I avoided the sum in HAVING, as you said, and I got pretty close to the same increase in speed. So you really helped me out.
1

Create an index for name, this will improve the performance:

ALTER TABLE `table_1` ADD INDEX (`name`); 
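One way to see whether an index like this is picked up is to ask the engine for its query plan. The sketch below does that on an in-memory SQLite database, which is only a stand-in: the plan text and the planner's choices will differ on the real server. The index name `idx_table_1_name` and the helper names are assumptions made for the example.

```python
import sqlite3


def make_db_with_index():
    """Create an empty stand-in for table_1 with an index on name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 "
        "(date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER)"
    )
    # SQLite spelling of the ALTER TABLE ... ADD INDEX statement above.
    conn.execute("CREATE INDEX idx_table_1_name ON table_1 (name)")
    return conn


def index_names(conn):
    """List the names of the indexes defined on table_1."""
    return [row[1] for row in conn.execute("PRAGMA index_list('table_1')")]


def query_plan(conn):
    """Return the plan detail lines for the grouped query (format varies by version)."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT name, SUM(num_x + num_y) AS total FROM table_1 GROUP BY name"
    )
    return [row[3] for row in plan]


if __name__ == "__main__":
    db = make_db_with_index()
    print(index_names(db))
    for line in query_plan(db):
        print(line)
```

On the real server the same check would use that engine's own EXPLAIN statement before and after adding the index.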

But redesigning your databases would be my recommendation. Create an artificial key for names, something like id_name | name | email, with id_name being an auto_increment integer; this way you'll get better performance.
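A rough sketch of what that redesign could look like, again using an in-memory SQLite database as a stand-in. The table names `names` and `feed`, the index name, the placeholder e-mail values, and the helper functions are all assumptions for illustration; the real schema has more columns.

```python
import sqlite3

# Sample rows from the question, without the e-mail column.
FEED_ROWS = [
    ("2010-11-19", "john.smith", 20, 20),
    ("2010-11-19", "joel.schmo", 10, 10),
    ("2010-11-18", "john.smith", 20, 20),
    ("2010-11-18", "joel.schmo", 10, 10),
]


def build_normalized_db():
    """Create a names table with an integer key and a feed table that references it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE names (
            id_name INTEGER PRIMARY KEY AUTOINCREMENT,
            name    TEXT NOT NULL UNIQUE,
            email   TEXT
        );
        CREATE TABLE feed (
            date    TEXT,
            id_name INTEGER NOT NULL REFERENCES names (id_name),
            num_x   INTEGER,
            num_y   INTEGER
        );
        CREATE INDEX idx_feed_id_name ON feed (id_name);
    """)
    conn.executemany(
        "INSERT INTO names (name, email) VALUES (?, ?)",
        [("john.smith", "john.smith@example.com"),
         ("joel.schmo", "joel.schmo@example.com")],
    )
    # Map each name to its generated integer key before loading the feed rows.
    ids = dict(conn.execute("SELECT name, id_name FROM names"))
    conn.executemany(
        "INSERT INTO feed VALUES (?, ?, ?, ?)",
        [(date, ids[name], x, y) for date, name, x, y in FEED_ROWS],
    )
    return conn


def totals_over(conn, threshold):
    """Group on the integer key and join back to names for display."""
    query = """
        SELECT n.name, n.email, SUM(f.num_x + f.num_y) AS total
        FROM feed AS f
        JOIN names AS n ON n.id_name = f.id_name
        GROUP BY f.id_name
        HAVING SUM(f.num_x + f.num_y) > ?
        ORDER BY total ASC
    """
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    print(totals_over(build_normalized_db(), 50))
```

The grouping and the join both work on the integer key, and the name and e-mail are stored once per person instead of once per feed row.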

2 Comments

Adding an index for name is a good call. How does creating the auto_incremented integer help improve performance? Excuse my ignorance on this. Oh, by the way, thanks for your response!
It's much quicker to check for equality in integers than text - text has to check letter by letter, where each letter is represented as an integer. So checking "john.smith" is roughly the same as checking 10 integers, one per letter.
1

Try:

SELECT         
    name,                                 
    email, 
    num_x + num_y AS total 
FROM 
    table_1     
WHERE
    num_x + num_y > 100 
ORDER BY 
     total ASC 

Just getting rid of the grouping should make quite a significant difference.

3 Comments

That will give you duplicate name/email records -- one for every row that meets the criteria.
Oops didn't notice duplicates in the table.
Good call, but yeah, doesn't quite do it! Thanks though!
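To make the difference between the row-level filter and the grouped total concrete, here is a small sketch on the question's sample rows, using an in-memory SQLite database as a stand-in. The e-mail values are placeholders and the function names are hypothetical.

```python
import sqlite3

# Sample rows from the question; e-mail values are placeholders.
ROWS = [
    ("2010-11-19", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-19", "joel.schmo", "joel.schmo@example.com", 10, 10),
    ("2010-11-18", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-18", "joel.schmo", "joel.schmo@example.com", 10, 10),
]


def make_db():
    """Create an in-memory stand-in for table_1 and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 "
        "(date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER)"
    )
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def row_level(conn, threshold):
    """The ungrouped query: the filter is applied to each row on its own."""
    query = """
        SELECT name, num_x + num_y AS total
        FROM table_1
        WHERE num_x + num_y > ?
        ORDER BY total ASC
    """
    return conn.execute(query, (threshold,)).fetchall()


def grouped(conn, threshold):
    """The grouped query: the filter is applied to each name's summed total."""
    query = """
        SELECT name, SUM(num_x + num_y) AS total
        FROM table_1
        GROUP BY name
        HAVING SUM(num_x + num_y) > ?
        ORDER BY total ASC
    """
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    print(row_level(db, 50))  # no single row exceeds 50
    print(grouped(db, 50))    # john.smith's rows add up to 80
    print(row_level(db, 15))  # one result per matching row, so names repeat
```

No single row exceeds 50, so the ungrouped query misses john.smith entirely at that threshold, and at a lower threshold it returns each name once per matching row.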
0

Maybe change the database so that the sum is stored and updated every time you change x or y, but it really depends on how often you change them. Otherwise, you can try to compute the sum only once. But I don't see why you do an ORDER BY on only one table if you've got a primary key.
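If the schema could be changed, one way to keep a stored sum is a small totals table maintained by a trigger. The sketch below shows the idea on an in-memory SQLite database; the `name_totals` table, the trigger name, and the helper functions are assumptions for illustration, it only handles inserts (updates and deletes would need their own triggers), and trigger syntax differs between engines.

```python
import sqlite3

# Sample rows from the question; e-mail values are placeholders.
ROWS = [
    ("2010-11-19", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-19", "joel.schmo", "joel.schmo@example.com", 10, 10),
    ("2010-11-18", "john.smith", "john.smith@example.com", 20, 20),
    ("2010-11-18", "joel.schmo", "joel.schmo@example.com", 10, 10),
]


def build_db_with_totals():
    """Create table_1 plus a per-name totals table kept current by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_1 (
            date TEXT, name TEXT, email TEXT, num_x INTEGER, num_y INTEGER
        );
        CREATE TABLE name_totals (
            name  TEXT PRIMARY KEY,
            total INTEGER NOT NULL
        );
        CREATE TRIGGER table_1_after_insert AFTER INSERT ON table_1
        BEGIN
            -- Make sure the name has a row, then add the new values to it.
            INSERT OR IGNORE INTO name_totals (name, total) VALUES (NEW.name, 0);
            UPDATE name_totals
               SET total = total + NEW.num_x + NEW.num_y
             WHERE name = NEW.name;
        END;
    """)
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def stored_totals_over(conn, threshold):
    """Read the precomputed totals instead of aggregating table_1."""
    query = "SELECT name, total FROM name_totals WHERE total > ? ORDER BY total ASC"
    return conn.execute(query, (threshold,)).fetchall()


if __name__ == "__main__":
    print(stored_totals_over(build_db_with_totals(), 50))
```

The read side becomes a plain filter on one row per name, at the cost of a little extra work on every insert into the feed table.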

2 Comments

Hmm... I'll take this under consideration. I can only read from the DB at present, so I'll really need to sell this idea in order to get it done. Do you think it'd make a big difference? Also, thanks for responding!
Oh, OK. No, I don't think it makes a big difference. Removing the GROUP BY and adding an index on name would be sufficient to get a much quicker query...
