0

I got this query:

SELECT user_id  
FROM basic_info  
WHERE age BETWEEN 18 AND 22 AND gender = 0  
ORDER BY rating  
LIMIT 50  

The table looks like (and it contains about 700k rows):

CREATE TABLE IF NOT EXISTS `basic_info` (  
  `user_id` mediumint(8) unsigned NOT NULL auto_increment,  
  `gender` tinyint(1) unsigned NOT NULL default '0',  
  `age` tinyint(2) unsigned NOT NULL default '0',  
  `rating` smallint(5) unsigned NOT NULL default '0',  
  PRIMARY KEY  (`user_id`),  
  KEY `tmp` (`gender`,`rating`),  
) ENGINE=MyISAM;  

The query itself is optimized but it has to walk about 200k rows to do his job. Here's the explain output:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra  
1   SIMPLE  basic_info  ref     tmp,age     tmp     1   const   200451  Using where  

Is it possible to optimize the query so it won't walk over 200k rows ?

Thanks !

5
  • how many rows would you get if you omitted the LIMIT? Commented Nov 17, 2009 at 9:36
  • @Dave K: if i ommit the LIMIT it'll output ~ 63k rows Commented Nov 17, 2009 at 9:41
  • How much time does the query currently take and how much do you need to optimize it? Is this your own server, or are you using a virtual hosting service? Commented Nov 17, 2009 at 11:13
  • @intgr: the query currently take ~ 0.0009 sec. the database resides in my own server. as i said earlier the query is optimized well , the only task i should solve is to make the query to walk less rows than it does now Commented Nov 17, 2009 at 11:45
  • Whoa, this is definitely premature optimization. A query that takes 1ms is certainly not going to be the bottleneck in ANY web application. Commented Nov 17, 2009 at 17:47

3 Answers 3

7

There are two useful indexes that can help this query:

KEY gender_age (gender, age) -- this index can satisfy both the gender=0 condition as well as age BETWEEN 18 AND 22. However, because you have a range condition over the age field, adding the rating column to the index will not give sorted results -- hence MySQL will select all matching rows -- ignoring your LIMIT clause -- and do an additional filesort regardless.

KEY gender_rating (gender, rating) -- the index you already have; this index can satisfy the gender=0 condition and retrieves data already sorted by rating. However, the database has to scan all elements with gender=0 and eliminate those who are not in range age BETWEEN 18 AND 22

Changing schema

If the above does not help you enough, changing your schema is always possible. One such approach is turning the age BETWEEN condition into an equality condition, by defining an age group column; for instance, ages 0-12 will be in age group 1, ages 12-18 in age group 2, etc.

This way, having an index with (gender, agegroup, rating) and query with WHERE gender=0 AND agegroup=3 ORDER BY rating will retrieve all results from the index and already sorted. In this case, the LIMIT clause should only fetch 50 entries from the table and no more.

Sign up to request clarification or add additional context in comments.

4 Comments

+1 for the explanation, especially about the range condition.
@intgr: changing the schema sounds reasonable but it's not possible in my case because : what happen if the user says - give me all the users between 10 AND 12, or 11 AND 20 or even 10 AND 40?
If you want to allow queries over arbitrary age ranges then you're right, it does not help. Unfortunately I'm out of ideas for now.
@intgr: anyway i +1 because of the good advices + explanations. thanks!
1

Extend you tmp-key to include the age-column:

KEY `tmp` (`age`,`gender`,`rating`)

3 Comments

The rating column is useless in this index. Since the query has a range condition on the gender field, results from this index will not be sorted, so ther will be a separate sort step no matter what.
If i extend the key as you propose the query performance degrade. Here's the explain output: id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE basic_info range tmp tmp 2 NULL 107375 Using where; Using filesort
@intgr : The rating column in the index save me from the using filesort
1

Attempt to use InnoDB to improve performence?

Benchmarking here

1 Comment

@Shadi Almosri: yep, it's better, but it's "forbidden" to use INNODB :(

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.