2

I've taken a zip code database and pre-calculated the distances to other zip codes within a certain radius. The database itself is about 2.5GB so not anything extraordinary.

The goal of doing this is to be able to:

select * from zipcode_distances where zipcode_from=92101 and distance < 10;

So far the only index i've defined is:

(zipcode_from, distance)

However, running the query takes about 20 seconds to get the results.

When I remove the "and distance < 10" clause, the results are instantaneous.

Any advice would be appreciated.

Edit:

Here is the create statement:

delimiter $$

CREATE TABLE `zipcode_distances` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `zipcode_from` char(5) COLLATE utf8_bin NOT NULL,
  `zipcode_to` char(5) COLLATE utf8_bin NOT NULL,
  `distance` double unsigned NOT NULL COMMENT 'stored in miles',
  PRIMARY KEY (`id`),
  KEY `idx_zip_from_distance` (`zipcode_from`,`distance`)
) ENGINE=MyISAM AUTO_INCREMENT=62548721 DEFAULT CHARSET=utf8 COLLATE=utf8_bin$$

Here is the explain:

explain extended select * from zipcode_distances where zipcode_from=90210 and distance < 10;

Results:

id, select_type, table, possible_keys, key, key_len, ref, rows, filtered, Extra 1, SIMPLE, zipcode_distances, ALL, idx_zip_from_distance, null, null, null, 62548720, 100.00, Using where

Thank you!

5
  • please paste the explain plan and the table construct Commented Dec 30, 2012 at 18:09
  • Please show the table layout because at this time it is hard to help unless we see how your store the data for distances to zip code. Commented Dec 30, 2012 at 18:11
  • Try SELECT some_column FROM ... for a start. Commented Dec 30, 2012 at 18:12
  • added in the explain and create table Commented Dec 30, 2012 at 18:17
  • Did you run both queries multiple times? The first time you run, caches are filled. Subsequent runs should be faster. You may simply have preloaded the cache with the first query, and then used it for the second one. Commented Dec 30, 2012 at 18:26

1 Answer 1

3

I see no problem with MySQL using the index for the query. I do wonder if the type conversion from 92101 could be confusing it.

Do you get the same poor performance with this?

select * from zipcode_distances where zipcode_from='92101' and distance < 10;

The other issue is how you are doing the timings. You have to run multiple times to avoid the effects of filling the cache.

Sign up to request clarification or add additional context in comments.

2 Comments

this is correct; the type conversion was the culprit! thank you!!
Just a hint: please when you do a explain extended, paste as well the warnings. Usually in the warning, you will have the aproximate query that will get executed, along with a conversion function for your case. These types of errors are very hard to track initially, because all seems normal. One thing that should raise questions is the fact that it was a complete table scan (one of the worse types of scans that can exist). Next you need to enable warnings, and fix the conversion issue.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.