
I was wondering whether I could optimize this further; maybe someone has struggled with the same problem.

First of all, I have this table:

CREATE TABLE `site_url` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
    `url_hash` CHAR(32) NULL DEFAULT NULL,
    `url` VARCHAR(2048) NULL DEFAULT NULL,
    PRIMARY KEY (`id`),
    INDEX `url_hash` (`url_hash`)
)
ENGINE=InnoDB;

It stores the site URI (the domain is in a different table, but for the purpose of this question it doesn't matter, I hope).

url_hash is the MD5 calculated from url.

The field lengths seem appropriate and the indexes should be correct, but the table holds a lot of data, so I'm looking for further optimization.

Standard query looks like this:

select id from site_url where site_url.url_hash = MD5('something - often calculated in application rather than in mysql') and site_url.url = 'something - often calculated in application rather than in mysql'

describe gives:

+----+-------------+----------+------+---------------+----------+---------+-------+------+------------------------------------+
| id | select_type |  table   | type | possible_keys |   key    | key_len |  ref  | rows |               Extra                |
+----+-------------+----------+------+---------------+----------+---------+-------+------+------------------------------------+
|  1 | SIMPLE      | site_url | ref  | url_hash      | url_hash |      97 | const |    1 | Using index condition; Using where |
+----+-------------+----------+------+---------------+----------+---------+-------+------+------------------------------------+

But I'm wondering if I could help MySQL with that search. It must be the InnoDB engine, and I can't add a key on url because of its length.

A friend of mine told me to shorten the hash to 16 characters and store it as a number. Will an index on a BIGINT be faster than one on CHAR(32)? My friend also suggested computing the MD5 and keeping only its first or last 16 characters, but I think that will cause a lot more collisions.

What are your thoughts about it?

  • You can shorten the url_hash to binary(16). An integer won't be large enough to store the hash as a number. That should give you some more space. Also, optimizing MySQL will help immensely. Look up your innodb_buffer_pool_size variable and google around to see what people are doing with it to turbocharge MySQL performance. Commented Oct 10, 2014 at 8:10
  • That is a great idea, with fewer complications to handle during refactoring. I just need to change 2 queries and the field. INSERT: insert into site_url (url_hash,url) values (UNHEX(md5('/uri')),'/uri'); and SELECT: SELECT id FROM site_url USE INDEX (url_hash) WHERE url_hash = UNHEX(md5('/uri')) AND url = '/uri'; Commented Oct 10, 2014 at 8:46
  • Exactly. Also, seeing that you know to use UNHEX in MySQL is a breath of fresh air, to be honest :) Don't forget to optimize InnoDB if you haven't already. Commented Oct 10, 2014 at 9:15
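
Since the question mentions that the hash is often calculated in the application, the BINARY(16) idea from these comments can be applied there as well. The following is a minimal sketch, assuming the application is written in Python and that the URL is encoded as UTF-8 before hashing (both are assumptions, and url_hash_bytes is a hypothetical helper name):

```python
import hashlib

def url_hash_bytes(url: str) -> bytes:
    """Return the raw 16-byte MD5 digest of the URL.

    These are the bytes that UNHEX(MD5(url)) produces in MySQL,
    provided MySQL sees the same byte encoding (assumed UTF-8 here).
    """
    return hashlib.md5(url.encode("utf-8")).digest()

digest = url_hash_bytes("/uri")
print(len(digest))  # 16
# The raw digest is simply the hex digest decoded to bytes.
print(digest == bytes.fromhex(hashlib.md5(b"/uri").hexdigest()))  # True
```

The resulting bytes value can be passed as a bound parameter for the url_hash column instead of calling UNHEX(MD5(...)) in SQL. The key_len of 97 in the question's EXPLAIN output is consistent with a nullable CHAR(32) in a 3-byte character set (32 × 3 + 1), so a BINARY(16) column should make each index entry considerably smaller.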

2 Answers


This is your query:

select id
from site_url
where site_url.url_hash = MD5('something - often calculated in application rather than in mysql') and
      site_url.url = 'something - often calculated in application rather than in mysql';

The best index for this query would be on site_url(url_hash, url, id). The caveat is that you might need to use a prefix unless you have the large prefix option set (see innodb_large_prefix).
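
As a rough illustration of the prefix caveat, the arithmetic below estimates how many characters of url fit into an index prefix. It assumes the 767-byte prefix limit that applies without innodb_large_prefix, and a 3-byte character set, which is what the key_len of 97 in the question's EXPLAIN output suggests (32 × 3 + 1 for a nullable CHAR(32)). max_prefix_chars is a hypothetical helper, not part of MySQL:

```python
# Per-column index prefix limit in bytes without innodb_large_prefix
# (REDUNDANT/COMPACT row formats). Assumed to apply to the asker's table.
SMALL_PREFIX_LIMIT_BYTES = 767

def max_prefix_chars(limit_bytes: int, bytes_per_char: int) -> int:
    """Longest prefix, in characters, that fits into limit_bytes."""
    return limit_bytes // bytes_per_char

print(max_prefix_chars(SMALL_PREFIX_LIMIT_BYTES, 3))  # 255 (3-byte utf8)
print(max_prefix_chars(SMALL_PREFIX_LIMIT_BYTES, 4))  # 191 (utf8mb4)
```

Under those assumptions the url part of the index would need a prefix such as url(255). With large prefixes enabled, the per-column limit rises to 3072 bytes.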


1 Comment

Thanks for the reply. Currently I'm not sure whether I can change the innodb_large_prefix variable. Your index seems to be the best option, as long as such a long index can be created.

If url_hash is the MD5 of url, why do you select by 2 keys?

select id from site_url where site_url.url_hash = MD5('something - often calculated in application rather than in mysql');

Actually, you don't need the second check on site_url.url.

But if you want, you can select by 2 fields with the USE INDEX syntax:

select id from site_url USE INDEX (url_hash) where site_url.url_hash = MD5('something - often calculated in application rather than in mysql') and site_url.url = 'something - often calculated in application rather than in mysql';

7 Comments

I use 2 fields in the WHERE clause to be sure I don't have a collision in the MD5 hash (same hash but different url).
Hmm. If you are not sure about MD5, use SHA-2 with 256 bits.
I have a similar project with URLs and so on. MD5 and SHA-2 work fine for me, without collisions.
I would rather use a smaller key than a longer one.
OK. Then you can select by 2 fields, but with USE INDEX (url_hash).
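
To put rough numbers on the collision concern discussed in these comments, the sketch below applies the standard birthday-bound approximation to a full 128-bit MD5 and to a 64-bit truncation (the "16 hex characters as BIGINT" idea from the question). It assumes hash values behave like uniformly random numbers, and the row count of one billion is purely hypothetical; collision_probability is an illustrative helper name:

```python
import math

def collision_probability(n_rows: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_rows hashes.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)),
    assuming uniformly distributed hash values.
    """
    exponent = -n_rows * (n_rows - 1) / 2.0 ** (hash_bits + 1)
    return -math.expm1(exponent)  # expm1 keeps precision for tiny values

HYPOTHETICAL_ROWS = 1_000_000_000  # assumed table size, not from the question

for bits in (64, 128):
    print(bits, collision_probability(HYPOTHETICAL_ROWS, bits))
```

For a billion rows this gives a probability of roughly 2.7% for a 64-bit hash and on the order of 1e-21 for the full 128 bits, so truncation does raise the risk noticeably at that scale. The estimate covers accidental collisions only; MD5 is not resistant to deliberately crafted collisions, so the additional comparison on url remains a sensible safeguard.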