0

Let's assume I have a MySQL database named records. The table schema is as follows, where id is an indexed key and url is unique:

id BIGINT(20) UNSIGNED AUTO_INCREMENT
num_chars SMALLINT(4) UNSIGNED
url VARCHAR(1000) UNIQUE

This would be the table's data representation, basically:

-------------------------------------------
| id | num_chars |         url            |
-------------------------------------------
|  1 |    22     | https://www.google.com |
|  2 |    17     | https://yahoo.com      |
|  3 |    16     | https://bing.com       |
-------------------------------------------

num_chars is the url's number of characters.

My question is: considering that this table will probably reach several million records, is there a performance advantage to this query:

SELECT * FROM records WHERE num_chars = 17 AND url = 'https://yahoo.com';

Over this one:

SELECT * FROM records WHERE url = 'https://yahoo.com';

I know that integer-based queries are more efficient than string-based ones (correct me if I'm wrong), so I wonder whether filtering by num_chars before url would be an efficiency improvement.

By the way, the advantage in this case is that I can easily calculate num_chars from url before performing the MySQL query, using PHP, Java, Python, etc.
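As a minimal sketch of that client-side step in Python: the helper name build_lookup and its with_length flag are hypothetical, and the %s placeholders assume a MySQL driver that uses that parameter style (PyMySQL and mysql-connector-python do). Note that len() counts Unicode code points, which equals the character count for the ASCII URLs in the example table.

```python
def build_lookup(url, with_length=False):
    """Return (sql, params) for a parameterized lookup on the records table."""
    if with_length:
        # num_chars is derived from the URL itself, so it cannot disagree
        # with the value being searched for.
        sql = "SELECT * FROM records WHERE num_chars = %s AND url = %s"
        return sql, (len(url), url)
    sql = "SELECT * FROM records WHERE url = %s"
    return sql, (url,)


sql, params = build_lookup("https://yahoo.com", with_length=True)
print(sql, params)
```

The pair would then be passed to the driver's cursor.execute(sql, params) so the URL is never concatenated into the SQL text.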

  • Why not insert several million dummy records and test it out? It takes 5 minutes. Commented Mar 25, 2020 at 17:50
  • Why store num_chars? Will easily end up inconsistent. Commented Mar 25, 2020 at 17:55
  • @t1f thanks for your comment. It took me more than 10 minutes to write this question, so no, it's not a matter of time. This question might also help enlighten others. If someone with the required knowledge can answer it, or at least legitimately mark it as a duplicate, that would be wonderful! Commented Mar 25, 2020 at 17:55
  • @jarlh I'm not sure where the inconsistency would take place. Could you explain that, please? Commented Mar 25, 2020 at 17:57
  • Someone updates url, but forgets about num_chars. Classic error. Commented Mar 25, 2020 at 18:06

3 Answers

1

You have a unique index on url. So, both queries will use this index.

Adding an additional check on the length is not going to speed up the query. There will be a very, very, very small additional overhead for the length check, but that is immaterial.

When you have a unique index, there is no need to add additional checks.

Note: The advantage of an integer comparison over a string comparison arises when you don't need to do a string comparison. In this case, you need to do the string comparison.

There might be a tiny gain if you hashed the string to an integer and compared that before comparing the string.
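A rough sketch of that hash-then-compare idea in Python. The helper names url_hash and matches are hypothetical, and CRC32 is only one possible choice of hash (MySQL has a CRC32() function that could fill a stored hash column). The string comparison is still needed after a hash match, because two different URLs can share a hash.

```python
import zlib


def url_hash(url):
    # CRC32 of the UTF-8 bytes: an unsigned 32-bit integer.
    return zlib.crc32(url.encode("utf-8"))


def matches(row_hash, row_url, wanted_url):
    # Cheap integer comparison first; the string comparison only runs
    # when the hashes are equal.
    return row_hash == url_hash(wanted_url) and row_url == wanted_url


urls = ["https://www.google.com", "https://yahoo.com", "https://bing.com"]
rows = [(url_hash(u), u) for u in urls]
hits = [u for h, u in rows if matches(h, u, "https://yahoo.com")]
print(hits)
```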

2 Comments

Thanks for your answer. I don't really know how the MySQL engine works internally, so I thought that checking num_chars (integer) before url (string) would make the query faster. I was thinking of it as a num_chars pre-filter before the url filter, if "filter" is the right word in this case.
@EduardoEscobar . . . You get no gain by filtering before using a unique index.
0

Without an appropriate index defined, both of those queries are going to suck.

It's not actually true that integer queries are more efficient than text-based ones; we can demonstrate text-based queries that are blazing fast, and integer queries that are glacial. (At least, it's not true enough in this case to make any difference.)

What matters, and what does make a difference for large sets, is effective use of an available index.


With several million rows, we need to consider the distribution of the num_chars values. For outliers, where only a couple dozen rows share a value, an index search on num_chars will be fast. But for larger sets, we still need to evaluate the url to see if it matches.


I'd just create a covering index for the query:

CREATE UNIQUE INDEX mytable_ix1 ON mytable (url, num_chars, id) ;

Then run whichever query you want; we expect the same execution plan, so performance will be the same.
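One way to check the "same execution plan" expectation is to look at the plans. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; this is an assumption for illustration, since SQLite's planner is not MySQL's. It uses the question's schema with only the unique constraint on url, and the plan helper is a hypothetical name. On MySQL, the equivalent check is to prefix each query with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " num_chars INTEGER,"
    " url TEXT UNIQUE)"
)
urls = ["https://www.google.com", "https://yahoo.com", "https://bing.com"]
conn.executemany(
    "INSERT INTO records (num_chars, url) VALUES (?, ?)",
    [(len(u), u) for u in urls],
)


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


plan_url = plan("SELECT * FROM records WHERE url = ?", ("https://yahoo.com",))
plan_both = plan(
    "SELECT * FROM records WHERE num_chars = ? AND url = ?",
    (17, "https://yahoo.com"),
)
print(plan_url)
print(plan_both)
```

If both printed plans name the index on url, the extra num_chars predicate has not changed how the row is located.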

3 Comments

Thanks for your answer. Well, the described table has 2 indexes, of course; url (unique) is the one that matters in this case. Anyway, you have mentioned something that leads to an important point: the order of WHERE conditions. I'll look into that right away.
I missed the unique index on the url column. The order of the predicates (conditions) in the WHERE clause does not matter to the optimizer. Use EXPLAIN to see the execution plan.
Sorry, it was a typo, my bad.
0

Is there a performance improvement?

The answer depends on two things:

  1. The selectivity of the num_chars column. If a lot of your data comes from a few different sources (things like url shorteners, amazon product links, etc., or really any system with a relatively small number of possible lengths), then adding that num_chars=17 condition is still going to match a lot of rows and not actually filter things down much.
  2. The index choices made for the table. An index on url directly, with no other indexes, is likely to make that condition outperform the num_chars condition regardless of selectivity. However, placing both num_chars and url into a single index, in that order, might be able to take good advantage of the additional field, even with poor selectivity.

But remember: database vendors aren't stupid. They put a lot of effort into finding ways to optimize queries. There's a good chance the engine is already doing this kind of thing behind the scenes. The best thing you could do is generate some sample data in a table and test it, to know what will really happen.
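A minimal sketch of such a test in Python. It uses the built-in sqlite3 module as a stand-in so it runs anywhere, which is an assumption for illustration only: the timings say nothing definitive about MySQL, and the same harness would need to be pointed at a real MySQL table for meaningful numbers. The synthetic example.com URLs, the row count, and the timed helper are all hypothetical.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, num_chars INTEGER, url TEXT UNIQUE)"
)
# Hypothetical sample data: synthetic URLs, not real traffic.
urls = [f"https://example.com/page/{i}" for i in range(50_000)]
conn.executemany(
    "INSERT INTO records (num_chars, url) VALUES (?, ?)",
    ((len(u), u) for u in urls),
)
conn.commit()


def timed(sql, make_params, repeats=2_000):
    # Run the query for the first `repeats` URLs and return elapsed seconds.
    start = time.perf_counter()
    for i in range(repeats):
        conn.execute(sql, make_params(urls[i])).fetchall()
    return time.perf_counter() - start


t_url = timed("SELECT * FROM records WHERE url = ?", lambda u: (u,))
t_both = timed(
    "SELECT * FROM records WHERE num_chars = ? AND url = ?",
    lambda u: (len(u), u),
)
print(f"url only: {t_url:.4f}s, num_chars + url: {t_both:.4f}s")
```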

Finally, if you really want to do this, consider making it a Generated Column.

2 Comments

Thanks, I will keep that in mind. In the end, it seems that I will need to run some tests myself.
The selectivity of the components of a composite index does not matter! Think of the BTree as being the concatenation of the two columns.
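The "concatenation" picture can be sketched in Python by modelling a composite index on (num_chars, url) as a sorted list of pairs. This is a toy model under that assumption, not how the storage engine is implemented, and the lookup helper is a hypothetical name.

```python
from bisect import bisect_left

urls = ["https://www.google.com", "https://yahoo.com", "https://bing.com"]
# Toy model of a composite index on (num_chars, url): entries sorted by the pair.
index = sorted((len(u), u) for u in urls)


def lookup(num_chars, url):
    key = (num_chars, url)
    # One binary search over the combined key, regardless of how many
    # rows happen to share the same num_chars.
    pos = bisect_left(index, key)
    return pos < len(index) and index[pos] == key


print(lookup(17, "https://yahoo.com"))
print(lookup(17, "https://yahoo.org"))
```

Because the search compares whole pairs, how selective the first component is on its own does not change the number of comparison steps for an equality lookup on both columns.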
