0

So I have used MySQL a lot in small projects, for school; however, I'm not taking over a enterprise-ish scale project, and now speed matters, not just getting the right information back. I have Googled around a lot trying to learn how indexes might make my website faster, and I am hoping to further understand how they work, not just when to use them.

So, I find myself doing a lot of SELECT DISTINCTS in order to get all the distinct values, so i can populate my dropdowns. I have heard that this would be faster if this column was indexed; however, I don't completely understand why. If the values in this columns were ints, I would totally understand; basically a data structure like a BST would be created, and search times could be Log(n); however, if my column is strings, how can it put a string in a BST? This doesn't seem possible, since there is no metric to compare a string against another string (like there are with numbers). It seems like an index would just create a list of all the possible values for that column, but it seems as if the search would still require the database to go through every single row, making this search linear, just like if the database just scanned a regular tables.

My second question is what does the database do once it finds the right value in the index data structure. For example, let's say I'm doing a where age = 42. So, the database goes through the data structure until it finds 42, but how does it map that lookup to the whole row? Does the index have some sort of row number associated with it?

Lastly, if I am doing these frequent SELECT DISTINCT statements, is adding an index going to help? I feel like this must be a common task for websites, as many sites have dropdowns where you can filter results, I'm just trying to figure out if I'm approaching it the right way.

Thanks in advance.

4
  • 1
    Why isn't there a metric to compare a string against another? Certainly there is: the alphabetic order! That even makes sense in unicode. Commented Feb 10, 2014 at 22:52
  • I would like to make a suggestion: since you do not really believe that an index on a string type column makes sense, why don't you simply write a small test case to try things? You will be surprised! And maybe that helps to believe that indexes on strings do make sense. Commented Feb 10, 2014 at 22:54
  • agreed, try it. Indexes on strings can be good. Commented Feb 10, 2014 at 22:56
  • I did try the test case idea; however, I think some sort of caching kicked in because regardless of the index or not, it still got faster after the first query. Commented Feb 10, 2014 at 22:58

2 Answers 2

1

You logic is good, however, your assumption that there is no metric to compare string to other strings is incorrect. Strings can simply be compared in alphabetical order, giving them a perfectly usable comparison metric that can be used to build the index.

It takes a tiny bit longer to compare strings then it does ints, however, having an index still speeds things up, regardless of the comparison cost.

I would like to mention however that if you are using SELECT DISTINCT as much as you say, there are probably problems with your database schema.

You should learn about normalizing your database. I recommend starting with this link: http://databases.about.com/od/specificproducts/a/normalization.htm

Normalization will provide you with querying mechanism that can vastly outweigh benefits received from indexing.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you very much. This clears many things up. Unfortunately, it's a database schema I'm not really allowed to change that much, so indexing is about all I can do, but I appreciate you furthering my understanding of the topic.
1

if your strings are something small like categories, then an index will help. If you have large chunks of random text, then you will likely want a full text index. If you are having to use select distinct a lot, your database may not be properly normalized for what you are doing. You could also put the distinct values in a separate table (that only has the distinct values), but this only helps if the content does not change a lot. Indexing strategies are particular to your application's access patterns, the data itself, and how the tables are normalized (or not). HTH

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.