0

What are the benefits of having an index and a unique constraint on a primary key field that is a UUID? It seems that if I have 25M records and I have to insert a new record, all 25M records have to be searched and checked that they don't have the same UUID with or without an index. Finding that record is also troublesome as uuid's can't be sorted. Am I missing something?

2
  • @zerkms Ah yeah I was referring to the fact that the pk already has an index on it (since it is a pk - using Django fyi). I didn't create one myself. On the second point, where I was trying to learn was HOW does an index speedup select and where clauses when the index is on a field that is a random UUID (uuid4)? If i need to find something where pk=829278b0-9222-403a-85d0-b2bc73371551, I am having trouble understanding where locating this record results in less read operations with an index, especially since there is no order to them. Commented Apr 4, 2016 at 22:36
  • 1
    maybe interesting? What is a B-tree (index). Also: use-the-index-luke.com/sql/anatomy/the-tree Commented Apr 4, 2016 at 22:43

1 Answer 1

5

Yes.

  1. UUID values can be sorted. They might not be sorted in a sequence that you find particularly desirable. But UUIDs are data values. They can be compared (are they equal, is one less than another), and therefore they can be sorted.

  2. Declaring a PRIMARY KEY effectively creates a UNIQUE index. With some storage engines (e.g. InnoDB) the PRIMARY KEY is the cluster key of the table. With other storage engines (e.g. MyISAM), the table is stored as a heap and the PRIMARY KEY is essentially the same as declaring NOT NULL constraints and adding a UNIQUE INDEX.

  3. Yes, when inserting a row into a table, the storage engine must ensure that no PRIMARY KEY or UNIQUE KEY constraints are violated... the values on the new row being inserted don't duplicate values that are already stored.

And that is equivalent (theoretically) to a check of all 25M rows. But because there's an available index structure, the storage engine doesn't need to check all of the individual rows. It uses the index instead.

Because the index is stored "in order", there are vast swaths of blocks containing rows that don't need to be checked. They don't need to be checked because it's impossible for a row with those particular values of the key columns to be stored in those blocks. The storage engine very efficiently identifies the one block where a row with a "duplicate" key value exists, or would exist.


FOLLOWUP

The answer above mostly refers to MySQL (one of the tags in the question). In terms of PostgreSQL, I believe the points are valid.

As far as using UUID values as the PRIMARY KEY for table, there can be some performance drawbacks as compared to some other choices. Two primary concerns: the space required for storing a UUID, and UUID values aren't generated/inserted sequentially.

A UUID is 128 bits, which is 16 bytes. But that's often converted to "human readable form (?)" of 36 characters. Storing the UUID as 36 characters takes up a lot more room than a simple integer. When a CHAR(36) is used as the PRIMARY KEY, that doesn't get stored in just the primary key index, but also gets stored as the "row pointer" in all of the secondary indexes. That translates into fewer keys per block, which in turn means more blocks in the indexes.

The other issue, with new values being inserted not just at the back of the index, but all over in the index, leads to block splits and fragmentation. We don't have to be overly concerned about all that, because the database handles it for us. But using a UUID as a PRIMARY KEY can translate into measurably "slower performance" (vs using ascending integer values), at least in the test lab.


In terms of "what's the benefit" of adding a secondary index with the PRIMARY KEY as the leading column. In general, there is no benefit.

(I'm not going to rule out the corner cases where having such an index might be beneficial. I'd expect those corner case to involve very long rows in an index organized table, and some specific SQL statements that could make efficient use of a secondary index. But that performance benefit would come at a cost, additional blocks (memory and disk i/o) and the extra work to maintain the secondary index.)

Sign up to request clarification or add additional context in comments.

2 Comments

It's worth stressing that a UNIQUE INDEX or PRIMARY KEY type of index provide many benefits, like fast retrieval and duplicate checking.
@tadman: I concur. The same can be true for non-unique indexes... as long as those are on tuples with suitable cardinality. Some databases (e.g. Oracle) can make use of a non-unique index to enforce a UNIQUE constraint.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.