
If I understand correctly, fully-random UUID values create fragmented indexes. Or, more precisely, the lack of a common prefix prevents dense trie storage in the indexes.

I've seen a suggestion to use uuid_generate_v1() or uuid_generate_v1mc() instead of uuid_generate_v4() to avoid this problem.

However, it seems that Version 1 of the UUID spec puts the low bits of the timestamp first, preventing a shared prefix. Also, the timestamp is 60 bits, which seems like overkill.

By contrast, some databases provide non-standard UUID generators with a timestamp in the leading 32 bits followed by 12 bytes of randomness. See Datomic's Squuids for example 1, 2.

Does it in fact make sense to use "Squuids" like this in Postgres? If so, how can I generate such IDs efficiently with PL/pgSQL?
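
For concreteness, a rough sketch of the kind of thing I mean: overwrite the leading four bytes of a random UUID with the low 32 bits of the current epoch seconds. This is my own guess at the layout, not Datomic's implementation, and the result doesn't carry valid RFC 4122 version bits; gen_random_uuid() comes from pgcrypto, or is built in since PostgreSQL 13.

-- sketch only: timestamp seconds in the first 4 bytes, randomness in the rest
SELECT encode(
          overlay(
             uuid_send(gen_random_uuid())      -- 16 random bytes
             placing substring(                -- low 4 bytes of the epoch seconds
                int8send(floor(EXTRACT(epoch FROM clock_timestamp()))::bigint)
                from 5 for 4
             )
             from 1 for 4                      -- written over bytes 1..4
          ),
          'hex'
       )::uuid;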

  • As you insert or update more data, the index can become fragmented, which means that your B+ tree (assuming a normal index) becomes less balanced. You can of course REINDEX to rebalance it. From your question, I assume you want to know which UUID version keeps the tree better balanced. I'd run some benchmarks with pgbench to see whether there is a measurable difference in performance and whether good plans are generated. If any of the solutions works for your app, the rest is purely academic. Commented Jun 21, 2017 at 6:41
  • prevents dense trie storage in the indexes: why assume trie storage? Typically you'd use a B-tree index for UUIDs. You would get trie storage only if asking for it, through the text_ops operator family of the SP-GiST type of index (see the sketch after these comments). Commented Jun 21, 2017 at 12:57
  • I answered a similar question here. Basically, I suggest ULIDs. Commented Aug 22, 2020 at 11:49
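
To make that second comment concrete, roughly what the two kinds of index look like; the table and index names here are made up, and gen_random_uuid() is built in since PostgreSQL 13:

-- hypothetical table with a random UUID key; the PRIMARY KEY already gives a B-tree index
CREATE TABLE t (id uuid PRIMARY KEY DEFAULT gen_random_uuid());

-- trie (radix-tree) storage only happens if you explicitly ask for SP-GiST,
-- e.g. on the text form of the UUID with the text_ops operator class
CREATE INDEX t_id_spgist ON t USING spgist ((id::text) text_ops);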

1 Answer


Note that inserting sequential index entries will result in a denser index only if you don't delete values and all your updates produce heap-only tuples (HOT).

If you want sequential unique index values, why not build them yourself?

You could use clock_timestamp() in microseconds as a bigint and append values from a cycling sequence:

CREATE SEQUENCE seq MINVALUE 0 MAXVALUE 999 CYCLE;

SELECT CAST(
          floor(
             EXTRACT(epoch FROM t)
          ) AS bigint
       ) % 1000000 * 1000000000   -- epoch seconds modulo 1,000,000, shifted left 9 digits
     + CAST(
          to_char(t, 'US') AS bigint
       ) * 1000                   -- microseconds within the second, shifted left 3 digits
     + nextval('seq')             -- 0..999 tie-breaker within the same microsecond
FROM (SELECT clock_timestamp()) clock(t);
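
If you want to use this as a column default, a minimal sketch that wraps the same expression in a function; the names next_seq_id and events are placeholders, and a plain SQL function works just as well here as PL/pgSQL:

CREATE FUNCTION next_seq_id() RETURNS bigint
   LANGUAGE sql VOLATILE AS
$$SELECT CAST(floor(EXTRACT(epoch FROM t)) AS bigint) % 1000000 * 1000000000
       + CAST(to_char(t, 'US') AS bigint) * 1000
       + nextval('seq')
  FROM (SELECT clock_timestamp()) clock(t)$$;

-- hypothetical table using it as the key default
CREATE TABLE events (
   id      bigint PRIMARY KEY DEFAULT next_seq_id(),
   payload text
);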

2 Comments

To avoid confusion for others -- clock_timestamp() returns a timestamp with time zone that has microsecond accuracy. It's not possible to get nanosecond accuracy in PostgreSQL; this code just multiplies the microsecond value by 1000.
Thanks, I'll fix the description.
