
If I understand correctly, fully-random UUID values create fragmented indexes. Or, more precisely, the lack of a common prefix prevents dense trie storage in the indexes.

I've seen a suggestion to use uuid_generate_v1() or uuid_generate_v1mc() instead of uuid_generate_v4() to avoid this problem.

However, it seems that Version 1 of the UUID spec puts the low bits of the timestamp first, preventing a shared prefix. Also, the timestamp is 60 bits, which seems like overkill.

By contrast, some databases provide non-standard UUID generators with a timestamp in the leading 32 bits followed by 12 bytes of randomness. See Datomic's Squuids for example 1, 2.

Does it in fact make sense to use "Squuids" like this in Postgres? If so, how can I generate such IDs efficiently with PL/pgSQL?
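
For concreteness, a rough sketch of the kind of thing I mean: overwrite the leading four bytes of a random UUID with the low 32 bits of the current epoch seconds. This is my own guess at the layout, not Datomic's implementation, and the result doesn't carry valid RFC 4122 version bits; gen_random_uuid() comes from pgcrypto, or is built in since PostgreSQL 13.

-- sketch only: timestamp seconds in the first 4 bytes, randomness in the rest
SELECT encode(
          overlay(
             uuid_send(gen_random_uuid())      -- 16 random bytes
             placing substring(                -- low 4 bytes of the epoch seconds
                int8send(floor(EXTRACT(epoch FROM clock_timestamp()))::bigint)
                from 5 for 4
             )
             from 1 for 4                      -- written over bytes 1..4
          ),
          'hex'
       )::uuid;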

  • As you insert or update more data, the index can become fragmented, which means that your B+ tree (assuming a normal index) becomes less balanced. You can of course REINDEX to rebalance it. From your question, I assume you want to know which UUID version keeps the tree better balanced. I'd run some benchmarks with pgbench to see whether there is a measurable difference in performance and whether good plans are generated. If any of the solutions works for your app, the rest is purely academic. Commented Jun 21, 2017 at 6:41
  • prevents dense trie storage in the indexes: why assume trie storage? Typically you'd use a B-tree index for UUIDs. You would get trie storage only if asking for it, through the text_ops operator family of the SP-GiST type of index (see the sketch after these comments). Commented Jun 21, 2017 at 12:57
  • I answered a similar question here. Basically, I suggest ULIDs. Commented Aug 22, 2020 at 11:49
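
To make that second comment concrete, roughly what the two kinds of index look like; the table and index names here are made up, and gen_random_uuid() is built in since PostgreSQL 13:

-- hypothetical table with a random UUID key; the PRIMARY KEY already gives a B-tree index
CREATE TABLE t (id uuid PRIMARY KEY DEFAULT gen_random_uuid());

-- trie (radix-tree) storage only happens if you explicitly ask for SP-GiST,
-- e.g. on the text form of the UUID with the text_ops operator class
CREATE INDEX t_id_spgist ON t USING spgist ((id::text) text_ops);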

1 Answer


Note that inserting sequential index entries will result in a denser index only if you don't delete values and all your updates produce heap-only tuples (HOT).

If you want sequential unique index values, why not build them yourself?

You could use clock_timestamp() in microseconds as a bigint and append values from a cycling sequence:

CREATE SEQUENCE seq MINVALUE 0 MAXVALUE 999 CYCLE;

SELECT CAST(
          floor(
             EXTRACT(epoch FROM t)
          ) AS bigint
       ) % 1000000 * 1000000000   -- epoch seconds modulo 1,000,000, shifted left 9 digits
     + CAST(
          to_char(t, 'US') AS bigint
       ) * 1000                   -- microseconds within the second, shifted left 3 digits
     + nextval('seq')             -- 0..999 tie-breaker within the same microsecond
FROM (SELECT clock_timestamp()) clock(t);
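
If you want to use this as a column default, a minimal sketch that wraps the same expression in a function; the names next_seq_id and events are placeholders, and a plain SQL function works just as well here as PL/pgSQL:

CREATE FUNCTION next_seq_id() RETURNS bigint
   LANGUAGE sql VOLATILE AS
$$SELECT CAST(floor(EXTRACT(epoch FROM t)) AS bigint) % 1000000 * 1000000000
       + CAST(to_char(t, 'US') AS bigint) * 1000
       + nextval('seq')
  FROM (SELECT clock_timestamp()) clock(t)$$;

-- hypothetical table using it as the key default
CREATE TABLE events (
   id      bigint PRIMARY KEY DEFAULT next_seq_id(),
   payload text
);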

2 Comments

To avoid confusion for others -- clock_timestamp() returns a timestamp with time zone that has microsecond accuracy. It's not possible to get nanosecond accuracy in PostgreSQL; this code just multiplies the microsecond value by 1000.
Thanks, I'll fix the description.
