@@ -780,20 +780,21 @@ order. Delaying deduplication minimizes page level fragmentation.
780780Deduplication in unique indexes
781781-------------------------------
782782
783- Very often, the range of values that can be placed on a given leaf page in
784- a unique index is fixed and permanent. For example, a primary key on an
785- identity column will usually only have page splits caused by the insertion
786- of new logical rows within the rightmost leaf page. If there is a split
787- of a non-rightmost leaf page, then the split must have been triggered by
788- inserts associated with an UPDATE of an existing logical row. Splitting a
789- leaf page purely to store multiple versions should be considered
790- pathological, since it permanently degrades the index structure in order
791- to absorb a temporary burst of duplicates. Deduplication in unique
792- indexes helps to prevent these pathological page splits. Storing
793- duplicates in a space efficient manner is not the goal, since in the long
794- run there won't be any duplicates anyway. Rather, we're buying time for
795- standard garbage collection mechanisms to run before a page split is
796- needed.
783+ Very often, the number of distinct values that can ever be placed on
784+ almost any given leaf page in a unique index is fixed and permanent. For
785+ example, a primary key on an identity column will usually only have leaf
786+ page splits caused by the insertion of new logical rows within the
787+ rightmost leaf page. If there is a split of a non-rightmost leaf page,
788+ then the split must have been triggered by inserts associated with UPDATEs
789+ of existing logical rows. Splitting a leaf page purely to store multiple
790+ versions is a false economy. In effect, we're permanently degrading the
791+ index structure just to absorb a temporary burst of duplicates.
792+
793+ Deduplication in unique indexes helps to prevent these pathological page
794+ splits. Storing duplicates in a space efficient manner is not the goal,
795+ since in the long run there won't be any duplicates anyway. Rather, we're
796+ buying time for standard garbage collection mechanisms to run before a
797+ page split is needed.
797798
798799Unique index leaf pages only get a deduplication pass when an insertion
799800(that might have to split the page) observed an existing duplicate on the
@@ -838,13 +839,15 @@ list splits.
838839
839840Only a few isolated extra steps are required to preserve the illusion that
840841the new item never overlapped with an existing posting list in the first
841- place: the heap TID of the incoming tuple is swapped with the rightmost/max
842- heap TID from the existing/originally overlapping posting list. Also, the
843- posting-split-with-page-split case must generate a new high key based on
844- an imaginary version of the original page that has both the final new item
845- and the after-list-split posting tuple (page splits usually just operate
846- against an imaginary version that contains the new item/item that won't
847- fit).
842+ place: the heap TID of the incoming tuple has its TID replaced with the
843+ rightmost/max heap TID from the existing/originally overlapping posting
844+ list. Similarly, the original incoming item's TID is relocated to the
845+ appropriate offset in the posting list (we usually shift TIDs out of the
846+ way to make a hole for it). Finally, the posting-split-with-page-split
847+ case must generate a new high key based on an imaginary version of the
848+ original page that has both the final new item and the after-list-split
849+ posting tuple (page splits usually just operate against an imaginary
850+ version that contains the new item/item that won't fit).
848851
849852This approach avoids inventing an "eager" atomic posting split operation
850853that splits the posting list without simultaneously finishing the insert
0 commit comments