@@ -586,6 +586,69 @@ The caller can then send a cancellation signal. This implements the
586586principle that autovacuum has a low locking priority (eg it must not block
587587DDL on the table).
588588
589+ Group Locking
590+ -------------
591+
592+ As if all of that weren't already complicated enough, PostgreSQL now supports
593+ parallelism (see src/backend/access/transam/README.parallel), which means that
594+ we might need to resolve deadlocks that occur between gangs of related processes
595+ rather than individual processes. This doesn't change the basic deadlock
596+ detection algorithm very much, but it makes the bookkeeping more complicated.
597+
598+ We choose to regard locks held by processes in the same parallel group as
599+ non-conflicting. This means that two processes in a parallel group can hold
600+ a self-exclusive lock on the same relation at the same time, or one process
601+ can acquire an AccessShareLock while the other already holds AccessExclusiveLock.
602+ This might seem dangerous and could be in some cases (more on that below), but
603+ if we didn't do this then parallel query would be extremely prone to
604+ self-deadlock. For example, a parallel query against a relation on which the
605+ leader already held AccessExclusiveLock would hang, because the workers would
606+ try to lock the same relation and be blocked by the leader; yet the leader can't
607+ finish until it receives completion indications from all workers. An undetected
608+ deadlock results. This is far from the only scenario where such a problem
609+ happens. The same thing will occur if the leader holds only AccessShareLock,
610+ the worker seeks AccessShareLock, but between the time the leader attempts to
611+ acquire the lock and the time the worker attempts to acquire it, some other
612+ process queues up waiting for an AccessExclusiveLock. In this case, too, an
613+ indefinite hang results.
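
The non-conflict rule described above can be illustrated with a minimal
sketch in C. This is not PostgreSQL's actual lock manager code: the type names
(LockModeSketch, HolderSketch), the two-mode conflict table, and the function
request_conflicts are all hypothetical and exist only for illustration. The
sketch assumes each holder records the PID of its group leader (its own PID
when it is not part of a parallel group), and skips holders in the requester's
group when checking for conflicts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified lock modes; not PostgreSQL's LOCKMODE values. */
typedef enum
{
    ACCESS_SHARE = 0,
    ACCESS_EXCLUSIVE = 1,
    NUM_MODES = 2
} LockModeSketch;

/* conflict_tab[m] is a bitmask of the modes that conflict with mode m. */
static const unsigned conflict_tab[NUM_MODES] = {
    [ACCESS_SHARE] = 1u << ACCESS_EXCLUSIVE,
    [ACCESS_EXCLUSIVE] = (1u << ACCESS_SHARE) | (1u << ACCESS_EXCLUSIVE),
};

/* One process currently holding the lock (illustrative layout only). */
typedef struct
{
    int            pid;
    int            group_leader_pid;   /* own pid if not in a parallel group */
    LockModeSketch mode;
} HolderSketch;

/*
 * Return true if a request for "wanted" by a process whose group leader is
 * requester_leader_pid conflicts with any current holder.  Holders in the
 * requester's own lock group are ignored, which is the rule described above.
 */
static bool
request_conflicts(const HolderSketch *holders, size_t nholders,
                  int requester_leader_pid, LockModeSketch wanted)
{
    assert(wanted < NUM_MODES);

    for (size_t i = 0; i < nholders; i++)
    {
        /* Same lock group: treated as non-conflicting. */
        if (holders[i].group_leader_pid == requester_leader_pid)
            continue;

        if (conflict_tab[wanted] & (1u << holders[i].mode))
            return true;
    }
    return false;
}
```

With a leader (PID 100, say) holding ACCESS_EXCLUSIVE, a worker whose group
leader is 100 requesting ACCESS_SHARE is not blocked, while an unrelated
process requesting the same mode is. The sketch covers only conflicts with
current holders; it does not model the wait queue, which is where the second
scenario above (an unrelated process queued for AccessExclusiveLock) arises.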
614+
615+ It might seem that we could predict which locks the workers will attempt to
616+ acquire and ensure before going parallel that those locks would be acquired
617+ successfully. But this is very difficult to make work in a general way. For
618+ example, a parallel worker's portion of the query plan could involve an
619+ SQL-callable function which generates a query dynamically, and that query
620+ might happen to hit a table on which the leader happens to hold
621+ AccessExclusiveLock. By imposing enough restrictions on what workers can do,
622+ we could eventually create a situation where their behavior can be adequately
623+ restricted, but these restrictions would be fairly onerous, and even then, the
624+ system required to decide whether the workers will succeed at acquiring the
625+ necessary locks would be complex and possibly buggy.
626+
627+ So, instead, we take the approach of deciding that locks within a lock group
628+ do not conflict. This eliminates the possibility of an undetected deadlock,
629+ but also opens up some problem cases: if the leader and worker try to do some
630+ operation at the same time which would ordinarily be prevented by the heavyweight
631+ lock mechanism, undefined behavior might result. In practice, the dangers are
632+ modest. The leader and worker share the same transaction, snapshot, and combo
633+ CID hash, and neither can perform any DDL or, indeed, write any data at all.
634+ Thus, for either to read a table locked exclusively by the other is safe enough.
635+ Problems would occur if the leader initiated parallelism from a point in the
636+ code at which it had some backend-private state that made table access from
637+ another process unsafe; for example, after calling SetReindexProcessing and
638+ before calling ResetReindexProcessing, catastrophe could ensue, because the
639+ worker won't have that state. Similarly, problems could occur with certain
640+ kinds of non-relation locks, such as relation extension locks. It's no safer
641+ for two related processes to extend the same relation at the same time than for
642+ unrelated processes to do the same. However, since parallel mode is strictly
643+ read-only at present, neither this nor most of the similar cases can arise.
644+ To allow parallel writes, we'll either need to (1) further enhance
645+ the deadlock detector to handle those types of locks in a different way than
646+ other types; or (2) have parallel workers use some other mutual exclusion
647+ method for such cases; or (3) revise those cases so that they no longer use
648+ heavyweight locking in the first place (which is not a crazy idea, given that
649+ such lock acquisitions are not expected to deadlock and that heavyweight lock
650+ acquisition is fairly slow anyway).
651+
589652User Locks (Advisory Locks)
590653---------------------------
591654