@@ -228,14 +228,9 @@ a specialized hash function (see proclock_hash).
 * Formerly, each PGPROC had a single list of PROCLOCKs belonging to it.
 This has now been split into per-partition lists, so that access to a
 particular PROCLOCK list can be protected by the associated partition's
-LWLock. (This is not strictly necessary at the moment, because at this
-writing a PGPROC's PROCLOCK list is only accessed by the owning backend
-anyway. But it seems forward-looking to maintain a convention for how
-other backends could access it. In any case LockReleaseAll needs to be
-able to quickly determine which partition each LOCK belongs to, and
-for the currently contemplated number of partitions, this way takes less
-shared memory than explicitly storing a partition number in LOCK structs
-would require.)
+LWLock. (This rule allows one backend to manipulate another backend's
+PROCLOCK lists, which was not originally necessary but is now required in
+connection with fast-path locking; see below.)
 
 * The other lock-related fields of a PGPROC are only interesting when
 the PGPROC is waiting for a lock, so we consider that they are protected
@@ -292,20 +287,20 @@ To alleviate this bottleneck, beginning in PostgreSQL 9.2, each backend is
 permitted to record a limited number of locks on unshared relations in an
 array within its PGPROC structure, rather than using the primary lock table.
 This mechanism can only be used when the locker can verify that no conflicting
-locks can possibly exist.
+locks exist at the time of taking the lock.
 
 A key point of this algorithm is that it must be possible to verify the
 absence of possibly conflicting locks without fighting over a shared LWLock or
 spinlock. Otherwise, this effort would simply move the contention bottleneck
 from one place to another. We accomplish this using an array of 1024 integer
-counters, which are in effect a 1024-way partitioning of the lock space. Each
-counter records the number of "strong" locks (that is, ShareLock,
+counters, which are in effect a 1024-way partitioning of the lock space.
+Each counter records the number of "strong" locks (that is, ShareLock,
 ShareRowExclusiveLock, ExclusiveLock, and AccessExclusiveLock) on unshared
 relations that fall into that partition. When this counter is non-zero, the
-fast path mechanism may not be used for relation locks in that partition. A
-strong locker bumps the counter and then scans each per-backend array for
-matching fast-path locks; any which are found must be transferred to the
-primary lock table before attempting to acquire the lock, to ensure proper
+fast path mechanism may not be used to take new relation locks within that
+partition. A strong locker bumps the counter and then scans each per-backend
+array for matching fast-path locks; any which are found must be transferred to
+the primary lock table before attempting to acquire the lock, to ensure proper
 lock conflict and deadlock detection.
 
 On an SMP system, we must guarantee proper memory synchronization. Here we
@@ -314,19 +309,19 @@ A performs a store, A and B both acquire an LWLock in either order, and B
 then performs a load on the same memory location, it is guaranteed to see
 A's store. In this case, each backend's fast-path lock queue is protected
 by an LWLock. A backend wishing to acquire a fast-path lock grabs this
-LWLock before examining FastPathStrongRelationLocks to check for the presence of
-a conflicting strong lock. And the backend attempting to acquire a strong
+LWLock before examining FastPathStrongRelationLocks to check for the presence
+of a conflicting strong lock. And the backend attempting to acquire a strong
 lock, because it must transfer any matching weak locks taken via the fast-path
-mechanism to the shared lock table, will acquire every LWLock protecting
-a backend fast-path queue in turn. So, if we examine FastPathStrongRelationLocks
-and see a zero, then either the value is truly zero, or if it is a stale value,
-the strong locker has yet to acquire the per-backend LWLock we now hold (or,
-indeed, even the first per-backend LWLock) and will notice any weak lock we
-take when it does.
+mechanism to the shared lock table, will acquire every LWLock protecting a
+backend fast-path queue in turn. So, if we examine
+FastPathStrongRelationLocks and see a zero, then either the value is truly
+zero, or if it is a stale value, the strong locker has yet to acquire the
+per-backend LWLock we now hold (or, indeed, even the first per-backend LWLock)
+and will notice any weak lock we take when it does.
 
 Fast-path VXID locks do not use the FastPathStrongRelationLocks table. The
-first lock taken on a VXID is always the ExclusiveLock taken by its owner. Any
-subsequent lockers are share lockers waiting for the VXID to terminate.
+first lock taken on a VXID is always the ExclusiveLock taken by its owner.
+Any subsequent lockers are share lockers waiting for the VXID to terminate.
 Indeed, the only reason VXID locks use the lock manager at all (rather than
 waiting for the VXID to terminate via some other method) is for deadlock
 detection. Thus, the initial VXID lock can *always* be taken via the fast
@@ -335,6 +330,10 @@ whether the lock has been transferred to the main lock table, and if not,
 do so. The backend owning the VXID must be careful to clean up any entry
 made in the main lock table at end of transaction.
 
+Deadlock detection does not need to examine the fast-path data structures,
+because any lock that could possibly be involved in a deadlock must have
+been transferred to the main tables beforehand.
+
 
 The Deadlock Detection Algorithm
 --------------------------------
@@ -376,7 +375,7 @@ inserted in the wait queue just ahead of the first such waiter. (If we
 did not make this check, the deadlock detection code would adjust the
 queue order to resolve the conflict, but it's relatively cheap to make
 the check in ProcSleep and avoid a deadlock timeout delay in this case.)
-Note special case when inserting before the end of the queue: if the
+Note special case when inserting before the end of the queue: if the
 process's request does not conflict with any existing lock nor any
 waiting request before its insertion point, then go ahead and grant the
 lock without waiting.
@@ -414,7 +413,7 @@ need to kill all the transactions involved.
 indicates a deadlock, but one that does not involve our starting
 process. We ignore this condition on the grounds that resolving such a
 deadlock is the responsibility of the processes involved --- killing our
-start- point process would not resolve the deadlock. So, cases 1 and 3
+start-point process would not resolve the deadlock. So, cases 1 and 3
 both report "no deadlock".
 
 Postgres' situation is a little more complex than the standard discussion
@@ -620,7 +619,7 @@ level is AccessExclusiveLock.
 Regular backends are only allowed to take locks on relations or objects
 at RowExclusiveLock or lower. This ensures that they do not conflict with
 each other or with the Startup process, unless AccessExclusiveLocks are
-requested by one of the backends.
+requested by the Startup process.
 
 Deadlocks involving AccessExclusiveLocks are not possible, so we need
 not be concerned that a user initiated deadlock can prevent recovery from
@@ -632,3 +631,9 @@ of transaction just as they are in normal processing. These locks are
 held by the Startup process, acting as a proxy for the backends that
 originally acquired these locks. Again, these locks cannot conflict with
 one another, so the Startup process cannot deadlock itself either.
+
+Although deadlock is not possible, a regular backend's weak lock can
+prevent the Startup process from making progress in applying WAL, which is
+usually not something that should be tolerated for very long. Mechanisms
+exist to forcibly cancel a regular backend's query if it blocks the
+Startup process for too long.