ACCESS SHARE does not jump the queue: Postgres lock fairness
A SELECT arriving mid-ALTER TABLE waits, even though SHARE-SHARE is compatible. Postgres prioritizes queue fairness over lock-mode compatibility. A close read of the source that enforces it.
Every senior Postgres engineer eventually runs into the same surprise. A long-running SELECT holds ACCESS SHARE on a busy table. An ALTER TABLE arrives and is blocked by the SELECT, which is expected. Then every subsequent SELECT on that table is also blocked, including queries that hold nothing, wait on nothing, and whose lock mode is fully compatible with the one the first SELECT holds. The rule that produces this behavior is not documented as a feature in the user-facing chapter on locking. It is a consequence of the way LockAcquireExtended decides whether to grant a lock, and it is what turns a routine schema change into a cascade.
TL;DR. A lock request is granted immediately only when it conflicts with neither the currently held lock mask nor the mask of modes currently being waited on. New readers queue behind a waiting writer even when their mode is compatible with the currently held reader, because they conflict with the waiter. The rule enforces writer fairness, not deadlock avoidance, and it lives in roughly fifteen lines of LockAcquireExtended in src/backend/storage/lmgr/lock.c. It is the single most common source of "my harmless ADD COLUMN took down the API" incidents.
The rule, stated precisely
Most lock-queue explainers state the rule as "Postgres lock acquisition is FIFO." That framing is roughly correct and completely misses the mechanism. FIFO is the consequence. The cause is a two-mask check.
The actual rule, paraphrased from LockAcquireExtended: a new lock request is granted immediately if and only if it conflicts with neither heldMask (the union of modes currently held on the lock) nor waitMask (the union of modes currently being waited on). If the request conflicts with either mask, it waits. The waitMask check is the fairness rule. It is what makes an ACCESS SHARE request wait even when no currently held lock would have conflicted with it.
The two-mask decision that every lock acquisition passes through. The lower-right branch is the fairness rule.
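The decision can be illustrated with a minimal sketch. This is a toy model, not the lock.c implementation: the single-bit mode values and the two-entry conflict table are invented for illustration, where the real conflictTab covers all eight heavyweight lock modes.

```python
# Toy model of the two-mask grant decision. Mode values and the
# abridged conflict table are illustrative, not the lock.c encoding.
ACCESS_SHARE = 1 << 0
ACCESS_EXCLUSIVE = 1 << 7

# conflictTab analogue: for each mode, the bitmask of modes it conflicts with.
CONFLICTS = {
    ACCESS_SHARE: ACCESS_EXCLUSIVE,                     # conflicts only with ACCESS EXCLUSIVE
    ACCESS_EXCLUSIVE: ACCESS_SHARE | ACCESS_EXCLUSIVE,  # (abridged) conflicts with everything
}

def can_grant(mode, held_mask, wait_mask):
    """Grant immediately only if `mode` conflicts with neither mask."""
    if CONFLICTS[mode] & wait_mask:       # the fairness check
        return False
    return (CONFLICTS[mode] & held_mask) == 0

# Two readers, empty wait queue: granted.
assert can_grant(ACCESS_SHARE, held_mask=ACCESS_SHARE, wait_mask=0)
# Same reader, but a writer is already waiting: must queue.
assert not can_grant(ACCESS_SHARE, held_mask=ACCESS_SHARE, wait_mask=ACCESS_EXCLUSIVE)
```

The second assertion is the whole surprise: nothing in held_mask conflicts with the reader, but the waiter's bit in wait_mask forces it to queue anyway.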
What actually happens, in pg_locks
Consider a three-session scenario on a users table. Session 1 opens a transaction and issues a long-running SELECT. Session 2 issues an ALTER TABLE. Session 3 issues another SELECT. Standard intuition predicts that session 1 and session 3 run concurrently (two readers, compatible locks), and session 2 waits for both of them. Postgres behaves differently.
-- while all three sessions are active
SELECT pid, mode, granted,
now() - query_start AS waiting_for,
left(query, 40) AS query
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE relation = 'public.users'::regclass
ORDER BY granted DESC, waiting_for DESC;
pid | mode | granted | waiting_for | query
------+-----------------------+---------+-------------+------------------------------------------
2840 | AccessShareLock | t | 00:04:12 | SELECT count(*) FROM users WHERE ...
2901 | AccessExclusiveLock | f | 00:00:12 | ALTER TABLE users ADD COLUMN foo text
3102 | AccessShareLock | f | 00:00:04 | SELECT * FROM users WHERE id = 7
3108 | AccessShareLock | f | 00:00:02 | SELECT id FROM users WHERE email = ...
3115 | AccessShareLock | f | 00:00:01 | SELECT count(*) FROM users
Read the table from the top. One reader (pid 2840) has been holding ACCESS SHARE for four minutes. One writer (pid 2901) has been waiting twelve seconds for ACCESS EXCLUSIVE. Three later readers are waiting behind the writer, even though their mode is ACCESS SHARE, which is compatible with the held reader's mode. They arrived after the writer began waiting. That arrival order is the only thing that matters.
The waiting readers are not victims of a deadlock. There is no cycle. Postgres could grant them immediately and nothing would go wrong from a correctness standpoint. It refuses, on purpose. The reason is in the source.
Where the rule lives in the Postgres source
Three files carry the logic that matters here, all under src/backend/storage/lmgr/. The primary file is lock.c. The supporting file for waiters is proc.c. The lock-mode compatibility table is in lock.c and mirrored in the docs as Table 13.2.
LockAcquireExtended and the two-mask check
When a backend needs a heavyweight lock, it calls LockAcquireExtended in lock.c. The signature, near line 1180 in the current master branch:
LockAcquireResult
LockAcquireExtended(const LOCKTAG *locktag,
LOCKMODE lockmode,
bool sessionLock,
bool dontWait,
bool reportMemoryError,
LOCALLOCK **locallockp,
bool logLockFailure)
Partway through the function, after fast-path handling and a few other cases, comes the decision whose outcome the rest of the article is about. The shape of the check, elided for brevity but faithful to the logic:
/* Does my requested mode conflict with anything already waiting? */
if (lockMethodTable->conflictTab[lockmode] & lock->waitMask)
found_conflict = true;
else
found_conflict = LockCheckConflicts(lockMethodTable, lockmode,
lock, proclock);
if (!found_conflict)
{
/* No conflict with held or previously requested locks. Grant. */
GrantLock(lock, proclock, lockmode);
waitResult = PROC_WAIT_STATUS_OK;
}
else
{
/* Enqueue this backend behind existing waiters. */
waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);
}
Three pieces of this matter for the rule.
conflictTab[lockmode] is a bit mask, indexed by lock mode, that names every lock mode that conflicts with the mode being requested. For ACCESS SHARE this mask is exactly one bit: ACCESS EXCLUSIVE. Every other mode is compatible.
lock->waitMask is a bit mask that names every mode currently being requested by waiters on this lock. When the waiter queue is empty, it is zero.
The bitwise AND between the two asks: "does my mode conflict with any mode already in the wait queue?" If the answer is yes, found_conflict is set, and the request is sent to JoinWaitQueue rather than granted. This happens before the check against currently held locks ever finds out whether the request would have been compatible with the holder.
JoinWaitQueue and ProcSleep
Once found_conflict is true, the backend calls JoinWaitQueue in proc.c. That function inserts the waiter into the per-lock queue in FIFO order, updates lock->waitMask to include this new mode, and returns PROC_WAIT_STATUS_WAITING. LockAcquireExtended then calls WaitOnLock, which calls ProcSleep, which actually parks the backend until the lock is granted or a timeout fires.
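The queue mechanics can be mimicked in a few lines. Again a toy model, not the proc.c implementation; the mode values and conflict table are invented for illustration.

```python
# Toy FIFO wait queue showing how waitMask accumulates. Mode values and
# the abridged conflict table are illustrative, not the lock.c encoding.
from collections import deque

ACCESS_SHARE, ACCESS_EXCLUSIVE = 1 << 0, 1 << 7
CONFLICTS = {ACCESS_SHARE: ACCESS_EXCLUSIVE,
             ACCESS_EXCLUSIVE: ACCESS_SHARE | ACCESS_EXCLUSIVE}

class ToyLock:
    def __init__(self):
        self.held_mask = 0
        self.wait_mask = 0
        self.queue = deque()            # FIFO, like the per-lock wait queue

    def acquire(self, mode):
        if (CONFLICTS[mode] & self.wait_mask) == 0 and \
           (CONFLICTS[mode] & self.held_mask) == 0:
            self.held_mask |= mode      # GrantLock analogue
            return "granted"
        self.queue.append(mode)         # JoinWaitQueue analogue: FIFO insert...
        self.wait_mask |= mode          # ...plus the mask update that blocks later readers
        return "waiting"

lock = ToyLock()
results = [lock.acquire(ACCESS_SHARE),      # session 1: granted
           lock.acquire(ACCESS_EXCLUSIVE),  # session 2: waits on the holder
           lock.acquire(ACCESS_SHARE)]      # session 3: waits on session 2's bit
print(results)   # ['granted', 'waiting', 'waiting']
```

Session 3's fate is sealed by the wait_mask update that session 2's enqueue performed, before any comparison against the holder happens.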
The comment block above ProcSleep, near line 1060 in master:
/*
* ProcSleep -- put process to sleep waiting on lock
*
* This must be called when JoinWaitQueue() returns PROC_WAIT_STATUS_WAITING.
* Returns after the lock has been granted, or if a deadlock is detected.
*/
ProcSleep is the visible symptom. Every "SELECT waiting" row in pg_locks is a backend sitting inside ProcSleep. But the decision to sleep was already made upstream, by the waitMask check in LockAcquireExtended. That is the source of the surprise.
Why the rule exists: writer fairness
The obvious question is why Postgres bothers with the waitMask check at all. Without it, readers would be granted immediately whenever they are compatible with the currently held mode, and writers would wait for the held reader to finish.
The problem with that is unbounded writer delay. A stream of compatible readers can arrive faster than any single reader finishes. The held ACCESS SHARE is released when the first reader commits, but by then three more readers have arrived and joined, each extending the effective "held" window. A writer waiting for ACCESS EXCLUSIVE never gets a gap. It sits at the front of an invisible queue that never empties.
The waitMask check collapses that starvation. The moment any request with a conflicting mode begins waiting, arriving compatible requests also wait. The writer now has a finite upper bound on its wait: the duration of the currently held readers, plus zero, because no new ones can jump ahead. That upper bound is what waitMask buys.
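The starvation argument can be made concrete with a toy timeline: readers arrive once per second and each holds for three seconds, a writer arrives at t=0.5, and we compare the writer's acquisition time with and without the fairness check. The numbers are invented; this is a sketch of the argument, not a Postgres simulation.

```python
def writer_acquires_at(fair, horizon=100):
    """Time at which a writer arriving at t=0.5 gets the lock, given
    readers arriving at t=0,1,2,... that each hold for 3 seconds."""
    WRITER_ARRIVAL = 0.5
    granted = []                        # release times of granted readers
    for t in range(horizon):            # a reader arrives at each integer second
        granted = [r for r in granted if r > t]   # drop readers that finished
        if fair and t >= WRITER_ARRIVAL:
            break                       # fairness: new readers queue behind the writer
        granted.append(t + 3)           # no fairness: the reader is granted anyway
    # the writer acquires when the last granted reader releases
    return max(granted) if granted else WRITER_ARRIVAL

print(writer_acquires_at(fair=True))                 # 3: bounded by the one holder
print(writer_acquires_at(fair=False))                # 102: still queued at the horizon
print(writer_acquires_at(fair=False, horizon=1000))  # 1002: grows without bound
```

With the fairness check, the writer's wait is bounded by the readers that already held the lock when it arrived. Without it, the bound grows with the length of the arrival stream, which is the starvation.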
The cost is paid by the readers. They wait for a writer they do not conflict with, on behalf of starvation prevention. In most workloads the trade is correct. In the specific workload where a long-running read holds ACCESS SHARE for minutes and a routine DDL arrives, the cost becomes visible.
Watching it happen: a three-terminal reproduction
The scenario reproduces in under a minute on any Postgres instance. Open three terminals connected to the same database.
Terminal 1, hold a reader open. Do not commit.
BEGIN;
SELECT count(*) FROM users;
-- leave this transaction open
Terminal 2, attempt a DDL that needs ACCESS EXCLUSIVE.
ALTER TABLE users ADD COLUMN foo text;
-- this hangs, waiting for ACCESS EXCLUSIVE
Terminal 3, issue a trivial read that should, by compatibility alone, run alongside terminal 1.
SELECT count(*) FROM users;
-- this also hangs, behind terminal 2
In a fourth terminal, inspect the queue:
SELECT pid, mode, granted, now() - query_start AS waiting_for
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE relation = 'public.users'::regclass
ORDER BY granted DESC, waiting_for DESC;
The output will show terminal 1's backend as AccessShareLock granted=t, terminal 2 as AccessExclusiveLock granted=f, and terminal 3 as AccessShareLock granted=f. Three sessions, two of which request lock modes that are compatible with each other by the standard compatibility matrix, yet only one of them actually holds.
What this means for DDL
Almost every schema change wants ACCESS EXCLUSIVE. ALTER TABLE, DROP TABLE, TRUNCATE, REINDEX without CONCURRENTLY, VACUUM FULL, CLUSTER, and ADD CONSTRAINT without NOT VALID all take ACCESS EXCLUSIVE for their duration. A short DDL on a busy table has three distinct cost components, and the waitMask rule shapes the second one.
Naive expectation vs. actual Postgres behavior for a DDL arriving behind a long-running reader.
The three cost components of a DDL on a busy table: (1) the time the DDL waits for current holders to release, (2) the time during which every new reader and writer is blocked behind the DDL, and (3) the time the DDL itself takes to execute once it finally acquires the lock. Component one is bounded by the longest currently held lock. Component three is often sub-second for metadata-only DDL in Postgres 11 and later. Component two is the one that scales with traffic and produces the visible impact on the application. It is entirely the waitMask rule's contribution.
This is what happened in Railway's December 8, 2025 incident, documented in their public post-mortem and covered in more detail in an earlier post on this blog. A long-running reader held ACCESS SHARE, a routine ADD COLUMN migration requested ACCESS EXCLUSIVE, and twenty-three minutes of reader and replica traffic queued behind the waiting DDL. The DDL itself, once it finally ran, completed in milliseconds. The outage was component two, end to end.
Living with the rule
The rule is not going to change. It is in every supported Postgres version and every reasonable fork. What can change is how DDL interacts with it. Three complementary strategies.
Set lock_timeout on every DDL. A value of one to three seconds bounds the window during which the DDL participates in the waitMask. If the DDL cannot acquire within the timeout, it aborts, and the wait queue collapses back to whatever existed before it arrived. The DDL then retries, either manually or via a tool like pgroll that handles backoff. Setting this at the session level before the DDL is standard practice at Postgres-aware teams and should be enforced by lint in CI.
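In application code this usually takes the shape of a small retry wrapper. A hedged sketch: `execute` and `LockTimeout` are invented names standing in for whatever your driver provides (with psycopg2, the corresponding exception is errors.LockNotAvailable, SQLSTATE 55P03).

```python
import time

class LockTimeout(Exception):
    """Stand-in for the driver's lock-timeout error (e.g. SQLSTATE 55P03)."""

def run_ddl(execute, ddl, attempts=5, timeout="2s", backoff_s=1.0):
    """Run `ddl` with a bounded stay in the lock wait queue, retrying on timeout."""
    for attempt in range(attempts):
        try:
            execute(f"SET lock_timeout = '{timeout}'")
            execute(ddl)                # waits at most `timeout` in the queue
            return attempt + 1          # attempts used
        except LockTimeout:
            time.sleep(backoff_s * (attempt + 1))   # back off, then re-enter the queue
    raise RuntimeError(f"gave up after {attempts} attempts: {ddl}")
```

Each timed-out attempt removes the DDL's conflicting bit from waitMask, so queued readers drain during the backoff before the next attempt re-enters the queue.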
Split validating DDL into two steps. ADD CONSTRAINT ... CHECK (...) NOT VALID followed by a separate VALIDATE CONSTRAINT statement changes the lock profile of a CHECK addition from one long ACCESS EXCLUSIVE scan to one instantaneous ACCESS EXCLUSIVE metadata change plus a long SHARE UPDATE EXCLUSIVE validation scan. The second step does not acquire ACCESS EXCLUSIVE and therefore does not add anything to waitMask that conflicts with ACCESS SHARE. Reads continue through the validation. The same split exists for ADD FOREIGN KEY.
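The split, sketched in SQL (the constraint name and CHECK expression are illustrative):

```sql
-- Step 1: instantaneous ACCESS EXCLUSIVE, metadata only. No scan.
ALTER TABLE users
  ADD CONSTRAINT users_email_not_empty
  CHECK (email <> '') NOT VALID;

-- Step 2: the long scan, under SHARE UPDATE EXCLUSIVE. Reads proceed.
ALTER TABLE users
  VALIDATE CONSTRAINT users_email_not_empty;
```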
Bound transaction length. The waitMask rule bites when the initially held lock is held for a long time. A reader that holds ACCESS SHARE for four minutes creates a four-minute window during which any arriving DDL pulls every subsequent reader into the queue. Setting a short statement_timeout at the application level, and avoiding transactions that stay open across application operations, shrinks that window directly. Long-running analytical queries that must hold locks for minutes should run on a physical or logical replica, not on the primary.
None of the three prevents the waitMask from activating. They prevent it from mattering. lock_timeout shortens component two. The NOT VALID split moves the DDL out of the set of statements that need ACCESS EXCLUSIVE for their duration. A shorter transaction shrinks the window in which the DDL's arrival creates a conflict at all.
What CONCURRENTLY does and does not fix
CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY, and DROP INDEX CONCURRENTLY take SHARE UPDATE EXCLUSIVE rather than ACCESS EXCLUSIVE. That mode is compatible with ACCESS SHARE and most other reader modes. A CREATE INDEX CONCURRENTLY that arrives behind a long-running reader does not add a conflicting bit to waitMask for the reader's mode, and the reader can proceed. CONCURRENTLY is the canonical example of a DDL that sidesteps the rule on purpose.
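For example (the index and column names are illustrative):

```sql
-- Takes SHARE UPDATE EXCLUSIVE for the build: compatible with ACCESS SHARE,
-- so a long-running reader does not pull later reads into the wait queue.
CREATE INDEX CONCURRENTLY users_email_idx ON users (email);
```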
It does not cover ALTER TABLE. There is no ALTER TABLE ... CONCURRENTLY. The tools that implement online ALTER TABLE (pgroll, pg_osc, pg_karnak) do so by rewriting the statement into a series of smaller changes that individually take weaker locks, then switching the application over. They do not change the behavior of LockAcquireExtended. They change what statements the application sends to it.
Closing note
The lock fairness rule is fifteen lines of C in a file most Postgres users never open. It is also the single largest source of surprise in production migration incidents at 15 to 60 engineer SaaS teams, because it breaks the one piece of intuition developers carry from documentation: "ACCESS SHARE only conflicts with ACCESS EXCLUSIVE." That sentence is true of held locks and false of queued ones. The distinction is the article.
A pull-request check that reads pg_locks and pg_stat_activity at merge time is the only way to see the second-order cost of a DDL in the moment it would hit production. lock_timeout, NOT VALID, and shorter transactions each address the symptoms. Looking at the wait queue before the merge addresses the cause. If you want that kind of verdict at PR time instead of a post-mortem, that is what we are building at Datapace.
Frequently asked questions
Why does Postgres not let a new ACCESS SHARE join the current ACCESS SHARE holder when an ACCESS EXCLUSIVE is waiting between them?
Because letting compatible readers keep joining would starve the waiting writer indefinitely. As long as new readers keep arriving faster than existing readers commit, the held ACCESS SHARE is never released completely, and the waiting ACCESS EXCLUSIVE never gets a gap to acquire. The waitMask check collapses that starvation by making every new request conflict with any waiter of a conflicting mode.
Is this behavior specific to DDL, or does it happen with any exclusive lock?
Any exclusive lock. LOCK TABLE ... IN ACCESS EXCLUSIVE MODE, TRUNCATE, VACUUM FULL, CLUSTER, and anything else that asks for ACCESS EXCLUSIVE behaves the same way. DDL is the common case because it is the most common source of ACCESS EXCLUSIVE in application workloads.
Does SET lock_timeout fix this?
It bounds it. The waitMask still activates when the DDL begins waiting. What lock_timeout does is cause the DDL to abort after N seconds of waiting, which removes the conflicting bit from waitMask and lets queued readers proceed. The reads were still blocked for up to N seconds. The cost is real. The cost is bounded.
Is there a way to inspect waitMask directly?
Not as a column in pg_locks. The view exposes per-row granted/requested state rather than the aggregated mask. The mask can be reconstructed by selecting from pg_locks where granted = false on a given relation and taking the union of modes. In practice pg_locks joined to pg_stat_activity is sufficient to see what is happening without reconstructing the bitmask explicitly.
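A query along these lines approximates the mask for one relation, read from the view rather than from shared memory (reusing the users table from earlier):

```sql
-- Union of modes currently being waited on: a waitMask approximation.
SELECT array_agg(DISTINCT mode) AS waiting_modes
FROM pg_locks
WHERE relation = 'public.users'::regclass
  AND NOT granted;
```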
What if my DDL uses CONCURRENTLY?
CREATE INDEX CONCURRENTLY, REINDEX CONCURRENTLY, and DROP INDEX CONCURRENTLY take SHARE UPDATE EXCLUSIVE, which is compatible with ACCESS SHARE. A CONCURRENTLY statement waiting on the lock does not add a reader-conflicting bit to waitMask. Reads proceed. This is the main reason CONCURRENTLY is recommended for index changes on busy tables. There is no ALTER TABLE ... CONCURRENTLY, so for general schema changes the rule still applies.
Sources
- PostgreSQL source code, src/backend/storage/lmgr/lock.c (LockAcquireExtended, LockCheckConflicts, compatibility table).
- PostgreSQL source code, src/backend/storage/lmgr/proc.c (JoinWaitQueue, ProcSleep, WaitOnLock).
- PostgreSQL documentation, Chapter 13.3, Explicit Locking (current version).
- PostgreSQL documentation, pg_locks system view.
- Andrew Farries, "Schema changes and the Postgres lock queue", Xata, June 18, 2024.
- Railway, Incident report, December 8, 2025.
- Datapace blog, "One CI check would have caught both of Railway's billion-row Postgres migration outages".