PR-time simulation vs post-deploy lock_timeout: both, not either
lock_timeout is correct and necessary. It is not enough. A PR-time lock-graph simulator catches the same class of migration before the merge, without producing a failed-deploy signal.
Postgres post-mortems, patterns, and pre-merge checks, for backend teams without a dedicated DBA.
lock_timeout is correct and necessary. It is not enough. A PR-time lock-graph simulator catches the same class of migration before the merge, without producing a failed-deploy signal.
Between 2015 and 2025, the DBA role dissolved at the mid-size SaaS tier. Three of four responsibility buckets found new owners. The fourth, PR-time schema review, did not land cleanly anywhere.
Eraser (VLDB 2023) named it. RIB and Hybrid Cost Modeling extended it. The robust-fix problem is one question asked at different layers of the DB stack. Datapace asks it at the PR surface.
The two Postgres diagnostic extensions are complementary, not competitors. A technical reference for when each lies, how to join them via queryid, and the minimum production setup.
Yahoo S5, NAB, and UCR shaped the TSAD literature. Postgres metrics do not look like those benchmarks, and methods that score well on them produce unusable alerting pipelines in production.
The 2017 self-driving DBMS vision promised autonomous tuning, indexing, and healing. Nine years on, the research shipped components, production stayed advisory, the human stayed in the loop.
Four online DDL tools, four different architectures: versioned views, shadow tables, distributed 2PC, and binlog streaming. A neutral architectural comparison with the failure modes named.
D-Bot, λ-Tune, AgentTune, GaussMaster, and ROMAS all ship as dashboard copilots. For regressions caused by a merged commit, that is the wrong surface. The case for a repo-native LLM DBA.
Frontier LLMs hit 10 to 24 percent on realistic enterprise SQL benchmarks. Coding agents write 4 percent of public GitHub commits. The gap between the two is the new systemic risk.
DBSherlock, iSQUAD, D-Bot, and RCRank each infer a DB-internal root cause. None of them names the commit. The gap between the fourth depth of inference and the developer workflow.
A SELECT arriving mid-ALTER TABLE waits, even though SHARE-SHARE is compatible. Postgres prioritizes queue fairness over lock-mode compatibility. A close read of the source that enforces it.
An April 2026 post-mortem tested a Postgres CHECK constraint on 10,000 rows, then hit 84 million in production. Four hours locked. The failure mode is sampling bias.
Railway had two Postgres migration outages in six weeks, October 28 and December 8, 2025. Same cascade, different trigger. A close reading of both post-mortems and the single pre-merge check.
EXPLAIN ANALYZE plans one query at a time. N+1 storms are a hundred fast queries that add up to a slow page. Each looks fine on its own. The signal lives in pg_stat_statements, not the planner.
Postgres has three concentric levels of EXPLAIN: the plan, the actual execution, and the real I/O cost. Most developers learn the first and miss the other two. Here is all three, on real output.
pg_stat_statements is the only unbiased view of which queries Postgres actually spends time on. Two queries cover most use. Here is setup, the two queries, and how to read the output.
A sequential scan on a hot table is both a performance problem and a cost problem. The database does not distinguish. The fix for both is the same index, and catching it at PR time is straightforward.
Most Postgres performance problems come from a short list of avoidable SQL patterns. Five that show up most often: unindexed filters, SELECT *, deep OFFSET, N+1, and column casts in WHERE.
Supabase ships solid default monitoring. Past a point, three gaps appear: no historical context, reactive top-queries view, no correlation to user impact. Here is what fills each, and what does not.
Get AI-powered recommendations for your specific database setup.