Essay
April 19, 2026
11 min read

The DBA extinction: what disappeared and what did not replace it

Between 2015 and 2025, the DBA role dissolved at the mid-size SaaS tier. Three of four responsibility buckets found new owners. The fourth, PR-time schema review, did not land cleanly anywhere.

#PostgreSQL #Database reliability #Engineering orgs #DevOps #SRE

Ten years ago, a 30-engineer SaaS company had a DBA, or at minimum a designated senior backend engineer who owned database decisions. In 2026, most companies in that size range do not. The role did not walk out the door in one piece; it dissolved, its responsibilities absorbed by different parts of the engineering organization. Four buckets of responsibility, four different absorbing roles, one bucket that nothing quite caught. That uncaught bucket is the structural gap this essay is about.

TL;DR. Capacity and uptime moved to the managed-database provider. Backups moved to the same provider. Query performance split between the backend engineer who wrote the code and the SRE or platform team who owns instance-level health. PR-time judgment about a schema change, the responsibility that catches production cascades before they ship, did not land cleanly on any of these roles. It is the gap that remains.

The four buckets, and where each one went

[Diagram: two columns. Left, "DBA 2015," lists four responsibility buckets: capacity and uptime (failover, replication, I/O); backup and recovery (dumps, PITR, DR drills); schema design (normalization, indexes, DDL review); query performance (plan review, index recommendations, tuning). Arrows lead to a right column, "2025: absorbed by four different roles": the managed provider (RDS, Cloud SQL, Neon) catches capacity and uptime, and backup and recovery (automated snapshots, PITR); the backend engineer takes schema design, with PR-time judgment highlighted in red as the gap; backend plus SRE share query performance (observability, tuning). Footer: the red bucket is the one nothing caught.]

Three of the four buckets found new owners cleanly. The fourth did not.

Capacity and uptime: to the managed provider

AWS RDS, Google Cloud SQL, Neon, Supabase, Crunchy, and their competitors ship failover, replication topology management, and I/O capacity as a service. The DBA used to own the pager for these; the managed service now does. A 30-engineer SaaS in 2026 does not run its own replica failover script; it pays the provider to do it. The DBA work did not disappear; it was outsourced to a vendor that amortizes it across many customers, at a much lower per-team cost.

This is the cleanest of the absorptions. Capacity and uptime is a bounded problem with well-understood mechanisms, and the managed providers have had ten years to productize it. The remaining edge cases at the capacity layer (unusual failure patterns, regional failover drills) are rare enough that the in-house responsibility can sit with SRE or platform without dedicating a DBA.

Backup and recovery: to the managed provider

The same providers ship continuous backup, point-in-time recovery, cross-region replicas, and snapshot-based restore. The DBA used to own the DR drill and the nightly pg_dump; the provider now ships automated snapshots with retention policies the customer configures via console. The in-house responsibility shrinks to "pick a retention and a recovery-point objective," which is a quarterly meeting rather than a weekly operational commitment.

Query performance: to backend + SRE

Query performance tuning, the responsibility that traditionally required the most craft, split. The backend engineer who shipped the endpoint owns the performance of their specific queries. The SRE or platform team owns instance-level health: aggregate latency, connection-pool saturation, buffer-cache efficiency. The former reads pg_stat_statements for their own queries. The latter reads the same view for instance-wide patterns. Both lean heavily on managed observability (pganalyze, Datadog DBM, the provider's own dashboards) to do work the 2015 DBA would have done by hand.

This split is less clean than the first two buckets. Queries cross team boundaries (one team's index helps or hurts another team's query), connection-pool behavior is sensitive to choices made across the whole application, and the physical layer affects every query simultaneously. But the split is workable because the observability layer exposes enough signal for both sides to see what they need. The DORA reports through 2024 describe organizations of this shape as common at the mid-size SaaS tier.
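The split above can be sketched as two readers of the same signal. This is a toy illustration, not a real pg_stat_statements client: the row dicts and the team tag are fabricated (the view has no team column, so a real setup would map queries to owners via query comments or application_name).

```python
# Two consumers of the same pg_stat_statements-shaped data.
# Rows are fabricated dicts standing in for rows of the view.

def team_view(rows, team):
    """Backend engineer's view: their own queries, slowest first."""
    own = [r for r in rows if r["team"] == team]
    return sorted(own, key=lambda r: r["mean_exec_time_ms"], reverse=True)

def instance_view(rows, budget_ms):
    """SRE's view: total execution time instance-wide, against a budget."""
    total = sum(r["calls"] * r["mean_exec_time_ms"] for r in rows)
    return {"total_ms": total, "over_budget": total > budget_ms}

rows = [
    {"team": "billing", "calls": 1000, "mean_exec_time_ms": 4.0},
    {"team": "search",  "calls": 50,   "mean_exec_time_ms": 120.0},
]
```

The backend engineer's sort surfaces the slow endpoint; the SRE's sum surfaces the aggregate pressure. Neither reader needs the other's view, which is why the split is workable.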

Schema design: to the backend engineer, with a gap

The fourth bucket is schema design, and this is where the absorption broke down. The backend engineer who writes the feature writes the migration. The migration is a .sql file in their PR. In 2015, before merging, a DBA would look at it and tell them whether the ALTER TABLE would take ACCESS EXCLUSIVE for an acceptable duration, whether the new index should be CONCURRENT, whether the column default was going to trigger a table rewrite, whether the constraint addition should use NOT VALID first.

In 2026, nobody looks at the migration in that way. The backend engineer may lint it, the CI pipeline may check it syntactically, and a teammate may review the diff for correctness. None of those reviews looks at the production lock graph or the size of the target table. That information is not in the PR, and none of the people in the review loop have a habit of fetching it.
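To make the "check it syntactically" layer concrete, here is a toy lint in Python. The rules and messages are illustrative, not a real CI check, and they deliberately show the ceiling of this approach:

```python
import re

# Toy syntactic lint for a migration file: catches pattern-level hazards
# visible in the text alone, with no production access. Illustrative rules,
# not exhaustive (e.g. version-dependent hazards like pre-PG11 ADD COLUMN
# with a DEFAULT are omitted).
RULES = [
    (re.compile(r"\bCREATE\s+INDEX\b(?!.*\bCONCURRENTLY\b)", re.I),
     "CREATE INDEX without CONCURRENTLY blocks writes for the build"),
    (re.compile(r"\bADD\s+CONSTRAINT\b(?!.*\bNOT\s+VALID\b)", re.I),
     "ADD CONSTRAINT without NOT VALID validates while holding the lock"),
]

def lint(sql: str) -> list[str]:
    """Return the messages for every rule the migration text trips."""
    return [msg for pattern, msg in RULES if pattern.search(sql)]
```

Both rules fire on text alone. Neither can tell you that the target table holds a billion rows or that a long-running transaction is holding a conflicting lock; that is exactly the information missing from the review loop.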

Why the gap is structural

Three reasons the schema-review gap is not about to close on its own.

The information is not in the repo. A review of the migration requires knowledge of production state: how big is the table, what queries are currently holding locks, what the deploy window looks like, what concurrent writers will be affected. None of this is visible in the PR. A backend engineer reviewing their teammate's migration does not have this context and cannot reasonably be expected to acquire it for every change. The 2015 DBA acquired it by being permanently assigned to the database; the 2026 backend engineer is assigned to the feature, and the feature does not pay them to know production state.

The managed provider does not review migrations. The provider's job ends at the infrastructure layer. They will not tell you that your CREATE INDEX is missing CONCURRENTLY, and they will not block the deploy. That is outside the contract. Migration review is a workflow responsibility, not an infrastructure one.

The observability layer detects after the fact. pganalyze, Datadog DBM, and similar tools surface the cascade after it starts. Alerting on a deploy-boundary regression is the wrong loop for "this migration should not have shipped" because by the time the alert fires, the migration has shipped and the damage is done. The observability loop catches the symptom; the gap is about catching the cause before it arrives.

Each of the three makes the gap reproducible. A 30-engineer SaaS that wants to close it on its own has to build tooling that reads production state, parses migration files, and surfaces a verdict in the PR. That is real work, and it is not why the team raised a seed round. So the gap sits.
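A minimal sketch of what that tooling would do, assuming the standard Postgres catalogs. The catalog queries are the conventional ones; the risk function and its thresholds are invented for illustration, and the numbers would be fetched live rather than hard-coded.

```python
# Catalog queries a reviewer tool would run against production at PR time
# (parameterized; shown here as strings, never executed in this sketch).
TABLE_SIZE_SQL = "SELECT pg_total_relation_size(%s::regclass);"
LOCK_HOLDERS_SQL = """
SELECT a.pid, a.query, now() - a.xact_start AS held_for
FROM pg_locks l JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = %s::regclass AND l.granted;
"""

def risk(table_bytes: int, longest_xact_s: float, rewrites_table: bool) -> str:
    """Turn production state into a PR verdict. Thresholds are illustrative."""
    if rewrites_table and table_bytes > 10 * 2**30:
        return "block"   # full rewrite of a >10 GiB table during traffic
    if longest_xact_s > 30:
        return "warn"    # ACCESS EXCLUSIVE will queue behind this transaction
    return "pass"
```

The verdict then lands as a PR comment. The point of the sketch is the input, not the logic: every branch depends on state that exists only in production.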

What this has cost, observably

The Railway post-mortems from October 28 and December 8, 2025, covered in detail in an earlier post on this blog, are two cases where the gap bit a team publicly. The April 10, 2026 post-mortem behind the 8,400x staging-gap article is another. These are the incidents that got written up. The majority of incidents of the same shape are not written up, because they were short enough to fall under the SLA threshold, because nobody outside engineering noticed, or because the team decided the incident was not worth a public post-mortem.

The base rate for a production-unsafe migration at a SaaS running continuous migrations is not zero, and the DORA 2024 report's team archetypes that cluster on high deploy frequency also tend to cluster on high incident rates when the operational support layer has not kept up. Self-reported data, but directional. The point is that the cost of the gap is measurable even without a clean DBA-extinction-caused-outages study, because the class of incidents that the gap enables is itself visible.

What the DBA reviewed in 2015

Signal: production lock graph, table sizes
Time: before merge, in the review
Output: block or pass, with rewrite suggestion
Cost: one salary, one team

What nobody reviews in 2026

Signal: not read at PR time
Time: after merge, during the incident
Output: post-mortem
Cost: per-incident, paid in incidents

The market that follows from the gap

The gap is what Datapace was founded on. The honest framing is not "AI for databases." It is "the DBA job that used to exist, refactored into a PR-time check that reads production state and returns a verdict." A backend engineer does not want a DBA-shaped colleague; they want the specific judgment the DBA would have provided on this specific migration, delivered where they already work, without a context switch.

Three design consequences follow from framing the product this way.

The output is a PR comment, not a dashboard. The surface a DBA would have used (a chat message, a comment on a pull request, a verbal sign-off in a standup) is the surface the tool has to use. The dashboard was a DBA's surface because the DBA spent their day in a dashboard. The backend engineer does not. Covered in more detail in the earlier post on repo agents versus dashboard copilots.

The signal is production state, not workload training. A DBA reviewing a migration in 2015 was not running a benchmark. They were reading pg_stat_activity, estimating a lock window, and knowing the table sizes from memory. The tool has to do the same: read production state live at PR time, not train a model on historical workload.

The failure mode is probabilistic. A DBA sometimes waved through a migration that then caused an incident. The tool will too. The acceptable bar is "better than no review at all," not "perfect." This matches the posture of every robust-fix research system covered in the earlier post: the gate rejects regressions with a confidence, not a certainty.

Closing note

The DBA role dissolved because the economics of engineering at a 30-engineer SaaS do not support it, and because the managed-database providers and the observability tooling absorbed most of what the DBA used to do. The one responsibility that did not absorb cleanly is the one that looks at a migration before it merges and says "this one will take down the API." That responsibility is a workflow decision, not an infrastructure decision, and it lives in the PR. In 2015 a person did it. In 2026 a repo-native agent can do it, and that is what we are building at Datapace. The rest of the DBA's old work found its home; this piece needed a different solution than outsourcing to a vendor or folding into an SRE team.

Frequently asked questions

Is the DBA role really gone at a 30-engineer SaaS?

Not universally. Companies with heavy analytics workloads, strict compliance requirements, or historical investment in Oracle/SQL Server often retain a DBA. The shift is specific to the mid-size Postgres-or-MySQL-on-managed-cloud SaaS profile. At that profile, the DBA is rare. DORA reports and job-posting ecosystems both reflect the pattern even without a single definitive number.

What about the platform-engineering trend the 2024 DORA report covers?

Platform engineering has absorbed a fraction of the DBA's capacity-and-uptime work and some of the observability-tooling operation. It has not absorbed migration review, because platform engineers do not sit in every backend team's PR review surface. The platform team provides the database; the product team provides the migration. The review-of-the-migration role does not fit either team's mandate.

Why not just hire a DBA?

Some companies do. A 30-engineer team can employ a DBA if it is willing to pay for capacity that is, on the median day, underused. The trend of not doing so is driven by cost: most days the DBA is not needed, and the cost of the absence shows up only on the days it produces an incident. The decision math tilts toward not hiring until an incident is painful enough, and by then the incident has already happened.

Is this argument specific to Postgres?

The shape generalizes. MySQL, SQL Server, and MongoDB teams face the same dissolution with slightly different absorbing roles. Postgres is the focus here because the managed-Postgres ecosystem (RDS, Aurora, Neon, Supabase, Crunchy) has been the fastest to productize capacity and backups, which pulled the schema-review gap into relief sooner.

Does a repo-native agent replace the DBA fully?

No, and it should not claim to. A DBA did capacity forecasting, long-range schema evolution, and organizational-risk conversations that a PR-time agent does not touch. The agent targets the specific decision where the DBA's absence most measurably hurts: the pre-merge judgment on a schema change. Everything else the DBA used to do either found a home or remains a gap a tool does not close.

Sources

  1. B. Beyer, C. Jones, J. Petoff, N. R. Murphy, Site Reliability Engineering, Google, 2016.
  2. D. N. Blank-Edelman, Seeking SRE, O'Reilly, 2018.
  3. G. Kim, J. Humble, P. Debois, J. Willis, The DevOps Handbook (2nd edition), IT Revolution, 2021.
  4. DORA, Accelerate State of DevOps Report 2024.
  5. DORA, 2025 State of AI-Assisted Software Development Report.
  6. Datapace blog, "One CI check would have caught both of Railway's billion-row Postgres migration outages".
  7. Datapace blog, "The 8,400x staging gap: why staging lies about migration safety".
  8. Datapace blog, "Repo agent vs dashboard copilot: the LLM DBA belongs in the PR".
