Comparison
April 19, 2026
13 min read

pgroll, pg-osc, pg_karnak, gh-ost: an online DDL tool face-off

Four online DDL tools, four different architectures: versioned views, shadow tables, distributed 2PC, and binlog streaming. A neutral architectural comparison with the failure modes named.

#PostgreSQL #Schema migrations #Online DDL #pgroll #pg_osc

Four tools, four architectures, one problem. pgroll serves DDL through versioned views. pg-osc builds a shadow table and replays writes via an audit trigger. pg_karnak extends Postgres with a processUtility_hook and runs distributed DDL through a 2PC coordinator. gh-ost consumes MySQL's binary log stream into a ghost table with no triggers at all. They all claim to do online DDL. They operate at radically different layers, and the right choice depends on which layer of your stack has the problem.

TL;DR. pgroll fits single-database teams that want semantic versioning of the schema. pg-osc fits teams doing one big table rewrite on a well-understood primary. pg_karnak fits multi-tenant control planes where the same schema has to land atomically across many databases. gh-ost is MySQL-only and is listed as the architectural contrast, not a Postgres option. None of the four prevents a human from writing a migration that takes down production. That is a different problem.

Four tools, at a glance

pgroll (Xata, Postgres): versioned views over physical tables; expand/contract pattern; sync triggers between old and new columns; brief ACCESS EXCLUSIVE per phase; single DB, two schema versions visible.

pg-osc (Shopify, Postgres): shadow table with audit triggers; copy/cut-over pattern; writes replayed from the audit table; ACCESS EXCLUSIVE twice (setup and swap); single DB, one table at a time.

pg_karnak (Nile, Postgres): processUtility hook plus a 2PC coordinator; distributed DDL pattern; native Postgres 2PC semantics; consistent lock order with a short lock_timeout; multi-tenant, many DBs at once.

gh-ost (GitHub, MySQL only): binlog stream into a ghost table; triggerless copy pattern; writes replayed from the binary log; postponable cutover; MySQL/MariaDB only, not Postgres.

The architectures are not variations of one idea. They are four distinct answers to "what does online DDL even mean."

pgroll (Xata)

pgroll's core claim, lifted directly from the project README, is that it "works by creating virtual schemas by using views on top of the physical tables." Existing clients read and write through views pinned to the version of the schema they were written against. A migration creates a new view layer that exposes the new shape of the schema without rewriting the physical table. Breaking changes are handled through an expand-and-contract flow: during the active migration, pgroll creates new columns alongside old ones and wires triggers between them so writes stay synchronized across both schema versions. When every client has migrated to the new version, the contract phase drops the old view, the old column, and the triggers.
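The expand phase can be modeled in a few lines: one physical table carries both the old and the new column, a trigger-like sync function keeps them consistent, and two "views" each expose one schema version. This is an illustrative Python sketch of the pattern under invented table and column names, not pgroll's implementation; pgroll does this with real Postgres views and triggers.

```python
# Toy model of pgroll's expand phase: a text column is being migrated to
# an integer column, and both schema versions stay live during the change.
physical = []  # rows: {"id": ..., "rating": str | None, "rating_int": int | None}

def sync(row):
    # The "trigger": keep old and new columns consistent on every write.
    if row["rating"] is not None and row["rating_int"] is None:
        row["rating_int"] = int(row["rating"])
    elif row["rating_int"] is not None and row["rating"] is None:
        row["rating"] = str(row["rating_int"])
    return row

def insert_v1(id, rating):      # old clients write the text column
    physical.append(sync({"id": id, "rating": rating, "rating_int": None}))

def insert_v2(id, rating_int):  # migrated clients write the int column
    physical.append(sync({"id": id, "rating": None, "rating_int": rating_int}))

def view_v1():  # old schema version: text rating
    return [{"id": r["id"], "rating": r["rating"]} for r in physical]

def view_v2():  # new schema version: integer rating
    return [{"id": r["id"], "rating": r["rating_int"]} for r in physical]

insert_v1(1, "4")
insert_v2(2, 5)
print(view_v1())  # both rows visible through the old schema
print(view_v2())  # both rows visible through the new schema
```

Both application versions see a complete, consistent table; the contract phase would correspond to dropping the old key and the sync function.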

The trade-off is that the schema lives in two visible versions simultaneously. Two application versions can run concurrently against the same database, each reading through its own view. The complexity is worth it when the team's deploy model already supports two active app versions (staged rollouts, canaries, or gradual migrations with feature flags). It is overhead when the app always redeploys in one shot.

pgroll's locks are brief. The physical table is not rewritten for additive changes, and renames are handled in the view layer. For column type changes pgroll still has to copy data, and the copy runs in the background with triggers replicating writes. Index builds use CREATE INDEX CONCURRENTLY where possible. What pgroll does not do is protect against a user-authored migration that, under expand-and-contract, does something the lock graph cannot absorb. The tool is a correct-by-construction executor. It is not a planner.

pg-osc (Shopify)

pg-osc takes the percona-toolkit approach, translated to Postgres. Its README is explicit: it "creates a shadow table that looks structurally the same as the primary table, performs the schema change on the shadow table, copies contents from the primary table to the shadow table and swaps the table names in the end while preserving all changes to the primary table using triggers (via audit table)."

Six mechanical steps. Create the audit table. Install triggers on the primary to capture inserts, updates, and deletes. Create the shadow table with the new schema. Copy data from primary to shadow, rebuilding indexes. Replay accumulated audit-table changes onto the shadow. Swap the table names and update foreign-key references. The tool acquires ACCESS EXCLUSIVE on the primary twice: once at trigger-and-shadow setup and once at cutover. The README notes the locks are held briefly and the tool optionally kills competing backends to force acquisition under contention.
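The copy/replay/swap dataflow can be simulated in memory. This is a hedged sketch of the pattern, not pg-osc's code: real pg-osc uses an audit table, Postgres triggers, and two brief ACCESS EXCLUSIVE locks, and the shadow table carries the altered schema rather than just the values.

```python
# Toy model of pg-osc's copy/replay/swap loop.
primary = {1: "alice", 2: "bob"}   # id -> value
audit = []                          # (op, id, value) captured during the copy

def write(id, value):
    # Application write: hits the primary, and the "trigger" records it.
    primary[id] = value
    audit.append(("upsert", id, value))

# Bulk-copy a snapshot of the primary into the shadow table.
shadow = dict(primary)

# Concurrent traffic arrives mid-copy; the trigger captures it.
write(2, "bob-v2")
write(3, "carol")

# Replay the audit log onto the shadow before the swap, so it converges
# with the primary despite the writes that happened during the copy.
for op, id, value in audit:
    if op == "upsert":
        shadow[id] = value

# Swap names: the shadow becomes the primary.
primary, shadow = shadow, primary
```

The point the sketch makes is the convergence step: the swap is only safe because the audit replay closes the gap opened by concurrent writes.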

pg-osc is the right tool for a team doing one big table rewrite on a primary they control and understand. It is not the right tool for a schema-change stream. The cost model is per-table and does not compose cleanly across chained migrations. And "optionally kills competing backends" is a real operational statement, not a footnote: when the tool cannot get its ACCESS EXCLUSIVE quickly, it terminates whoever is holding the conflicting lock. That is fine in planned maintenance windows. It is a different risk profile during normal operations.

pg_karnak (Nile)

pg_karnak is the newest entry and solves a different problem than pgroll or pg-osc. At Nile, each customer tenant lives in its own Postgres database, which means a schema change against "the app" means the same DDL against many databases. pg_karnak is built to make that atomic.

Two pieces, from the Nile engineering post: an extension that hooks into Postgres via the processUtility_hook, which fires when DDL executes, and a standalone transaction coordinator that runs a two-phase commit across the tenant databases. When a user issues CREATE TABLE, the extension intercepts, the coordinator initializes a distributed transaction, each tenant acquires its locks, each prepares the DDL, and the coordinator commits or rolls back all tenants together.

The lock-graph problem that shows up in single-database systems shows up differently here. A 2PC across many databases multiplies the ways a single long-running query on one tenant can hold up the whole transaction. pg_karnak handles this by acquiring locks in a consistent order across all tenants (always starting with the same tenant) and by setting a short lock_timeout on each distributed DDL. If any tenant times out, the whole transaction aborts cleanly rather than holding the queue open.
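The coordination pattern (consistent lock order, short timeout, all-or-nothing commit) can be sketched as a toy model. The tenant names and DDL strings here are invented, and real pg_karnak uses native Postgres PREPARE TRANSACTION / COMMIT PREPARED rather than in-process objects; this only illustrates the control flow.

```python
# Toy model of 2PC-for-DDL across tenant databases.
class Tenant:
    def __init__(self, name, busy=False):
        self.name, self.busy, self.schema = name, busy, []

    def try_lock(self):
        # Stands in for taking the DDL lock under a short lock_timeout;
        # a "busy" tenant simulates a long-running query holding the lock.
        return not self.busy

    def prepare(self, ddl):
        self.pending = ddl          # phase one: prepare but do not apply

    def commit(self):
        self.schema.append(self.pending)

def distributed_ddl(tenants, ddl):
    ordered = sorted(tenants, key=lambda t: t.name)  # consistent lock order
    for t in ordered:
        if not t.try_lock():
            return "aborted"        # one timeout aborts the whole transaction
    for t in ordered:
        t.prepare(ddl)
    for t in ordered:
        t.commit()                  # phase two: commit everywhere
    return "committed"

tenants = [Tenant("t1"), Tenant("t2"), Tenant("t3")]
assert distributed_ddl(tenants, "ALTER TABLE users ADD COLUMN plan text") == "committed"

tenants[1].busy = True              # a long query on one tenant...
assert distributed_ddl(tenants, "ALTER TABLE users DROP COLUMN plan") == "aborted"
```

After the abort, every tenant still has exactly the schema the first DDL produced; no tenant drifted, which is the property the whole design exists to guarantee.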

The tool's niche is specific: multi-tenant control planes where the schema has to stay uniform across tenant instances. For a single-database SaaS, pg_karnak is overkill. For any platform that gives each customer its own database and wants schema changes to deploy atomically, it is one of the few systems that does this correctly.

gh-ost (GitHub)

gh-ost is listed here because it is the architectural contrast, not because it is a Postgres option. It is MySQL-only. The project's design philosophy is explicitly anti-trigger: it "uses the binary log stream to capture table changes, and asynchronously applies them onto the ghost table." No triggers on the source. Writes are reconstructed from MySQL's replication stream and applied to the shadow asynchronously.
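The triggerless design can be sketched the same way: the source table carries no triggers, and the migration tool tails a replication log while the row copy proceeds. This is an illustrative Python model with a plain list standing in for the binlog, not gh-ost's actual code.

```python
# Toy model of gh-ost's triggerless copy.
source = {1: "a", 2: "b"}
binlog = []                         # every engine write lands here anyway

def engine_write(id, value):
    # The database engine records the change; no trigger on the table.
    source[id] = value
    binlog.append((id, value))

ghost = {}

def copy_chunk(ids):
    # Row copier: move a chunk of source rows into the ghost table.
    for id in ids:
        if id in source:
            ghost[id] = source[id]

def apply_binlog():
    # Asynchronous applier: replay logged writes onto the ghost table.
    while binlog:
        id, value = binlog.pop(0)
        ghost[id] = value

copy_chunk([1])                     # copy is mid-flight...
engine_write(2, "b2")               # ...when writes arrive
engine_write(3, "c")
copy_chunk([2, 3])
apply_binlog()                      # ghost converges with the source
```

The dataflow ends in the same place as pg-osc's, but the capture mechanism is the engine's own log rather than triggers the tool installed, which is exactly the design difference gh-ost exists to make.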

The architectural insight is transferable to Postgres: in principle, a similar system could consume Postgres's logical-decoding stream into a shadow table. In practice, none of the production-grade Postgres tools have taken this path. Logical decoding is more complex than MySQL's binlog, the replication slots require careful lifecycle management, and the ecosystem has settled on trigger-based copy (pg-osc) or view-based versioning (pgroll).

The cutover step is where gh-ost invests the most. The cutover can be postponed until a human operator gives the go-ahead, which is a different design philosophy from pg-osc's "kill competing backends to acquire the lock" fallback. gh-ost assumes a human in the loop at the moment of cutover. pg-osc assumes automation.

Where each tool breaks

Every tool has a failure mode that the others do not share. A team choosing one should know the failure mode and plan for it.

pgroll

Failure mode: app must run on two schema versions
Constraint: Postgres 14+
Query model: all access goes through views
Performance cost: view indirection on every query

pg-osc

Failure mode: kills competing backends on contention
Constraint: one table at a time
Query model: primary keeps serving through copy
Performance cost: audit trigger on every write during copy

pg_karnak

Failure mode: 2PC abort cascades across all tenants
Constraint: multi-tenant topology required
Query model: tenant-local reads, coordinator-gated DDL
Performance cost: N-way round trip per DDL

gh-ost (for reference)

Failure mode: replication-lag spikes during copy
Constraint: MySQL only, binlog access
Query model: primary serves throughout
Performance cost: human-gated cutover

Which one should you use

The right answer depends on the shape of the problem, not on a feature checklist. Four rules of thumb.

Single Postgres database, steady migration stream, ok with two schema versions at once. Use pgroll. The view-based versioning matches the reality of rolling deploys and gives you safe rollback for free. The cost is the view indirection, which is measurable but rarely the bottleneck in OLTP workloads.

Single Postgres database, occasional one-shot table rewrites, planned maintenance window. Use pg-osc. The tool is designed for exactly this: a big change on one primary, one table at a time, with acceptable downtime tolerance in the cutover. If you cannot tolerate the "kill competing backends" fallback during cutover, schedule it outside peak traffic.

Multi-tenant control plane, one schema per tenant, must stay uniform. Use pg_karnak. Nothing else in the Postgres ecosystem handles this specific case cleanly. The alternative is reimplementing 2PC-for-DDL yourself, which is a multi-quarter project with a long tail of correctness bugs.

MySQL, not Postgres. Use gh-ost. Not a Postgres decision.

What none of them cover

All four tools handle the execution of an online DDL. None of them handles the decision of whether the DDL should merge in the first place. That is a different problem, upstream of any of the four, and it is where the Railway post-mortems and the April 10, 2026 staging-gap incident live. A tool that rewrites ALTER TABLE into a safe execution plan does not help when the developer merged a migration that the execution plan, once rewritten, still takes four hours to run. The decision to merge is a judgment call about production state, and it happens in the PR before any of these tools are invoked.

Concretely: pgroll is a correct-by-construction executor. It cannot prevent a user from writing a migration that requires copying a billion rows, which pgroll will dutifully run as an online copy over hours while adding load to a live system. pg-osc faces the same question. pg_karnak is the best positioned among the four to warn early, because it has to coordinate across tenants and is already thinking in terms of timeouts and lock budgets, but its job ends when the coordinator commits. The PR-time question "should this migration run at all, given what production looks like right now" is not what any of them is designed to answer.
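For illustration only, the shape of such a pre-merge check might look like the sketch below. The function names, the copy-throughput figure, and the time budget are all invented; a real check would measure throughput from production and read live row counts.

```python
# Hypothetical sketch of a PR-time migration cost gate.
def estimated_copy_seconds(row_count, rows_per_second=50_000):
    # rows_per_second would come from measured copy throughput, not a constant.
    return row_count / rows_per_second

def pr_verdict(row_count, budget_seconds=900):
    # Block the merge when the projected rewrite exceeds the time budget.
    eta = estimated_copy_seconds(row_count)
    return "block" if eta > budget_seconds else "allow"

print(pr_verdict(2_000_000))        # a small table clears the budget
print(pr_verdict(1_000_000_000))    # a billion-row rewrite does not
```

The arithmetic is trivial; the hard part, and the part none of the four runners attempts, is having production row counts and measured throughput available at PR time.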

This is not a criticism of the tools. They solve the execution problem. The decision problem is a separate layer, and it belongs in the PR, not in the migration runner.

Closing note

Picking the right online DDL tool is a scoping question, not a ranking question. pgroll is not better than pg-osc; they solve different shapes of the same problem. pg_karnak is not competing with either of them; it targets a topology they do not serve. The honest comparison is architectural, not benchmark-based, because the architectures produce different operational profiles that a benchmark cannot flatten. What is missing from all four is a pre-merge check that sits in front of whichever runner the team uses, reads production state, and blocks the merge when the cost is unbounded. That is what we are building at Datapace: a PR-time verdict that runs before pgroll, pg-osc, or pg_karnak gets invoked.

Frequently asked questions

Is pgroll production-ready in 2026?

Yes, for teams on Postgres 14+ whose deploy model can support two active app versions. Xata has been shipping pgroll in production for over a year and Postgres-heavy teams outside Xata use it for their own schemas. The main adoption friction is the view-based query model, which occasionally surprises ORM users who expected direct table access.

Can pg-osc and pgroll be used in the same codebase?

Yes. They operate at different layers. pg-osc is a one-shot CLI tool invoked for a single migration. pgroll is a persistent migration framework that manages the schema lifecycle. A team could run pg-osc for a single large-table rewrite and pgroll for ongoing additive changes. The view-based query model pgroll installs survives pg-osc's table swap if the swap is coordinated.

What is the relationship between pg-osc and pg_repack?

pg_repack is an older Postgres extension from the same shadow-table-plus-trigger family. It reorganizes tables without a full ACCESS EXCLUSIVE lock and handles online VACUUM FULL-equivalent rebuilds. pg-osc targets DDL specifically and has a more active maintenance footprint for modern Postgres versions. They overlap; the choice depends on the exact operation.

Does any of these work on managed Postgres (RDS, Cloud SQL, Neon)?

pgroll works on any Postgres 14+ including RDS and Aurora, explicitly called out in its README. pg-osc needs superuser or equivalent to install triggers, which is available on RDS and Cloud SQL. pg_karnak is an extension, so it requires a managed service that allows the extension to be installed; Nile's own managed service is the canonical environment. gh-ost is MySQL-only.

Is there a Postgres port of gh-ost?

Not a maintained one. The closest spiritual cousin is a system consuming Postgres's logical decoding stream into a shadow table, but no production-grade Postgres implementation has shipped. The Postgres community has mostly settled on trigger-based copy (pg-osc) or view-based versioning (pgroll).

Sources

  1. Xata, pgroll GitHub repository and documentation.
  2. Shopify, pg-osc GitHub repository.
  3. Nile Engineering, "Introducing pg_karnak: Transactional schema migration across tenant databases", 2025.
  4. GitHub, gh-ost GitHub repository.
  5. Andrew Farries, "Schema changes and the Postgres lock queue", Xata, June 18, 2024.
  6. PostgreSQL documentation, ALTER TABLE (current version).
  7. Datapace blog, "ACCESS SHARE does not jump the queue: Postgres lock fairness".
  8. Datapace blog, "The 8,400x staging gap: why staging lies about migration safety".
