What Supabase's built-in monitoring does not tell you
Supabase ships solid default monitoring. Past a point, three gaps appear: no historical context, reactive top-queries view, no correlation to user impact. Here is what fills each, and what does not.
Supabase's default database monitoring is solid, and it is usually enough for the first year of a project. Past a certain scale, three gaps become visible: there is no way to look at yesterday, the "slow queries" view is reactive, and the dashboard does not tell you which slow query hurt the user. This post describes the three gaps, gives an honest account of what fills each one, and explains why a single tool cannot cover all three.
What Supabase gives you by default
The built-in dashboard covers the basics. Live CPU, memory, disk, and connection counts. A query-performance view powered by pg_stat_statements. An advisor that runs periodic checks against the schema. Logs retained for the current plan's window. This is the right starting set for a project that has not hit scale.
The friction appears when the team grows and the database becomes load-bearing. At that point there are three specific things the default dashboard cannot do well.
Gap 1: no historical context
The Supabase query-performance view shows current pg_stat_statements rows. It does not retain snapshots. If a query slowed down last Tuesday after a deploy, there is no way to see that from the dashboard today, because yesterday's counter values are gone. pg_stat_statements itself is a running tally; without periodic external snapshots, the past is invisible.
The operational consequence: regressions are only visible during the window in which they are actively happening. A query that degraded a week ago and is now stable at the worse performance level reads as "fine" on the dashboard. The new baseline is the only baseline the tool can see.
What fills this gap
Periodic snapshots of pg_stat_statements stored somewhere with a longer retention than the live extension. Three practical options, from least to most involved:
- A cron job that inserts pg_stat_statements into a timestamped table inside the same database. Queryable with SQL, free, no new infrastructure.
- An agent that reads the extension on an interval and ships the rows to a metrics store or warehouse (Prometheus, Grafana Mimir, ClickHouse, BigQuery). Keeps the production database clean.
- An APM vendor with a Postgres integration. Buy-not-build, costs more, ties you to the vendor's query language.
The Datapace agent takes the second option: it reads pg_stat_statements into a local SQLite file with configurable retention, and can optionally ship the rows to Prometheus or a SaaS shipper. To be honest about the current retention window: it is the local SQLite default, bounded by disk, not "months" as a shipped capability.
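As a rough illustration of the snapshot idea, here is a minimal Python sketch that stores pg_stat_statements rows in a local SQLite table and prunes anything older than a retention window. It is not the Datapace implementation: the table name `stat_snapshots`, the function names, and the seven-day default are assumptions made for the example. The column names in `SNAPSHOT_QUERY` assume PostgreSQL 13 or later.

```python
import sqlite3
import time

# Query to run against Postgres on each interval (PostgreSQL 13+ column names).
SNAPSHOT_QUERY = "SELECT queryid, calls, total_exec_time, rows FROM pg_stat_statements"


def open_store(path=":memory:"):
    """Open the local snapshot store and create its table if needed."""
    store = sqlite3.connect(path)
    store.execute(
        """CREATE TABLE IF NOT EXISTS stat_snapshots (
               captured_at     REAL    NOT NULL,  -- Unix timestamp of the snapshot
               queryid         TEXT    NOT NULL,
               calls           INTEGER NOT NULL,  -- cumulative since last reset
               total_exec_time REAL    NOT NULL,  -- cumulative milliseconds
               rows            INTEGER NOT NULL
           )"""
    )
    return store


def record_snapshot(store, stat_rows, captured_at=None, retention_seconds=7 * 86400):
    """Insert one snapshot, then delete rows older than the retention window."""
    now = time.time() if captured_at is None else captured_at
    store.executemany(
        "INSERT INTO stat_snapshots VALUES (?, ?, ?, ?, ?)",
        [(now, str(qid), calls, total_ms, nrows)
         for qid, calls, total_ms, nrows in stat_rows],
    )
    store.execute(
        "DELETE FROM stat_snapshots WHERE captured_at < ?",
        (now - retention_seconds,),
    )
    store.commit()
```

In a real job, `stat_rows` would be the result of running `SNAPSHOT_QUERY` through a Postgres driver on a schedule. Storing the raw cumulative counters, rather than precomputed deltas, keeps the writer simple and leaves the choice of comparison window to whoever queries the table later.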
Gap 2: the top-queries view is reactive
The dashboard shows you which queries are slow right now. It does not show you that the query was fine before this morning's deploy. Reading the current state of pg_stat_statements and acting on it is reactive by construction: you find out after users are already experiencing the slower query.
The shift that actually reduces on-call pages is moving detection from dashboard-time to deploy-time or, better, PR-time. A check that runs before a migration lands can flag common Postgres performance footguns from the DDL alone: missing index on a filter column, CREATE INDEX without CONCURRENTLY, ACCESS EXCLUSIVE DDL on a table with long-running readers. A check that compares pg_stat_statements call counts across a deploy boundary can flag call-count regressions that an N+1 refactor introduced, within minutes of the deploy, not hours.
What fills this gap
A CI check that reads the proposed migration and the live schema, compares, and blocks the merge on specific patterns. This is the category Datapace fits into. The specific things it can check at PR time today: missing-index-on-filter (deterministic), non-CONCURRENTLY index builds on large tables (deterministic), lock-conflict risk between a migration and in-flight readers (probabilistic, based on observed lock state). Call-count regression detection runs after the deploy, not at PR time, and is probabilistic because attribution to a specific commit depends on timing.
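To make the deterministic end of this concrete, the sketch below flags index builds that omit CONCURRENTLY in a migration file. It is a simplified illustration, not Datapace's check: the function name is hypothetical, the statement splitting is naive (it would mis-split semicolons inside string literals or function bodies), and it ignores table size, which a real check would read from the live schema.

```python
import re

# CREATE [UNIQUE] INDEX that is not immediately followed by CONCURRENTLY.
NON_CONCURRENT_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)",
    re.IGNORECASE,
)


def find_blocking_index_builds(migration_sql):
    """Return the statements that build an index without CONCURRENTLY."""
    # Naive split: good enough for plain DDL, not for dollar-quoted bodies.
    statements = [s.strip() for s in migration_sql.split(";")]
    return [s for s in statements if s and NON_CONCURRENT_INDEX.search(s)]
```

A CI step could fail the build whenever this returns a non-empty list and print the offending statements in the job output.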
Important honest limit: a pre-merge check cannot detect every performance regression. N+1 cascades from application loops are invisible to it, because the loop is in the app, not in the SQL. For those, the post-deploy pg_stat_statements diff is the only catch.
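A post-deploy diff of this kind can be sketched in a few lines. The example assumes call rates (calls per minute) have already been derived per queryid for a window on each side of the deploy. The function name and both thresholds are illustrative assumptions, and the result is only a candidate list, since traffic changes unrelated to the deploy produce the same signal.

```python
def call_count_regressions(before, after, min_ratio=3.0, min_calls_per_min=10.0):
    """Flag queries whose call rate grew by at least min_ratio across a deploy.

    before / after map queryid -> calls per minute, measured over windows of
    comparable traffic on each side of the deploy.
    Returns (queryid, ratio) pairs; ratio is None for queries with no baseline.
    """
    flagged = []
    for queryid, rate_after in after.items():
        if rate_after < min_calls_per_min:
            continue  # too quiet after the deploy to be worth a flag
        rate_before = before.get(queryid, 0.0)
        if rate_before == 0.0:
            flagged.append((queryid, None))  # new query shape, nothing to compare
        elif rate_after / rate_before >= min_ratio:
            flagged.append((queryid, rate_after / rate_before))
    return flagged
```

An N+1 introduced by a refactor typically shows up here as one existing or new query shape whose call rate jumps by a large multiple while the request rate stays flat.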
Gap 3: no correlation between database and user experience
The database dashboard tells you which queries are slow. It does not tell you which user-facing endpoint is degraded because of those slow queries, or how much money a slow query is costing per hour, or whether the slow query is on a path that matters.
A query that doubles in latency from 10 ms to 20 ms is uninteresting if it runs once a day on an internal admin page. The same change on the login path is a production incident. The database has no way to know which is which, because the database has no view of the call graph. The call graph lives in the application, specifically in its distributed tracing.
What fills this gap
Distributed tracing with database spans. The application emits spans for user-facing requests, each span records the database queries it issued, and a trace aggregator (Datadog, Honeycomb, Tempo, Jaeger) joins them into an end-to-end view. This is the only mechanism that can answer "this query is slow, is the login path slow?" Any tool that tries to answer that question without looking at trace data is guessing.
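The join a trace aggregator performs can be illustrated with plain data. The sketch below assumes spans have been exported as flat dictionaries with hypothetical keys (`trace_id`, `kind`, `name`, `duration_ms`); real tracing systems use richer span models, so treat this as a picture of the idea rather than any vendor's API.

```python
from collections import defaultdict


def db_time_by_endpoint(spans):
    """Sum database time per (endpoint, query) from flattened trace spans.

    Each span is a dict with hypothetical keys: trace_id, kind ("request" or
    "db"), name (endpoint or query text) and duration_ms.
    """
    # The request span of each trace tells us which endpoint the trace served.
    endpoint_of_trace = {
        s["trace_id"]: s["name"] for s in spans if s["kind"] == "request"
    }
    totals = defaultdict(float)
    for s in spans:
        if s["kind"] == "db" and s["trace_id"] in endpoint_of_trace:
            key = (endpoint_of_trace[s["trace_id"]], s["name"])
            totals[key] += s["duration_ms"]
    return dict(totals)
```

The same query text can then be seen separately under the login endpoint and under an internal admin page, which is exactly the distinction the database alone cannot make.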
Datapace does not fill this gap. It is a database reliability agent, not an APM. For the login-path question, the right answer is a tracing system, not a database tool.
Three gaps, three tools
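- No historical context. Supabase built-in: a live read of pg_stat_statements with no snapshots. Complement with: periodic snapshots in a table, a metrics store, or an APM vendor.
- Reactive top-queries view. Supabase built-in: shows which queries are slow right now. Complement with: PR-time migration checks plus a post-deploy pg_stat_statements diff.
- No correlation to user impact. Supabase built-in: database metrics only, with no view of the call graph. Complement with: distributed tracing with database spans.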
The right tool for each gap may not be one tool
The temptation when reading a piece like this is to look for a single replacement for the built-in dashboard. That is the wrong shape. Retention, proactive detection, and business-impact correlation are three different problems that decompose onto three different parts of the stack. A warehouse handles historical retention well and is terrible at proactive detection. A CI check handles proactive detection well and has no view of retention. A tracing system handles user-impact correlation well and knows nothing about the database's internal metrics.
What works in practice is picking the right tool per gap and letting each do its job. Datapace sits in the middle gap, the proactive one, and is explicit about it. The argument for Datapace is not "stop using the Supabase dashboard," it is "add PR-time checks to the loop so the worst class of regressions never lands in production to begin with."
A note on when to graduate
The signal that the default is no longer enough is not a specific engineer count or a revenue threshold. It is a specific feeling, which most teams recognize when they hit it: the dashboard feels reactive, an incident this week had a root cause the dashboard was never going to surface before the page fired, and the post-mortem reads "we should have caught this at PR time." That is the day to look at complements. Before that day, the built-in is fine.
Frequently asked questions
Can I just keep pg_stat_statements counters running without resetting them?
Yes, but a running counter on its own cannot show change. It holds cumulative totals since the last reset, so a recent regression is averaged into the lifetime figure. To see a deploy's impact, either reset before the deploy, let traffic run for an hour, and snapshot, or take a snapshot on each side of the deploy and subtract one from the other. Without a reset or a snapshot cadence, you have a lifetime total that hides recent regressions.
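If snapshots are already being taken, the same comparison can be made without a reset by subtracting two cumulative readings. A minimal sketch, assuming each snapshot has been loaded into a dictionary keyed by queryid (the function name and input shape are assumptions for the example):

```python
def interval_stats(earlier, later):
    """Per-query calls and mean latency between two cumulative snapshots.

    Each snapshot maps queryid -> (calls, total_exec_time_ms), as read from
    pg_stat_statements at two points in time.
    Returns queryid -> (calls in the interval, mean ms per call in the interval).
    """
    result = {}
    for queryid, (calls_now, total_now) in later.items():
        calls_then, total_then = earlier.get(queryid, (0, 0.0))
        delta_calls = calls_now - calls_then
        if delta_calls <= 0:
            # No new calls, or the counters were reset or the entry was
            # evicted between the two snapshots; skip rather than guess.
            continue
        result[queryid] = (delta_calls, (total_now - total_then) / delta_calls)
    return result
```

A query with 1,000 lifetime calls averaging 10 ms that then runs 100 more calls at 20 ms still shows a lifetime mean of about 10.9 ms, while the interval mean reports 20 ms. Note that a reset between snapshots that is followed by enough traffic to exceed the earlier call count would not be caught by the `delta_calls <= 0` guard.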
Does Supabase's paid tier fix the retention gap?
Partially. Higher tiers retain logs for longer, which helps with after-the-fact forensics, but the query-performance view remains a live read of pg_stat_statements. Historical query-level retention is not a paid-tier feature; it requires external snapshotting regardless of plan.
Is distributed tracing really necessary, or can I skip it?
You can skip it until a slow query wastes an incident call because nobody can tell which endpoint it affected. For most teams that happens once, the incident is painful, and tracing becomes a priority. Until that day, a correlation lookup between timestamped pg_stat_statements snapshots and application logs is the cheap substitute.
Can Datapace also do the tracing piece?
No. Datapace is a database reliability agent. It reads Postgres statistics and schema, runs checks, surfaces verdicts. It does not emit application spans and does not know what any particular request was trying to do. Tracing is a different tool category.