Inside Upright’s Rails Engine Architecture for Self-Hosted Synthetic Monitoring
Synthetic monitoring is “easy” until you need evidence.
The check says “down.” Your graphs look fine. The last deploy is a shrug. Your team is staring at a red dashboard that can’t tell you whether this is a regional blip, a DNS issue, a dependency outage, or a real regression.
37signals announced Upright on February 16, 2026 in their launch post, Introducing Upright. The pitch isn’t “another cloud monitoring product.” It’s a self-hosted synthetic monitoring app designed to plug into Prometheus- and Alertmanager-style workflows while collecting deeper incident evidence when things get weird.
Upright is also a Rails product in a very particular way: it’s packaged as a Rails engine. That choice explains a lot of the “why” behind the system’s boundaries and defaults.
If you want the code, start at the repo: basecamp/upright. If you want the concept of “Rails engines” as product packaging, Rails has a solid overview in the Rails Engines Guide.
What 37signals is optimizing for
Upright is engineered around a few constraints that are obvious if you’ve been on-call:
- Self-hosted control. You own the stack, the data, and the failure modes.
- Distributed checks. Multiple “sites” (regions/nodes) run checks so you can tell local failures from global failures.
- Evidence for root cause. Not just pass/fail. Logs, artifacts, and context that help you answer “what broke?” faster.
- Fits existing alerting stacks. Prometheus, Alertmanager, and the workflows teams already have.
- Operational defaults. Fewer “choose your own adventure” decisions before you can trust the signal.
That last bullet is the big one. Most monitoring tools sell flexibility. Upright leans into “working baseline” first, then extensibility.
Upright as a Rails engine (why that packaging matters)
In practice, a Rails engine is a way to ship a subsystem with boundaries: a scoped namespace, routes/controllers/models/jobs, migrations, and an installer that sets sane defaults in the host app.
If Upright were “a gem with a few classes,” every team would wire it differently. Different auth. Different job adapters. Different URL layouts. Different metrics and retention behavior. Different deployment topologies. And every one of those differences becomes a unique incident later.
An engine lets Upright act like a drop-in operational product inside Rails. You mount it, you configure it, and you get a consistent surface area. Not identical for every team, but consistent enough that you can build a reliable mental model.
The operator payoff is simple: fewer seams. Fewer hidden defaults. Fewer “this one environment is special” surprises.
Two planes, one product: control plane vs site plane
One of Upright’s most practical decisions is that it encodes topology into routing, not just into deployment docs.
Upright expresses the split between a human control plane and per-site execution planes with subdomains:
app.example.com -> control plane
sfo.example.com -> site execution + evidence
Why this matters in practice:
- It reduces operator confusion. “Go to app.” is a clean instruction at 2 a.m.
- It keeps responsibility legible. Admin/auth/proxy in one place; results and evidence by site.
- It helps failure isolation. One site can be unhealthy without turning the whole product into a guessing game.
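To make the split concrete, here is a minimal sketch in plain Ruby of how a host name could be mapped to a plane. It is not Upright's routing code: `plane_for`, `CONTROL_SUBDOMAIN`, and `KNOWN_SITES` are hypothetical names, and the `nyc` site code is assumed alongside the `sfo` example above.

```ruby
# Hypothetical sketch: classify a request host as control plane or site plane.
# None of these names come from Upright; they only illustrate the idea.
CONTROL_SUBDOMAIN = "app"
KNOWN_SITES = %w[sfo nyc].freeze

def plane_for(host, base_domain: "example.com")
  suffix = ".#{base_domain}"
  host = host.to_s.downcase
  # Anything outside the expected base domain is rejected outright.
  return { plane: :unknown } unless host.end_with?(suffix)

  label = host.delete_suffix(suffix)
  if label == CONTROL_SUBDOMAIN
    { plane: :control }
  elsif KNOWN_SITES.include?(label)
    { plane: :site, site: label }
  else
    { plane: :unknown }
  end
end
```

Calling `plane_for("app.example.com")` returns the control plane, `plane_for("sfo.example.com")` returns the site plane tagged with `"sfo"`, and any host outside the base domain or with an unrecognized label comes back as `:unknown`. Treating unknown hosts as a distinct case, rather than defaulting them to one plane, is a design choice in this sketch.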
Runtime identity: “this node is SFO” without hardcoding
Distributed checks only help if you can trust which node produced which signal.
Upright treats “site identity” as runtime configuration, not as something you bake into classes. You define sites in a YAML config, and each deployed node resolves “who am I?” from deployment context (an environment variable or tag). That site identity can then flow into metrics, traces, and stored evidence without every probe reinventing tagging.
And it changes how you debug. When a probe fails, you’re not asking “is this the NYC node or the SFO node?” You’re asking “why is SFO failing while NYC passes?” That’s the right question.
It also creates a new discipline: treat site config and deployment tags as control-plane configuration. If those are wrong, your monitoring system can be “working” while lying about where evidence came from.
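The runtime lookup can be sketched in a few lines of plain Ruby. This is an illustration under stated assumptions, not Upright's implementation: the YAML layout, the `Site` struct, the `current_site` helper, and the `SITE_CODE` variable name are all hypothetical.

```ruby
require "yaml"

# Hypothetical site list; the keys and fields are illustrative, not Upright's schema.
SITES_YAML = <<~YAML
  sites:
    - code: sfo
      name: San Francisco
    - code: nyc
      name: New York
YAML

Site = Struct.new(:code, :name)

# Resolve "who am I?" from deployment context.
# SITE_CODE is an assumed variable name; env can be ENV or any Hash.
def current_site(yaml_text, env = ENV)
  sites = YAML.safe_load(yaml_text).fetch("sites").map do |entry|
    Site.new(entry.fetch("code"), entry.fetch("name"))
  end
  code = env.fetch("SITE_CODE") { raise KeyError, "SITE_CODE is not set" }
  # Fail loudly instead of guessing: a mislabeled node is worse than a crashed one.
  sites.find { |site| site.code == code } ||
    raise(ArgumentError, "unknown site code: #{code.inspect}")
end
```

With `SITE_CODE=sfo` set on a node, `current_site(SITES_YAML)` returns the San Francisco entry. A missing variable or an unlisted code raises at startup, which is the behavior you want if the alternative is evidence tagged with the wrong site.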
Opinionated defaults (the part most gems avoid)
Upright isn’t shy about shipping defaults that look like production.
A few of the opinions are worth calling out because they show the design intent:
- Solid Queue is the job default. The scheduler and execution plane are part of the product.
- Mission Control Jobs is there when things jam. You can see what’s running and what’s retrying.
- Prometheus + OpenTelemetry are first-class. Metrics and traces aren’t bolt-ons.
- Playwright is treated as a real probe type. Not just “check the homepage,” but “check the flow.”
- Extension points exist without patching internals. Add app-specific probes/authenticators as code and keep the engine itself clean.
Authentication: bootstrap-simple, production-upgradable
Upright’s authentication approach is intentionally simple at first.
There’s a default static-credentials strategy that gets you to a working admin sign-in without needing an external identity provider on day one. It uses safe comparison mechanics to avoid the most obvious mistakes, but it’s still what it sounds like: a bootstrap.
It’s a reasonable trade for a self-hosted product. But a bootstrap path is only safe if teams upgrade it. The production move is to switch to a stronger provider (OIDC) and rotate every default secret before the system carries real incident weight.
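For readers who haven't seen “safe comparison” spelled out, here is a minimal sketch of a constant-time credential check in plain Ruby. The source doesn't say which mechanism Upright uses, so treat `constant_time_equal?` and `credentials_match?` as hypothetical helpers. In a Rails app you would more likely reach for `ActiveSupport::SecurityUtils.secure_compare`.

```ruby
require "digest"

# Compare two equal-length strings without exiting early on the first mismatch.
def constant_time_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Hash both sides first so the compared values always have the same length,
# whatever the user submitted.
def credentials_match?(given_user, given_pass, expected_user:, expected_pass:)
  user_ok = constant_time_equal?(Digest::SHA256.digest(given_user.to_s),
                                 Digest::SHA256.digest(expected_user))
  pass_ok = constant_time_equal?(Digest::SHA256.digest(given_pass.to_s),
                                 Digest::SHA256.digest(expected_pass))
  user_ok & pass_ok # non-short-circuit: both comparisons always run
end
```

The point of the XOR loop is that the work done doesn't depend on where the first differing byte is, which is what a plain `==` on strings can leak. It does nothing about weak or shared passwords, which is why this stays a bootstrap.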
Tradeoffs and failure modes to plan for
Upright’s architecture is practical, but it’s not magic. A few watch-outs are worth planning for up front:
- Shared cookie domain simplifies auth across subdomains, but broadens scope. Misconfigured hostnames can create confusing session behavior.
- Wrong site identity can mislabel the system. If a node thinks it’s “SFO” when it’s actually “NYC,” you’ll waste time chasing the wrong region.
- Dynamic loading of probes/auth code can fail at boot. If you add custom probes and make a mistake, errors show up at runtime, not “compile time.”
- Static credentials are not a long-term answer. The bootstrap is convenient, but production needs stronger auth and real secret management.
None of these are deal-breakers. They’re just the kinds of failure modes you only get when you’re shipping a real system, not a library.
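One way to pull the dynamic-loading failure forward is a smoke test that loads every custom file and reports what breaks. The sketch below is plain Ruby and assumes nothing about how Upright discovers probes; `load_failures` is a hypothetical helper, and in a real app you would run it with the application environment loaded so that base classes resolve.

```ruby
# Try to load each Ruby file and collect failures instead of stopping at the first one.
def load_failures(paths)
  paths.each_with_object({}) do |path, failures|
    begin
      # The second argument wraps the file in an anonymous module,
      # so constants defined by the file don't leak into the checker.
      load(path, true)
    rescue ScriptError, StandardError => e
      # ScriptError covers SyntaxError and LoadError; StandardError covers NameError and friends.
      failures[path] = "#{e.class}: #{e.message}"
    end
  end
end
```

Run against something like `Dir["probes/**/*.rb"]` (a hypothetical directory) in CI, an empty result means every file at least parses and evaluates. It won't catch logic bugs inside a probe, only the class of error that would otherwise surface at boot.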
Practical best practices if you deploy Upright
Here’s the checklist I’d start with:
- Upgrade auth early. Move to OIDC before Upright becomes critical.
- Validate site config in CI. Treat sites.yml and deployment tags as control-plane config and fail builds when they drift.
- Make host allowlists explicit per environment. Avoid “it worked locally” redirect/session surprises.
- Rotate secrets before first real deploy. Don’t carry installer defaults into production.
- Decide retention intentionally. Especially if you run browser probes that capture logs and video evidence; storage grows fast.
If you do nothing else, lock in site identity correctness. It’s the foundation that makes distributed checks useful instead of noisy.
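A CI check for site config drift can be small. The sketch below is a hypothetical example in plain Ruby: the `sites` / `code` YAML layout and the `site_config_errors` helper are assumptions, and the list of deployed tags would come from wherever your deploy tooling records them.

```ruby
require "yaml"

# Returns a list of human-readable problems; an empty list means config and tags agree.
def site_config_errors(sites_yaml, deployed_tags)
  codes = YAML.safe_load(sites_yaml).fetch("sites").map { |site| site["code"].to_s }
  errors = []

  errors << "blank site code" if codes.any?(&:empty?)

  dupes = codes.select { |code| codes.count(code) > 1 }.uniq
  errors << "duplicate site codes: #{dupes.join(', ')}" unless dupes.empty?

  # A node tagged with a code the config doesn't know about would mislabel evidence.
  missing = deployed_tags - codes
  errors << "deploy tags with no site entry: #{missing.join(', ')}" unless missing.empty?

  # A configured site with no node is a silent coverage gap.
  unused = codes - deployed_tags
  errors << "sites with no deployed node: #{unused.join(', ')}" unless unused.empty?

  errors
end
```

In a build step you would read the real file, call the helper, and exit non-zero when the list is not empty, for example with `abort(errors.join("\n")) unless errors.empty?`.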
Upright is interesting because it’s packaged like a product, not a snippet: a Rails engine with boundaries, a subdomain topology that matches how operators think, and defaults that assume real monitoring needs (jobs, evidence, and observability).
The subdomain split is the architectural “tell.” It encodes the difference between the human control plane and the site execution plane in a way that stays readable under stress.
Next in the series: how Upright keeps the probe pipeline consistent across HTTP, SMTP, Playwright, and traceroute while still capturing evidence you can actually use.
