Observability
Evolve is a distributed system with many moving parts. When something goes wrong, you need to trace a request across services to find the problem. Evolve builds observability into the platform from the start, using OpenTelemetry as the standard instrumentation layer.
OpenTelemetry
Every service in Evolve is instrumented with OpenTelemetry (OTel). This provides distributed tracing, metrics, and logging through a vendor-agnostic standard. Because OTel is an open standard, you can send telemetry data to any compatible backend.
Evolve has production deployments running on:
- Honeycomb: APM and distributed tracing
- Sentry: error tracking and performance monitoring
- Google Cloud Trace: trace visualization on GCP
- Azure Monitor: tracing and metrics on Azure
Other OTel-compatible services (Datadog, New Relic, AWS X-Ray, Grafana, and others) work out of the box.
How it works
Each service exports traces and metrics to an OpenTelemetry Collector, which forwards them to your chosen observability backend:
Services send telemetry using the OTLP protocol (gRPC on port 4317 or HTTP on port 4318). The collector handles batching, sampling, and routing to one or more backends.
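As an illustration, a minimal collector pipeline for this setup might look like the following (the receiver ports match the OTLP defaults above; the Honeycomb exporter and the `HONEYCOMB_API_KEY` environment variable are illustrative, not Evolve's actual configuration):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}        # batch spans before export

exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/honeycomb]
```

Adding a second entry under `exporters:` (and listing it in the pipeline) is all it takes to fan telemetry out to more than one backend.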
Local development
For local development, Evolve includes Jaeger and an OpenTelemetry
Collector in the docker-compose.yml. Jaeger provides a web UI
(port 16686) for viewing traces across services, giving you the same
distributed tracing experience locally that you have in production.
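A docker-compose service definition along these lines (image tags, service names, and the config file path are illustrative) wires the two together:

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:1.57
    ports:
      - "16686:16686"   # Jaeger web UI

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.104.0
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otelcol/config.yaml
    ports:
      - "4317:4317"     # OTLP gRPC
      - "4318:4318"     # OTLP HTTP
```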
Sentry
While OpenTelemetry handles distributed tracing and metrics, Sentry
provides error tracking and performance monitoring. Evolve includes a
deep Sentry integration across both frontend and backend, connected to
OpenTelemetry through @sentry/opentelemetry.
Observability package
The @evolve-packages/observability package wraps Sentry
initialization together with OpenTelemetry setup. Every service calls
initObservability() at startup, which configures both systems with
consistent sampling and context propagation:
```typescript
import { initObservability } from "@evolve-packages/observability";

initObservability();
```
Under the hood, this:
- Calls `configureSentry()` with integrations for HTTP, uncaught exceptions, console capture, request data, and frame rewriting
- Configures OpenTelemetry with a `SentrySampler` so traces are sampled consistently across both systems, and a `SentryContextManager` for trace context propagation
- Sets up a pino logger stream that forwards error-level logs to Sentry, enriched with logger context (trace IDs, store context). Sensitive fields (authorization headers, cookies, emails, passwords) are redacted before sending.
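The redaction step can be pictured as a small recursive scrubber. The sketch below is standalone: the function name and field list are assumptions for illustration, not the package's actual API:

```typescript
const SENSITIVE_KEYS = new Set(["authorization", "cookie", "email", "password"]);

// Recursively replace sensitive fields with a placeholder before the
// event is handed to Sentry. Arrays and nested objects are walked;
// all other values pass through unchanged.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, val] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[redacted]" : redact(val);
    }
    return out;
  }
  return value;
}
```

Matching on lowercased key names keeps the scrubber case-insensitive, so `Authorization` and `authorization` headers are both caught.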
Frontend
The Next.js storefront uses @sentry/nextjs with three initialization
points:
- Server (`instrumentation.server.ts`): full tracing with `SentrySampler` for OpenTelemetry integration, and console capture at error/critical levels
- Edge (`instrumentation.edge.ts`): minimal initialization with tracing and replays disabled
- Client (`sentry.client.config.ts`): reads DSN, release, and environment from `sentry:dsn`, `sentry:release`, and `sentry:environment` meta tags injected by the root layout at runtime
The meta tag approach means the client SDK picks up the correct DSN and environment without exposing values in the JavaScript bundle. The client also extracts the user ID from the session JWT in cookies and sets it on the Sentry scope.
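Extracting the user ID can be sketched as plain JWT payload decoding (the cookie name and the `sub` claim are assumptions; no signature verification is needed just to tag the Sentry scope, and this must never be used for authorization):

```typescript
// Decode the payload segment of a JWT without verifying it —
// sufficient for tagging telemetry, never for access control.
function userIdFromSessionCookie(
  cookieHeader: string,
  cookieName = "session",
): string | undefined {
  const match = cookieHeader.match(new RegExp(`(?:^|;\\s*)${cookieName}=([^;]+)`));
  if (!match) return undefined;
  const payload = match[1].split(".")[1]; // header.payload.signature
  if (!payload) return undefined;
  try {
    const json = Buffer.from(payload, "base64url").toString("utf8");
    return JSON.parse(json).sub;
  } catch {
    return undefined; // malformed token: leave the scope untagged
  }
}
```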
A CSP tunnel route (/api/capture-errors) is configured through
withSentryConfig() so error reports bypass content security policy
restrictions and ad blockers. Source maps are uploaded during the build
and deleted afterward. CSP violation reports are also sent to Sentry
through a Report-To header.
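In `next.config.js` terms, the relevant build options look roughly like this (option names follow recent `@sentry/nextjs` releases; this is a sketch, not Evolve's actual config file):

```javascript
// next.config.js (sketch)
const { withSentryConfig } = require("@sentry/nextjs");

const nextConfig = {
  // ...regular Next.js config...
};

module.exports = withSentryConfig(nextConfig, {
  org: "lab-digital",
  project: "evolve",
  // Proxy browser error reports through our own origin so CSP rules
  // and ad blockers do not drop them.
  tunnelRoute: "/api/capture-errors",
  sourcemaps: {
    deleteSourcemapsAfterUpload: true, // upload during build, then remove
  },
});
```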
React errors are captured through a custom error boundary wrapping
react-error-boundary that calls captureReactException() for full
component stack diagnostics. The boundary accepts a captureSentry
prop to optionally disable reporting.
Backend
Backend services get Sentry integration through initObservability(),
which every service calls at startup. For Lambda-based services,
lambdaHandlerFactory() wraps handlers with both Sentry and
OpenTelemetry instrumentation and creates spans with Lambda-specific
attributes (request ID, function ARN, version).
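The shape of such a factory can be sketched as a higher-order function. The names and the injected `report` callback below are illustrative, not the actual `lambdaHandlerFactory()` signature; the attribute keys follow OpenTelemetry's FaaS semantic conventions:

```typescript
type LambdaContext = {
  awsRequestId: string;
  invokedFunctionArn: string;
  functionVersion: string;
};
type Handler<E, R> = (event: E, ctx: LambdaContext) => Promise<R>;

// Wrap a handler so every invocation carries Lambda-specific span
// attributes, and failures are reported before being rethrown.
function wrapHandler<E, R>(
  handler: Handler<E, R>,
  report: (err: unknown, attrs: Record<string, string>) => void,
): Handler<E, R> {
  return async (event, ctx) => {
    const attrs = {
      "faas.invocation_id": ctx.awsRequestId,
      "cloud.resource_id": ctx.invokedFunctionArn,
      "faas.version": ctx.functionVersion,
    };
    try {
      return await handler(event, ctx);
    } catch (err) {
      report(err, attrs); // e.g. Sentry captureException + span error status
      throw err;          // Lambda still sees the failure
    }
  };
}
```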
The pino logger is configured with a Sentry stream destination. Any log
at error level or above is forwarded to Sentry as either a
captureException (for errors) or captureMessage (for messages),
enriched with the logger's bindings (trace IDs, store context, service
name).
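The routing rule — error objects become `captureException`, plain messages become `captureMessage` — can be sketched as follows. The `SentryLike` interface mirrors the Sentry SDK's capture functions, but the stream shape and threshold handling are simplified assumptions:

```typescript
interface SentryLike {
  captureException(err: unknown, extra?: Record<string, unknown>): void;
  captureMessage(msg: string, extra?: Record<string, unknown>): void;
}

// Route one serialized pino log line to Sentry.
// pino numeric levels: error = 50, fatal = 60.
function forwardToSentry(line: string, sentry: SentryLike): void {
  const log = JSON.parse(line);
  const { level, msg, err, ...bindings } = log;
  if (level < 50) return; // below error level: not forwarded
  if (err) {
    // Rehydrate the serialized error so Sentry gets a stack trace.
    sentry.captureException(Object.assign(new Error(err.message), err), bindings);
  } else {
    sentry.captureMessage(msg, bindings);
  }
}
```

Everything left in `bindings` (trace IDs, store context, service name) rides along as extra context on the Sentry event.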
Configuration
Sentry is configured through the Mach Composer sentry plugin. The
auth token is stored as a SOPS-encrypted secret, and the organization,
project, and rate limits are set as global config:
```yaml
plugins:
  sentry:
    source: mach-composer/sentry
    version: 0.1.3

global:
  sentry:
    auth_token: ${var.secrets.sentry.auth_token}
    organization: "lab-digital"
    project: "evolve"
    rate_limit_window: 3600
    rate_limit_count: 1000
```
Each component that needs Sentry lists it in its integrations. The
plugin provides the sentry_dsn Terraform variable to each component's
module, which is then mapped to environment variables:
| Variable | Purpose |
|---|---|
| `SENTRY_DSN` | Sentry project DSN. If not set, Sentry is disabled. |
| `SENTRY_ENVIRONMENT` | Environment name (falls back to `ENVIRONMENT`) |
| `SERVICE_NAME` | Service identifier for tagging |
Core Web Vitals monitoring
Because Evolve already exports frontend telemetry through OpenTelemetry, you can extend the instrumentation to capture Core Web Vitals in real time. In e-commerce, these metrics directly impact conversion rates and search engine rankings.
With this approach you get:
- Per-page metrics: see which pages have performance issues
- Release impact: track how deployments affect performance
- Instant alerts: respond to performance regressions immediately instead of waiting for Google Analytics data (which can be delayed up to 48 hours)
Core Web Vitals are also reported by Google Analytics and Search Console, but with that same delay. For e-commerce, where a performance regression can depress conversion immediately, real-time monitoring through your own observability stack lets you detect and fix issues before they cut into revenue.
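When alerting on these metrics, Google's published thresholds are the natural cut-offs. A small helper like this (hypothetical, not part of Evolve) classifies a measurement the same way Search Console would:

```typescript
type Rating = "good" | "needs-improvement" | "poor";

// Google's Core Web Vitals thresholds: [good boundary, poor boundary].
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000], // Largest Contentful Paint, ms
  INP: [200, 500],   // Interaction to Next Paint, ms
  CLS: [0.1, 0.25],  // Cumulative Layout Shift, unitless
};

function rate(metric: string, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric];
  if (value <= good) return "good";
  return value <= poor ? "needs-improvement" : "poor";
}
```

Feeding each measurement through a classifier like this at ingest time makes "any page drops out of `good` after a deploy" a one-line alert rule in the backend of your choice.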
Further reading
Honeycomb has published a detailed guide on implementing Core Web Vitals monitoring with OpenTelemetry: