Observability

Evolve is a distributed system with many moving parts. When something goes wrong, you need to trace a request across services to find the problem. Evolve builds observability into the platform from the start, using OpenTelemetry as the standard instrumentation layer.

OpenTelemetry

Every service in Evolve is instrumented with OpenTelemetry (OTel). This provides distributed tracing, metrics, and logging through a vendor-agnostic standard. Because OTel is an open standard, you can send telemetry data to any compatible backend.

Evolve has production deployments running on:

  • Honeycomb: APM and distributed tracing
  • Sentry: error tracking and performance monitoring
  • Google Cloud Trace: trace visualization on GCP
  • Azure Monitor: tracing and metrics on Azure

Other OTel-compatible services (Datadog, New Relic, AWS X-Ray, Grafana, and others) work out of the box.

How it works

Each service exports traces and metrics to an OpenTelemetry Collector, which forwards them to your chosen observability backend.

Services send telemetry over OTLP, the OpenTelemetry Protocol (gRPC on port 4317 or HTTP on port 4318). The collector handles batching, sampling, and routing to one or more backends.
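A minimal collector configuration illustrating this flow might look like the following. This is a sketch, not Evolve's actual collector config; the exporter endpoint is a placeholder for whichever OTLP-compatible backend you choose:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  otlphttp:
    # Placeholder: replace with your backend's OTLP endpoint.
    endpoint: https://otlp.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Adding a second entry under `exporters` (and listing it in the pipeline) is all it takes to fan telemetry out to multiple backends.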

Local development

For local development, Evolve includes Jaeger and an OpenTelemetry Collector in the docker-compose.yml. Jaeger provides a web UI (port 16686) for viewing traces across services, giving you the same distributed tracing experience locally that you have in production.
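A docker-compose fragment for this local setup could look roughly like the following. The service names, images, and file paths are illustrative, not the exact contents of Evolve's docker-compose.yml:

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one
    ports:
      - "16686:16686"   # Jaeger web UI

  otel-collector:
    image: otel/opentelemetry-collector
    command: ["--config=/etc/otel/config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otel/config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```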

Sentry

While OpenTelemetry handles distributed tracing and metrics, Sentry provides error tracking and performance monitoring. Evolve includes a deep Sentry integration across both frontend and backend, connected to OpenTelemetry through @sentry/opentelemetry.

Observability package

The @evolve-packages/observability package wraps Sentry initialization together with OpenTelemetry setup. Every service calls initObservability() at startup, which configures both systems with consistent sampling and context propagation:

import { initObservability } from "@evolve-packages/observability";

initObservability();

Under the hood, this:

  1. Calls configureSentry() with integrations for HTTP, uncaught exceptions, console capture, request data, and frame rewriting
  2. Configures OpenTelemetry with a SentrySampler so traces are sampled consistently across both systems, and a SentryContextManager for trace context propagation
  3. Sets up a pino logger stream that forwards error-level logs to Sentry, enriched with logger context (trace IDs, store context). Sensitive fields (authorization headers, cookies, emails, passwords) are redacted before sending.
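The redaction in step 3 could be sketched as a recursive walk over the log object. The field list and helper name below are assumptions for illustration, not the actual @evolve-packages/observability implementation:

```typescript
// Illustrative sketch of sensitive-field redaction before logs are
// forwarded to Sentry. SENSITIVE_KEYS and redact() are assumptions,
// not the package's real API.
const SENSITIVE_KEYS = ["authorization", "cookie", "email", "password"];

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.includes(key.toLowerCase())
        ? "[REDACTED]"
        : redact(v); // recurse into nested objects
    }
    return out;
  }
  return value;
}
```

Matching on lowercased key names keeps the check case-insensitive, so both `Authorization` headers and `authorization` fields are caught.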

Frontend

The Next.js storefront uses @sentry/nextjs with three initialization points:

  • Server (instrumentation.server.ts): full tracing with SentrySampler for OpenTelemetry integration, and console capture at error/critical levels
  • Edge (instrumentation.edge.ts): minimal initialization with tracing and replays disabled
  • Client (sentry.client.config.ts): reads DSN, release, and environment from sentry:dsn, sentry:release, and sentry:environment meta tags injected by the root layout at runtime

The meta tag approach means the client SDK picks up the correct DSN and environment without exposing values in the JavaScript bundle. The client also extracts the user ID from the session JWT in cookies and sets it on the Sentry scope.
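Reading the user ID out of a session JWT can be sketched as below. The cookie name (`session`) and the `sub` claim are assumptions for illustration; the storefront's actual cookie and claim names may differ, and this Node-flavored sketch uses `Buffer` rather than browser APIs:

```typescript
// Illustrative sketch: pull a JWT from a Cookie header and decode its
// payload to read a user identifier. "session" and "sub" are assumed
// names, not the storefront's actual ones.
function getCookie(cookieHeader: string, name: string): string | undefined {
  for (const part of cookieHeader.split(";")) {
    const [k, ...rest] = part.trim().split("=");
    if (k === name) return rest.join("=");
  }
  return undefined;
}

function decodeJwtPayload(token: string): Record<string, unknown> {
  const payload = token.split(".")[1];
  // JWTs are base64url-encoded; convert to plain base64 before decoding.
  const b64 = payload.replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}

function userIdFromCookies(cookieHeader: string): string | undefined {
  const token = getCookie(cookieHeader, "session");
  if (!token) return undefined;
  try {
    return decodeJwtPayload(token)["sub"] as string | undefined;
  } catch {
    return undefined; // malformed token: leave the Sentry scope unset
  }
}
```

Note that this only decodes the payload; signature verification belongs on the server, not in client-side telemetry code.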

A CSP tunnel route (/api/capture-errors) is configured through withSentryConfig() so error reports bypass content security policy restrictions and ad blockers. Source maps are uploaded during the build and deleted afterward. CSP violation reports are also sent to Sentry through a Report-To header.

React errors are captured through a custom error boundary wrapping react-error-boundary that calls captureReactException() for full component stack diagnostics. The boundary accepts a captureSentry prop to optionally disable reporting.

Backend

Backend services get Sentry integration through initObservability(), which every service calls at startup. For Lambda-based services, lambdaHandlerFactory() wraps handlers with both Sentry and OpenTelemetry instrumentation and creates spans with Lambda-specific attributes (request ID, function ARN, version).

The pino logger is configured with a Sentry stream destination. Any log at error level or above is forwarded to Sentry as either a captureException (for errors) or captureMessage (for messages), enriched with the logger's bindings (trace IDs, store context, service name).
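The forwarding logic can be sketched as a pino stream destination. The `Capture` interface here is a stand-in injected for clarity; the real package calls the Sentry SDK directly:

```typescript
// Illustrative sketch of a pino stream that forwards error-level logs to
// Sentry. captureException/captureMessage are stand-ins for the Sentry
// SDK calls; this is not the actual package implementation.
type Capture = {
  captureException: (err: Error, context: Record<string, unknown>) => void;
  captureMessage: (msg: string, context: Record<string, unknown>) => void;
};

function createSentryStream(sentry: Capture) {
  return {
    write(line: string) {
      const log = JSON.parse(line);
      if (log.level < 50) return; // pino numeric levels: 50 = error, 60 = fatal
      // Everything besides the message/level/error becomes enrichment
      // context (trace IDs, store context, service name, ...).
      const { msg, level, err, ...context } = log;
      if (err) {
        const error = new Error(err.message ?? msg);
        error.stack = err.stack;
        sentry.captureException(error, context);
      } else {
        sentry.captureMessage(msg, context);
      }
    },
  };
}
```

Because pino serializes each log line as JSON, the stream can cheaply inspect the level and skip everything below `error` without touching Sentry at all.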

Configuration

Sentry is configured through the Mach Composer sentry plugin. The auth token is stored as a SOPS-encrypted secret, and the organization, project, and rate limits are set as global config:

plugins:
  sentry:
    source: mach-composer/sentry
    version: 0.1.3

global:
  sentry:
    auth_token: ${var.secrets.sentry.auth_token}
    organization: "lab-digital"
    project: "evolve"
    rate_limit_window: 3600
    rate_limit_count: 1000

Each component that needs Sentry lists it in its integrations. The plugin provides the sentry_dsn Terraform variable to each component's module, which is then mapped to environment variables:

  • SENTRY_DSN: Sentry project DSN. If not set, Sentry is disabled.
  • SENTRY_ENVIRONMENT: environment name (falls back to ENVIRONMENT)
  • SERVICE_NAME: service identifier for tagging
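The resolution rules above can be sketched as a small startup helper. `resolveSentryConfig` is a hypothetical function for illustration, not part of @evolve-packages/observability:

```typescript
// Illustrative sketch of how the environment variables above might be
// resolved at startup. resolveSentryConfig is a hypothetical helper.
type SentryConfig = {
  enabled: boolean;
  dsn?: string;
  environment?: string;
  serviceName?: string;
};

function resolveSentryConfig(
  env: Record<string, string | undefined>,
): SentryConfig {
  const dsn = env.SENTRY_DSN;
  if (!dsn) {
    // No DSN: Sentry is disabled entirely.
    return { enabled: false };
  }
  return {
    enabled: true,
    dsn,
    // SENTRY_ENVIRONMENT falls back to the generic ENVIRONMENT variable.
    environment: env.SENTRY_ENVIRONMENT ?? env.ENVIRONMENT,
    serviceName: env.SERVICE_NAME,
  };
}
```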

Core Web Vitals monitoring

Because Evolve already exports frontend telemetry through OpenTelemetry, you can extend the instrumentation to capture Core Web Vitals in real time. In e-commerce, these metrics directly impact conversion rates and search engine rankings.

With this approach you get:

  • Per-page metrics: see which pages have performance issues
  • Release impact: track how deployments affect performance
  • Instant alerts: respond to performance regressions immediately instead of waiting for Google Analytics data (which can be delayed up to 48 hours)

While Core Web Vitals can be tracked through Google Analytics or Search Console, the data is delayed by up to 48 hours. For e-commerce, where a performance regression can reduce conversion immediately, real-time monitoring through your own observability stack lets you detect and fix issues before they impact revenue.
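One way to wire this up is to translate each reported vital into flat attributes before attaching it to a span or metric. The metric shape below follows what the web-vitals package reports; the attribute names are assumptions for illustration, not an established schema:

```typescript
// Illustrative sketch: map a Core Web Vitals report (the shape emitted
// by the web-vitals package) to flat attributes for an OTel span or
// metric. The attribute keys are assumed names, not a standard.
type WebVitalMetric = {
  name: "LCP" | "CLS" | "INP" | "FCP" | "TTFB";
  value: number;
  rating: "good" | "needs-improvement" | "poor";
  id: string;
};

function toOtelAttributes(
  metric: WebVitalMetric,
  page: string,
): Record<string, string | number> {
  return {
    "web_vital.name": metric.name,
    "web_vital.value": metric.value,
    "web_vital.rating": metric.rating,
    "web_vital.id": metric.id,
    // Per-page attribution enables per-page breakdowns and alerting.
    "page.route": page,
  };
}
```

Recording the route as an attribute (rather than baking it into the metric name) keeps cardinality manageable while still allowing per-page queries in the backend.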

Further reading

Honeycomb has published a detailed guide on implementing Core Web Vitals monitoring with OpenTelemetry: