Architecture

Causely is built on a split-architecture model that balances local control with cloud-powered intelligence. This design ensures low overhead, strong data privacy, and seamless integration with existing tools.

This page provides detailed information about Causely's deployment architecture and component structure. For a high-level overview of how Causely's causal reasoning engine works, see How Causely Works.

System Architecture

Deployment Architecture

Causely can be deployed across various environments, including Kubernetes clusters, standalone Docker hosts, Nomad clusters, and more. The deployment architecture consists of several components that work together to provide real-time root cause analysis.

Mediation Layer

The mediation layer is deployed locally in your infrastructure and processes telemetry data to extract only the signals needed for reasoning. It performs:

Symptom Detection: Converts telemetry from Prometheus, CloudWatch, Datadog, OpenTelemetry (including eBPF), and other sources into a binary stream of active/inactive symptoms.
Topology Discovery and Ingestion: Uses integrated telemetry sources to discover entities and their dependencies, and ingests topology from systems such as OpenTelemetry and cloud provider APIs.
Local Processing: Processes telemetry locally to minimize data transfer, control cost, and preserve privacy. Most raw telemetry remains local, with only distilled insights and targeted evidence sent to the cloud.
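To make the idea of a binary symptom stream concrete, here is a minimal sketch of threshold-based symptom detection. It is an illustration only, not Causely's implementation: the `SymptomRule` type, the `HighLatency` symptom, the `p99_latency_ms` metric name, and the threshold value are all hypothetical. The sketch emits an event only when a symptom flips between inactive and active, which is one way raw samples could be reduced to a small stream of state changes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class SymptomRule:
    """Hypothetical rule: a symptom is active while a metric exceeds a threshold."""
    symptom: str
    metric: str
    threshold: float


def detect_symptoms(
    samples: Iterable[Tuple[str, str, float]],
    rules: Iterable[SymptomRule],
) -> Iterator[Tuple[str, str, bool]]:
    """Yield (entity, symptom, active) only when a symptom changes state.

    `samples` is an iterable of (entity, metric, value) tuples.
    """
    rules_by_metric: Dict[str, List[SymptomRule]] = {}
    for rule in rules:
        rules_by_metric.setdefault(rule.metric, []).append(rule)

    # Last known state per (entity, symptom); symptoms start inactive.
    state: Dict[Tuple[str, str], bool] = {}
    for entity, metric, value in samples:
        for rule in rules_by_metric.get(metric, []):
            active = value > rule.threshold
            key = (entity, rule.symptom)
            if state.get(key, False) != active:
                state[key] = active
                yield entity, rule.symptom, active


# Assumed example data: four latency samples for one service.
rules = [SymptomRule("HighLatency", "p99_latency_ms", 500.0)]
samples = [
    ("checkout", "p99_latency_ms", 120.0),
    ("checkout", "p99_latency_ms", 750.0),
    ("checkout", "p99_latency_ms", 900.0),
    ("checkout", "p99_latency_ms", 200.0),
]
events = list(detect_symptoms(samples, rules))
```

In this example the four raw samples collapse into two events: the symptom becoming active and then inactive again. Only those state changes would need to leave the local environment.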

The mediation layer primarily sends distilled insights to the cloud. After a root cause is identified, a targeted subset of relevant telemetry (metrics, traces, and log-derived errors/events) may be sent as evidence to enhance root cause clarity. For more detail on supported telemetry sources, see Supported Telemetry.

The mediation layer consists of the following components:

Mediator

The Mediator is the core component that runs locally in your environment and serves as the data processing layer:

Symptom Detection: Converts telemetry from various sources into binary symptom states
Topology Discovery: Automatically discovers services, infrastructure, and dependencies
Local Processing: Processes telemetry locally, with most raw telemetry remaining in your datacenter
OTLP Endpoint: Listens on port 4317 for OpenTelemetry Protocol data

The Mediator handles secure communication with Causely's cloud-based causal reasoning engine, primarily sending distilled insights. After root causes are identified, a targeted subset of relevant telemetry may be sent as evidence to enhance root cause clarity.

The Mediator can optionally be configured to pull metrics from Prometheus or to discover and monitor managed services from cloud providers.
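Because the Mediator listens for OTLP data on port 4317, a quick connectivity check from a workload or collector host can help confirm the endpoint is reachable before exporters are pointed at it. The following is a minimal sketch using only the Python standard library. The function name is hypothetical and the hostname in the comment is a placeholder. The check only verifies that a TCP connection can be opened, not that the OTLP protocol itself is healthy.

```python
import socket


def otlp_endpoint_reachable(host: str, port: int = 4317, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (the hostname is a placeholder for your Mediator's address):
# otlp_endpoint_reachable("mediator.example.internal")
```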

Agents

Agents are deployed across your infrastructure to gather node and container level metrics. The deployment method varies depending on your environment:

Kubernetes: Agents are deployed as a DaemonSet across all nodes in the cluster
Docker: Agents run as containers on standalone Docker hosts
Nomad: Agents are deployed as Nomad jobs across the cluster

Agents use eBPF, which requires privileged access to the host system and enables automatic instrumentation without code changes. The eBPF instrumentation uses uprobes exclusively to intercept specific user-space functions within your applications. It does not hook into kernel networking callbacks or the packet datapath, and it does not act as a Container Network Interface (CNI) or network infrastructure component. The instrumentation may inject trace context headers for distributed tracing, but it does not intercept, block, or route network traffic at the kernel level.

Agents establish outbound connections only to the Mediator and VictoriaMetrics; they do not connect to the internet or to any other service. The agents periodically forward topology and manifestation data to the Mediator, which in turn sends it to the Causely SaaS backend for analysis. If an agent fails or is removed, your applications and network continue to function normally.

Agent Architecture

Executor

The Executor is an optional component responsible for executing remediation actions within your infrastructure. The Executor can be enabled as part of the deployment process.

The specific permissions required depend on your deployment environment:

Kubernetes: The Executor's ServiceAccount is granted the cluster-admin role
Docker/Nomad: The Executor requires appropriate permissions to execute remediation actions

VictoriaMetrics

VictoriaMetrics is a time series database that the agents and the Mediator use (on port 8428) to store additional time series data locally in your environment.
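VictoriaMetrics exposes a Prometheus-compatible HTTP query API, so locally stored series can be inspected with a plain HTTP request to port 8428. The sketch below builds and issues an instant query using only the standard library. The hostname is a placeholder and the `up` query is just an example. Which metrics Causely actually stores is not specified here, and whether the port is reachable from where you run this depends on your deployment.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus-compatible query API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)


# Placeholder host; replace with the address of your VictoriaMetrics instance.
url = build_query_url("http://victoriametrics.example.internal:8428", "up")
```

Calling `instant_query` with the same arguments would perform the request and return the parsed JSON body.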

Causal Engine

The Causal Engine runs in Causely's secure cloud environment (or can be self-hosted). It receives the stream of symptom states and performs real-time analysis using probabilistic modeling, system graphs, and causal inference. It infers causes, evaluates blast radius, validates constraints, and prioritizes remediation, all without requiring manual correlation.
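As a rough intuition for how a system graph relates root causes to observed symptoms, the toy sketch below computes a blast radius over a dependency graph and ranks candidate causes by how closely their predicted impact matches the symptomatic entities. This is not Causely's algorithm: the topology and service names are hypothetical, and a simple set-overlap (Jaccard) score stands in for the probabilistic modeling and causal inference described above.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical topology: each service maps to the services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "frontend": ["checkout", "catalog"],
    "checkout": ["payments", "db"],
    "catalog": ["db"],
    "payments": [],
    "db": [],
}


def blast_radius(depends_on: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every entity that directly or transitively depends on `root`."""
    # Invert the graph: dependency -> set of direct dependents.
    dependents: Dict[str, Set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def rank_candidates(
    depends_on: Dict[str, List[str]], symptomatic: Set[str]
) -> List[Tuple[str, float]]:
    """Rank candidates by Jaccard overlap of predicted impact and observed symptoms."""
    scored = []
    for candidate in depends_on:
        predicted = blast_radius(depends_on, candidate) | {candidate}
        union = predicted | symptomatic
        score = len(predicted & symptomatic) / len(union) if union else 0.0
        scored.append((candidate, score))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


# Assumed observation: these entities currently show active symptoms.
ranked = rank_candidates(DEPENDS_ON, {"db", "checkout", "catalog", "frontend"})
```

With this assumed input, `db` ranks first because a fault there would affect exactly the symptomatic entities, while `payments` ranks lower because it would not explain the symptom on `catalog`.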

Telemetry Sources

Causely supports a wide range of telemetry sources, including OpenTelemetry, Prometheus, CloudWatch, Datadog, and more. For a full list of supported telemetry sources, see Supported Telemetry.

Causely Agents are deployed in your infrastructure and are responsible for collecting the telemetry data from those sources.

By default, Causely automatically instruments your applications to receive OpenTelemetry traces. This allows Causely to discover service dependencies and to monitor synchronous and asynchronous communication signals. Additionally, you can export traces to Causely from your existing OpenTelemetry Collectors. We recommend that you always send OpenTelemetry traces to Causely, as this allows Causely to provide cross-service insights.

Workflow Integration

Causely integrates directly with your tools of choice, delivering causal insights to Slack, Alertmanager, Opsgenie, Grafana, and more.

For details on how to connect Causely to your workflows, see Supported Workflows.

This architecture allows Causely to deliver precise, real-time insights without burdening your data pipelines or violating privacy requirements.

Security Considerations

For detailed information about security, permissions, and data handling, see the Security documentation.
