
How Causely Works

Causely uses a model-driven reasoning engine to continuously infer root causes for symptoms observed in production, without requiring full telemetry ingestion. It works alongside your existing observability stack, interpreting the meaning behind the metrics, logs, and traces those tools surface, rather than replacing them.

This page explains how Causely transforms high-volume observability signals into real-time, explainable insights using probabilistic modeling, system graphs, and causal inference.

How the Causal Engine Works

At the core of Causely is a probabilistic reasoning engine that maps symptoms to root causes using domain-specific models and dynamic system knowledge. The engine is composed of five interdependent components.

1. Ontology

Causely’s ontology is composed of two foundational models: the Causal Model and the Attribute Dependency Model. Together, they define the entities, behaviors, and failure modes in your environment, forming the semantic backbone of the causal reasoning engine.

  • Causal Model: Causely includes a built-in library of causal knowledge that captures common root causes capable of degrading application performance. This model requires no configuration and enables the system to begin identifying root causes as soon as it is deployed.

    • Covers a broad range of entities, including applications, databases, caches, messaging systems, load balancers, DNS, compute, and storage.
    • Encodes how each root cause propagates through the environment and the symptoms it may produce.
    • Is designed to be environment-agnostic and applicable to any modern cloud native architecture.
  • Attribute Dependency Model: The attribute model extends the causal model by capturing how performance-related attributes (for example, latency, throughput, utilization) are interdependent across entities. It also encodes the operational constraints those attributes must satisfy to meet performance goals.

    • Represents attribute dependencies across a wide range of services and infrastructure layers.
    • Supports both predefined and learned functional relationships between attributes.
    • Defines the desired state of the system based on application goals and constraints.
    • Like the causal model, it is fully environment-independent and generalizable across architectures.
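The Causal Model side of the ontology can be pictured as a catalog keyed by entity *type*, which is what keeps it environment-agnostic. The sketch below is a minimal illustration under stated assumptions: the class names, the `called_by` relationship, the symptom names, and the probabilities are all hypothetical, and Causely's real model is far richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Propagation:
    """How a root cause shows up on *neighboring* entities in the topology."""
    relation: str       # relationship type to follow (hypothetical, e.g. "called_by")
    symptom: str        # symptom type raised on the neighbor
    probability: float  # likelihood the root cause produces that symptom


@dataclass(frozen=True)
class RootCauseType:
    """A root cause defined per entity type, never per concrete instance."""
    name: str
    entity_type: str
    local_symptoms: dict  # symptom type on the entity itself -> probability
    propagations: tuple = ()


# Hypothetical entries for illustration only.
CAUSAL_MODEL = [
    RootCauseType(
        name="DatabaseCongested",
        entity_type="Database",
        local_symptoms={"HighQueryLatency": 0.9},
        propagations=(Propagation("called_by", "HighRequestLatency", 0.8),),
    ),
    RootCauseType(
        name="ServiceMalfunction",
        entity_type="Service",
        local_symptoms={"HighErrorRate": 0.95},
        propagations=(Propagation("called_by", "HighErrorRate", 0.7),),
    ),
]


def root_causes_for(entity_type: str) -> list:
    """Return the root cause types that can apply to a given entity type."""
    return [rc for rc in CAUSAL_MODEL if rc.entity_type == entity_type]
```

Because nothing in the catalog names a specific service or host, the same definitions apply unchanged to any environment. Only the topology decides where they are instantiated.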

2. Topology Graph

Causely automatically constructs the topology graph through discovery of services, infrastructure, and their interconnections. Cloud native environments are highly dynamic, composed of applications, services, databases, caches, messaging systems, load balancers, compute, storage, and more.

Causely continuously scans your environment to identify all these entities and determines for each:

  • Connectivity: which other entities it communicates with horizontally
  • Layering: which entities it is built upon or supports vertically
  • Composition: the internal components or resources that make up the entity

These relationships are stitched together into a continuously updated, real-time graph that represents the full system topology. This graph forms the foundation for blast radius analysis, root cause attribution, and cross-service impact modeling.
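A topology graph with the three relationship kinds above can be held as one adjacency map per kind. The following is a minimal sketch, assuming string entity names and the hypothetical relation labels `connectivity`, `layering`, and `composition`. It is not Causely's data model. `remove_entity` stands in for discovery noticing that an entity has disappeared, which is what keeps the graph current.

```python
from collections import defaultdict


class TopologyGraph:
    """Minimal topology graph with one edge set per relationship kind."""

    KINDS = ("connectivity", "layering", "composition")

    def __init__(self):
        # kind -> source entity -> set of target entities
        self._edges = {kind: defaultdict(set) for kind in self.KINDS}
        self.entities = {}  # entity name -> entity type

    def add_entity(self, name: str, entity_type: str) -> None:
        self.entities[name] = entity_type

    def relate(self, kind: str, source: str, target: str) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown relationship kind: {kind}")
        for entity in (source, target):
            if entity not in self.entities:
                raise KeyError(f"unknown entity: {entity}")
        self._edges[kind][source].add(target)

    def remove_entity(self, name: str) -> None:
        """Drop an entity that discovery no longer sees, plus all its edges."""
        self.entities.pop(name, None)
        for edges in self._edges.values():
            edges.pop(name, None)
            for targets in edges.values():
                targets.discard(name)

    def neighbors(self, kind: str, source: str) -> set:
        return set(self._edges[kind].get(source, ()))


# Hypothetical entities discovered in a small environment.
topo = TopologyGraph()
for name, etype in [("checkout", "Service"), ("orders-db", "Database"), ("node-1", "Node")]:
    topo.add_entity(name, etype)
topo.relate("connectivity", "checkout", "orders-db")  # checkout calls orders-db
topo.relate("layering", "orders-db", "node-1")         # orders-db runs on node-1
```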

3. Causality Mapping

Causely automatically generates a Bayesian network that models how root causes lead to observable symptoms, based on its built-in Causal Models and the real-time Topology Graph.

This causal mapping encodes probabilistic cause-effect relationships, allowing the system to infer the most likely root cause from a given set of active symptoms. It reflects both the structural dependencies in your environment and learned patterns of failure propagation.
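The generation step can be illustrated by instantiating a type-level causal model over a concrete topology. This minimal sketch shows only the structure-building step, not the probabilistic inference. It assumes a simplified two-layer graph in which root cause nodes point directly at symptom nodes, and every model entry, entity, and relation name is hypothetical.

```python
# Hypothetical type-level model: root cause -> (entity type, local symptoms,
# propagations as (relation, symptom on neighbor, probability)).
CAUSAL_MODEL = {
    "Congested": ("Database", {"HighQueryLatency": 0.9},
                  [("called_by", "HighRequestLatency", 0.8)]),
}

# Hypothetical topology: entity -> type, and relation -> list of (source, target).
ENTITIES = {"orders-db": "Database", "checkout": "Service", "billing": "Service"}
RELATIONS = {"called_by": [("orders-db", "checkout"), ("orders-db", "billing")]}


def build_causality_graph(model, entities, relations):
    """Instantiate the type-level model over the topology.

    Returns {(root_cause, entity): {(symptom, entity): probability}}, i.e.
    probability-weighted edges from root cause nodes to symptom nodes.
    """
    graph = {}
    for rc_name, (etype, local, propagations) in model.items():
        for entity, entity_type in entities.items():
            if entity_type != etype:
                continue
            # Symptoms raised on the entity itself.
            edges = {(sym, entity): p for sym, p in local.items()}
            # Symptoms raised on neighbors reached via the named relation.
            for relation, sym, p in propagations:
                for src, dst in relations.get(relation, []):
                    if src == entity:
                        edges[(sym, dst)] = p
            graph[(rc_name, entity)] = edges
    return graph


graph = build_causality_graph(CAUSAL_MODEL, ENTITIES, RELATIONS)
```

When the topology changes, for example when a new caller of `orders-db` is discovered, rerunning the same function yields an updated graph without touching the model.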

Causely represents this mapping through two core data structures:

  • Causality Graph: A directed acyclic graph (DAG) where nodes represent root causes and symptoms, and edges denote potential causal relationships. Each edge is weighted with a probability, indicating the likelihood that one node (a root cause) leads to another (a symptom).
  • Codebook: A table in which each column corresponds to a root cause and each row to a symptom. Each column is a vector of probabilities that forms the root cause's unique signature, and each cell holds the probability that the root cause produces the corresponding symptom.

Together, these structures power Causely’s ability to perform real-time probabilistic inference and deliver explainable, high-confidence root cause insights.
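The relationship between the two structures can be sketched by flattening a causality graph into a codebook table. This is a minimal illustration that assumes the graph is given as `{root_cause: {symptom: probability}}` with hypothetical names. A missing edge is read as probability 0.0.

```python
def build_codebook(causality_graph):
    """Turn {root_cause: {symptom: probability}} into a codebook table.

    Rows are symptoms and columns are root causes; an absent edge means the
    root cause is not expected to produce that symptom (probability 0.0).
    """
    root_causes = sorted(causality_graph)
    symptoms = sorted({s for edges in causality_graph.values() for s in edges})
    table = [[causality_graph[rc].get(s, 0.0) for rc in root_causes] for s in symptoms]
    return root_causes, symptoms, table


def signature(table, column):
    """Return one root cause's column: its probability signature."""
    return tuple(row[column] for row in table)


# Hypothetical causality graph with two root causes.
CAUSALITY_GRAPH = {
    "db_congested": {"db_latency": 0.9, "checkout_latency": 0.8},
    "checkout_malfunction": {"checkout_errors": 0.95, "checkout_latency": 0.3},
}
root_causes, symptoms, table = build_codebook(CAUSALITY_GRAPH)
```

Here the `db_congested` column is `(0.0, 0.8, 0.9)` over the rows `checkout_errors`, `checkout_latency`, `db_latency`. Inference works because no two columns share the same signature.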

4. Attribute Dependency Graph

Causely generates this graph using its built-in Attribute Dependency Model and the live Topology Graph. The result is a directed acyclic graph (DAG) that models functional dependencies between system attributes.

In this graph:

  • Nodes represent individual attributes, for example CPU usage of a service or queue length of a messaging system.
  • Edges represent dependency relationships, for example an edge from attribute A to attribute B means that B is a function of A.
  • Edge labels define these functions. Some may be explicitly defined in the Attribute Dependency Model, while others are learned automatically from observed behavior in your environment.
  • Nodes for attributes that must satisfy a constraint are annotated with that constraint.

The Attribute Dependency Graph enables Causely to reason about how changes in one part of the system cascade across others, identify emerging bottlenecks, and validate whether the environment remains within defined performance bounds.
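A small attribute dependency graph can be evaluated by computing attributes in topological order and then checking the constraint-decorated nodes. The minimal sketch below makes several hypothetical assumptions: the attribute names, the premise that each request issues exactly one database call, and the 80% utilization bound. For the database node it uses the standard utilization law U = X × S, where X is throughput and S is service time. Causely's own functions may be predefined or learned, as described above.

```python
from graphlib import TopologicalSorter

# Hypothetical attribute graph: node -> (input attributes, function of inputs).
# Nodes with no inputs are observed; the others are derived.
DEPENDENCIES = {
    "checkout.request_rate": ((), None),  # observed, requests/s
    "billing.request_rate": ((), None),   # observed, requests/s
    "db.service_time": ((), None),        # observed, seconds per query
    # Assumes each incoming request issues exactly one database query.
    "db.throughput": (("checkout.request_rate", "billing.request_rate"),
                      lambda a, b: a + b),
    # Utilization law: U = X * S.
    "db.utilization": (("db.throughput", "db.service_time"),
                       lambda x, s: x * s),
}

# Hypothetical constraint decorating a node: keep utilization below 80%.
CONSTRAINTS = {"db.utilization": lambda u: u < 0.8}


def evaluate(observed):
    """Compute derived attributes in dependency order and list violated constraints."""
    sorter = TopologicalSorter({n: set(deps) for n, (deps, _) in DEPENDENCIES.items()})
    values = dict(observed)
    for node in sorter.static_order():  # inputs always come before dependents
        inputs, fn = DEPENDENCIES[node]
        if fn is not None:
            values[node] = fn(*(values[i] for i in inputs))
    violations = [n for n, ok in CONSTRAINTS.items() if not ok(values[n])]
    return values, violations
```

Raising `checkout.request_rate` from 100 to 200 requests/s, with `billing.request_rate` at 50 and `db.service_time` at 0.004 s, pushes `db.utilization` from about 0.6 to about 1.0. This shows how a change in one service cascades into a constraint violation on another.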

5. Root Cause Analysis

Root cause analysis automatically pinpoints root causes in real time from the observed symptoms, using the Codebook described above. No configuration is required: Causely can immediately identify a broad set of issues (100+ root causes), ranging from application malfunctions to service congestion to infrastructure bottlenecks.

In any given environment, tens of thousands of possible root causes can produce hundreds of thousands of symptoms. Causely prevents service degradation by untangling these relationships, pinpointing the root cause that puts your SLOs at risk, and driving remediation before those SLOs are violated.
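Codebook-based diagnosis can be illustrated by scoring each root cause's signature against the current set of active symptoms. The following minimal sketch swaps in a simple stand-in technique rather than Causely's actual inference. It ranks root causes by log-likelihood and makes three assumptions: a single active root cause, a uniform prior, and symptoms that are conditionally independent given the root cause. The probabilities are clamped away from 0 and 1 so that a spurious or lost symptom lowers a score without ruling a candidate out. The codebook and the `EPSILON` value are hypothetical.

```python
import math

# Hypothetical codebook: root cause -> {symptom: P(symptom active | root cause)}.
CODEBOOK = {
    "db_congested": {"db_latency": 0.9, "checkout_latency": 0.8, "checkout_errors": 0.05},
    "checkout_malfunction": {"db_latency": 0.0, "checkout_latency": 0.3, "checkout_errors": 0.95},
}
EPSILON = 0.01  # clamp so noisy observations reduce, but never zero out, a score


def rank_root_causes(active, codebook=CODEBOOK):
    """Rank root causes by log-likelihood of the observed symptom vector.

    `active` is the set of currently active symptoms; every other symptom
    in the codebook is treated as inactive.
    """
    scores = {}
    for rc, sig in codebook.items():
        score = 0.0
        for symptom, p in sig.items():
            p = min(max(p, EPSILON), 1 - EPSILON)
            score += math.log(p if symptom in active else 1 - p)
        scores[rc] = score
    return sorted(scores, key=scores.get, reverse=True)
```

With `{"db_latency", "checkout_latency"}` active, `db_congested` ranks first. With only `checkout_errors` active, `checkout_malfunction` ranks first, even though the two candidates overlap on `checkout_latency`.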

Deployment Architecture

Causely is built on a split-architecture model that balances local control with cloud-powered intelligence. This design ensures low overhead, strong data privacy, and seamless integration with existing tools.

Mediation Layer

The mediation layer is deployed locally in your infrastructure and is responsible for collecting only the signals needed for reasoning, never full logs or raw metrics. It performs:

  • Symptom Detection: Converts telemetry from Prometheus, CloudWatch, Datadog, OpenTelemetry (including eBPF), and other sources into a binary stream of active/inactive symptoms.
  • Topology Discovery: Automatically discovers the entities and dependencies in your environment.
  • Local Processing: Keeps all raw telemetry local to minimize data transfer, control cost, and preserve privacy.

No raw data is sent to the cloud, only distilled insights. For more detail on supported telemetry sources, see Supported Telemetry.
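Symptom detection can be pictured as a small state machine per metric that emits only state changes. The minimal sketch below assumes a latency metric and hypothetical thresholds. It uses two thresholds (hysteresis) so that a value hovering near a single cut-off does not make the symptom flap. Only the `(name, active)` transitions would leave the local environment. The raw samples stay where they are.

```python
from dataclasses import dataclass


@dataclass
class SymptomDetector:
    """Turn a metric stream into an active/inactive symptom state.

    Activates above `activate_above` and clears below `clear_below`;
    between the two, the current state is kept. Values are hypothetical.
    """
    name: str
    activate_above: float
    clear_below: float
    active: bool = False

    def update(self, value: float):
        """Feed one sample; return (name, active) only when the state changes."""
        if not self.active and value > self.activate_above:
            self.active = True
        elif self.active and value < self.clear_below:
            self.active = False
        else:
            return None
        return (self.name, self.active)


detector = SymptomDetector("checkout.HighLatency", activate_above=0.5, clear_below=0.3)
p99_latency_seconds = [0.2, 0.6, 0.45, 0.55, 0.25, 0.1]  # hypothetical samples
changes = [c for v in p99_latency_seconds if (c := detector.update(v)) is not None]
```

Six raw samples reduce to two transitions, activated and then cleared. The sample at 0.45 does not clear the symptom because it stays above the lower threshold.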

Causal Engine

The Causal Engine runs in Causely's secure cloud environment or can be self-hosted. It receives the stream of symptom states and performs real-time analysis using the five core components described above. It infers the most probable root causes, evaluates blast radius, validates constraints, and prioritizes remediation, all without requiring manual correlation.
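One plausible way to picture blast radius evaluation is as the set of entities that directly or transitively depend on the entity where a root cause occurs. This minimal sketch rests on that assumption, which is not a description of Causely's actual algorithm. It inverts a hypothetical `DEPENDS_ON` map and walks it breadth-first.

```python
from collections import deque

# Hypothetical dependency edges: entity -> entities it depends on.
DEPENDS_ON = {
    "frontend": ["checkout"],
    "checkout": ["orders-db", "payments"],
    "reporting": ["orders-db"],
    "payments": [],
    "orders-db": ["node-1"],
}


def blast_radius(root_entity, depends_on=DEPENDS_ON):
    """Return every entity that directly or transitively depends on root_entity."""
    # Invert the edges: dependency -> entities that rely on it.
    dependents = {}
    for entity, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(entity)
    # Breadth-first walk; `seen` also guards against dependency cycles.
    seen, queue = set(), deque([root_entity])
    while queue:
        for parent in dependents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

In this example, a fault on `node-1` reaches `orders-db`, then `checkout` and `reporting`, and finally `frontend`. `payments` stays outside the blast radius.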

Workflow Integration

Causely integrates directly into your tools of choice, delivering root cause insights into Slack, Alertmanager, Opsgenie, Grafana, and more.

For details on how to connect Causely to your workflows, see Supported Workflows.

This architecture allows Causely to deliver precise, real-time insights without burdening your data pipelines or violating privacy requirements.