How Causely Works

Causely uses a model-driven reasoning engine to continuously infer root causes for symptoms observed in production, without requiring full telemetry ingestion. It works alongside your existing observability stack, interpreting the meaning behind the metrics, logs, and traces those tools surface, rather than replacing them.

This page provides a high-level overview of how Causely's causal reasoning engine works, focusing on the core concepts and mechanisms. For detailed information about deployment architecture, components, and infrastructure, see the Architecture documentation.

How the Causal Engine Works

At the core of Causely is a probabilistic reasoning engine that maps symptoms to root causes using domain-specific models and dynamic system knowledge. The engine is composed of five interdependent components.

1. Ontology

Causely’s ontology is composed of two foundational models: the Causal Model and the Attribute Dependency Model. Together, they define the entities, behaviors, and failure modes in your environment, forming the semantic backbone of the causal reasoning engine.

  • Causal Model: Causely includes a built-in library of causal knowledge that captures root causes capable of degrading application performance. This model requires no configuration and enables the system to begin identifying root causes as soon as it is deployed.

    • Covers a broad range of entities, including applications, databases, caches, messaging systems, load balancers, DNS, compute, and storage.
    • Encodes how each root cause propagates through the environment and the symptoms it may produce.
    • Is designed to be environment-agnostic and applicable to any modern cloud native architecture.
  • Attribute Dependency Model: The attribute model extends the causal model by capturing how performance-related attributes (for example, latency, throughput, utilization) are interdependent across entities. It also encodes the operational constraints those attributes must satisfy to meet performance goals.

    • Represents attribute dependencies across a wide range of services and infrastructure layers.
    • Supports both predefined and learned functional relationships between attributes.
    • Defines the desired state of the system based on application goals and constraints.
    • Like the causal model, it is fully environment-independent and generalizable across architectures.

2. Topology Graph

Causely continuously discovers and maintains the topology graph of services, infrastructure, and their interconnections via its integrations with your existing telemetry sources. It ingests and reconciles topology from any source, including OpenTelemetry traces, cloud provider APIs and other inventories.

Cloud native environments are highly dynamic, composed of applications, services, databases, caches, messaging systems, load balancers, compute, storage, and more.

Through continuous discovery and ingestion from these sources, Causely determines for each entity:

  • Connectivity: which other entities it communicates with horizontally
  • Layering: which entities it is built upon or supports vertically
  • Composition: the internal components or resources that make up the entity

These relationships are stitched together into a continuously updated, real-time graph that represents the full system topology, regardless of whether data comes from OpenTelemetry, cloud APIs or other integrated sources.

This graph forms the foundation for blast radius analysis, root cause attribution, and cross-service impact modeling.
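
As a rough illustration of the three relationship types above, the following minimal Python sketch stores connectivity, layering, and composition edges in one graph and walks them in reverse to estimate a blast radius. The class, the relationship names, and the example entities are all hypothetical and are not part of Causely's API or internal data model.

```python
from collections import defaultdict

# Hypothetical relationship kinds, one per relationship type described above.
CONNECTIVITY = "connects_to"   # horizontal: A communicates with B
LAYERING = "runs_on"           # vertical: A is built upon B
COMPOSITION = "composed_of"    # A contains B as an internal component


class TopologyGraph:
    """Illustrative topology store; not Causely's actual implementation."""

    def __init__(self):
        self._edges = defaultdict(set)  # (source, kind) -> set of targets

    def add(self, source, kind, target):
        self._edges[(source, kind)].add(target)

    def related(self, entity, kind):
        return sorted(self._edges.get((entity, kind), ()))

    def blast_radius(self, entity):
        """Entities that transitively depend on `entity` through any relationship."""
        reverse = defaultdict(set)
        for (source, _kind), targets in self._edges.items():
            for target in targets:
                reverse[target].add(source)
        seen, stack = set(), [entity]
        while stack:
            for dependent in reverse[stack.pop()]:
                if dependent not in seen:
                    seen.add(dependent)
                    stack.append(dependent)
        return sorted(seen)


graph = TopologyGraph()
graph.add("checkout", CONNECTIVITY, "payments")
graph.add("payments", CONNECTIVITY, "orders-db")
graph.add("orders-db", LAYERING, "node-1")
graph.add("payments", LAYERING, "node-2")

print(graph.blast_radius("node-1"))  # ['checkout', 'orders-db', 'payments']
```

In this sketch a degradation on node-1 can reach checkout even though the two are never directly connected, which is the kind of cross-service impact the topology graph makes visible.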

3. Causality Mapping

Causely automatically generates a Bayesian network that models how root causes lead to observable symptoms, based on its built-in Causal Models and the real-time Topology Graph.

This causal mapping encodes probabilistic cause-effect relationships, allowing the system to infer the most likely root cause from a given set of active symptoms. It reflects both the structural dependencies in your environment and learned patterns of failure propagation.

Causely represents this mapping through two core data structures:

  • Causality Graph: A directed acyclic graph (DAG) where nodes represent root causes and symptoms, and edges denote potential causal relationships. Each edge is weighted with a probability, indicating the likelihood that one node (a root cause) leads to another (a symptom).
  • Codebook: A table in which each column corresponds to a root cause and each row to a symptom. Each column is a vector of probabilities that forms the unique signature of its root cause: each cell holds the probability that the root cause produces the corresponding symptom.

Together, these structures power Causely’s ability to perform real-time probabilistic inference and deliver explainable, high-confidence root cause insights.
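
To make the relationship between the two structures concrete, here is a minimal Python sketch that derives a codebook from the weighted edges of a small causality graph. The root cause names, symptom names, and probabilities are invented for illustration, and the sketch assumes every edge runs directly from a root cause to a symptom; a real causality graph may contain longer propagation paths.

```python
# Hypothetical weighted edges: (root cause, symptom, probability).
EDGES = [
    ("db_cpu_congestion", "db_high_latency", 0.9),
    ("db_cpu_congestion", "api_high_latency", 0.7),
    ("api_memory_leak", "api_high_latency", 0.6),
    ("api_memory_leak", "api_restarts", 0.8),
]


def build_codebook(edges):
    """Return (symptom rows, {root cause: probability column})."""
    symptoms = sorted({symptom for _, symptom, _ in edges})
    causes = sorted({cause for cause, _, _ in edges})
    prob = {(cause, symptom): p for cause, symptom, p in edges}
    # A symptom with no edge from a root cause gets probability 0.0 in that column.
    codebook = {
        cause: [prob.get((cause, symptom), 0.0) for symptom in symptoms]
        for cause in causes
    }
    return symptoms, codebook


symptoms, codebook = build_codebook(EDGES)
print(symptoms)                       # ['api_high_latency', 'api_restarts', 'db_high_latency']
print(codebook["db_cpu_congestion"])  # [0.7, 0.0, 0.9]
print(codebook["api_memory_leak"])    # [0.6, 0.8, 0.0]
```

The two columns differ in which rows are non-zero, which is what lets a set of active symptoms point toward one root cause rather than another.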

4. Attribute Dependency Graph

Causely generates this graph using its built-in Attribute Dependency Model and the live Topology Graph. The result is a directed acyclic graph (DAG) that models functional dependencies between system attributes.

In this graph:

  • Nodes represent individual attributes, for example CPU usage of a service or queue length of a messaging system.
  • Edges represent dependency relationships, for example an edge from attribute A to attribute B means that B is a function of A.
  • Edge labels define these functions. Some may be explicitly defined in the Attribute Dependency Model, while others are learned automatically from observed behavior in your environment.
  • Nodes representing attributes that must satisfy a constraint are decorated with the constraint the attribute must satisfy.

The Attribute Dependency Graph enables Causely to reason about how changes in one part of the system cascade across others, identify emerging bottlenecks, and validate whether the environment remains within defined performance bounds.
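
The following Python sketch shows one way such a graph could be evaluated: attributes are computed in topological order from their inputs, then checked against their constraints. The attribute names, the functions on the edges, and the 200 ms latency constraint are assumptions made up for this example; they do not come from Causely's Attribute Dependency Model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical derived attributes: name -> (input attributes, function of those inputs).
DEPENDENCIES = {
    "queue.length": (
        ["producer.rate", "consumer.rate"],
        lambda produced, consumed: max(0.0, produced - consumed) * 60,
    ),
    "api.latency_ms": (["queue.length"], lambda length: 20 + 0.5 * length),
}

# Hypothetical constraint decorating a node: latency must stay at or below 200 ms.
CONSTRAINTS = {"api.latency_ms": lambda value: value <= 200}


def evaluate(inputs):
    """Propagate input attribute values through the DAG and list violated constraints."""
    values = dict(inputs)
    predecessors = {attr: deps for attr, (deps, _) in DEPENDENCIES.items()}
    for attr in TopologicalSorter(predecessors).static_order():
        if attr in DEPENDENCIES:
            deps, fn = DEPENDENCIES[attr]
            values[attr] = fn(*(values[dep] for dep in deps))
    violations = sorted(a for a, ok in CONSTRAINTS.items() if not ok(values[a]))
    return values, violations


healthy, healthy_violations = evaluate({"producer.rate": 10, "consumer.rate": 8})
overloaded, overloaded_violations = evaluate({"producer.rate": 15, "consumer.rate": 8})
print(healthy["api.latency_ms"], healthy_violations)        # 80.0 []
print(overloaded["api.latency_ms"], overloaded_violations)  # 230.0 ['api.latency_ms']
```

Raising the producer rate changes nothing on the API service itself, yet the constraint on its latency is violated because the change cascades through the queue length.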

5. Analysis

The analysis automatically pinpoints causes in real time based on observed symptoms, using the Codebook described above. No configuration is required: Causely can immediately identify a broad set of issues (100+ causes mapped to symptoms), ranging from application malfunctions to service congestion to infrastructure bottlenecks.

In any given environment, there can be tens of thousands of possible causes producing hundreds of thousands of symptoms. Causely helps prevent service degradation by untangling these relationships, pinpointing the cause that puts your SLOs at risk, and driving remediation actions before those SLOs are violated.
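
As a simplified illustration of matching observed symptoms against a codebook, the Python sketch below ranks root causes by the likelihood of the observed symptom pattern. It assumes symptoms are independent given the root cause, which is a deliberate simplification and not a description of Causely's actual inference algorithm. The codebook entries are invented for the example.

```python
import math

# Hypothetical codebook: one probability per symptom row, per root cause column.
SYMPTOMS = ["api_high_latency", "api_restarts", "db_high_latency"]
CODEBOOK = {
    "db_cpu_congestion": [0.7, 0.05, 0.9],
    "api_memory_leak": [0.6, 0.8, 0.05],
}


def rank_causes(observed, codebook=CODEBOOK, symptoms=SYMPTOMS):
    """Rank root causes by log-likelihood of the active/inactive symptom pattern."""
    active = [symptom in observed for symptom in symptoms]
    scores = {}
    for cause, signature in codebook.items():
        log_likelihood = 0.0
        for p, is_active in zip(signature, active):
            p = min(max(p, 1e-6), 1 - 1e-6)  # keep log() finite for 0.0 and 1.0 entries
            log_likelihood += math.log(p if is_active else 1 - p)
        scores[cause] = log_likelihood
    return sorted(scores, key=scores.get, reverse=True)


db_case = rank_causes({"api_high_latency", "db_high_latency"})
leak_case = rank_causes({"api_high_latency", "api_restarts"})
print(db_case[0])    # db_cpu_congestion
print(leak_case[0])  # api_memory_leak
```

Both scenarios share the api_high_latency symptom; the ranking differs because the sketch scores the whole pattern, including the symptoms that are absent.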

How Causely Processes Data

Causely is built on a split-architecture model that balances local control with cloud-powered intelligence. The system processes telemetry data through a mediation layer deployed locally in your infrastructure, which converts telemetry into symptom states and topology information. These distilled insights are sent to the Causal Engine (running in Causely's cloud environment or self-hosted), which performs real-time analysis using the five core components described above.

The mediation layer processes telemetry locally to minimize data transfer, control cost, and preserve privacy. Most raw telemetry remains local, with only distilled insights and targeted evidence sent to the cloud. After a root cause is identified, a targeted subset of relevant telemetry (metrics, traces, and log-derived errors/events) may be sent as evidence to enhance root cause clarity.
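
The idea of distilling telemetry into symptom states can be sketched in Python as follows. The metric names, thresholds, and the SymptomState type are hypothetical, and a fixed threshold on a mean is only a stand-in for whatever detection logic the mediation layer actually applies.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SymptomState:
    """Hypothetical distilled record: the only data this sketch would send upstream."""
    entity: str
    symptom: str
    active: bool


# Hypothetical mapping: metric name -> (symptom name, threshold).
THRESHOLDS = {
    "latency_ms": ("high_latency", 250.0),
    "error_rate": ("high_error_rate", 0.05),
}


def distill(entity, samples):
    """Reduce raw metric samples to symptom states; the samples themselves stay local."""
    states = []
    for metric, values in samples.items():
        if metric in THRESHOLDS and values:
            symptom, limit = THRESHOLDS[metric]
            states.append(SymptomState(entity, symptom, mean(values) > limit))
    return states


states = distill(
    "checkout",
    {"latency_ms": [300, 320, 310], "error_rate": [0.01, 0.02]},
)
for state in states:
    print(state)
```

Five raw samples go in and two small records come out, which illustrates why distilling locally reduces the volume of data leaving your infrastructure.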

For detailed information about deployment architecture, component structure, telemetry sources, and workflow integration, see the Architecture documentation.