How Causely Works

Causely uses a model-driven reasoning engine to continuously infer root causes for symptoms observed in production, without requiring full telemetry ingestion. It works alongside your existing telemetry sources, interpreting the meaning behind the metrics, logs, and traces they provide, rather than replacing them.

This page explains how Causely works: the ontology-first approach that turns raw telemetry into higher abstractions, then how the causal reasoning engine uses those abstractions to infer root cause and impact. For deployment architecture, components, and infrastructure, see Architecture.

How the Causal Engine Works

At the core of Causely is a probabilistic reasoning engine that maps symptoms to root causes using domain-specific models and dynamic system knowledge. It maintains three core data structures—the Topology Graph, the Causality Graph (with its Codebook), and the Attribute Dependency Graph—and uses them for analysis. The engine is composed of six interdependent components, described below.

1. From telemetry to semantic understanding

Instead of feeding raw telemetry straight into analysis, Causely uses an ontology-first approach: mediation distills telemetry (logs, metrics, traces) and other data locally into a structured layer of:

  • Entities: services, pods, databases, queues
  • Relations: how they connect
  • Symptoms: observable states such as "high latency here" or "errors increasing there"; detected when thresholds or patterns are met and used for reasoning even when no alert is fired

That distillation turns the vast volumes of data teams commonly ingest for observability into higher-quality abstractions. This distilled data is sent to the backend, where the causal reasoning engine infers the causes explaining those symptoms.
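
As a rough illustration of the distillation step, the sketch below reduces a handful of raw latency samples to a single symptom state. All names here (`Entity`, `Relation`, `Symptom`, `distill_latency`) and the simple mean-over-threshold rule are assumptions made for this example; they are not Causely's actual data model or detection logic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "service", "pod", "database", "queue"


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: str  # e.g. "calls", "runs_on"


@dataclass(frozen=True)
class Symptom:
    entity: str
    name: str
    active: bool


def distill_latency(entity: Entity, samples_ms: list[float], threshold_ms: float) -> Symptom:
    """Reduce raw latency samples to one symptom state (hypothetical rule)."""
    return Symptom(entity.name, "high_latency", mean(samples_ms) > threshold_ms)


# Hypothetical entities and relation
checkout = Entity("checkout", "service")
orders_db = Entity("orders-db", "database")
relations = [Relation("checkout", "orders-db", "calls")]

# Only this compact state, not the raw samples, would leave the environment.
symptom = distill_latency(checkout, [120.0, 480.0, 510.0, 495.0], threshold_ms=300.0)
print(symptom)
```

The point of the sketch is the change in shape: many raw samples go in, and one small, typed statement about an entity comes out.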

Local processing and privacy

All distillation runs in your environment. The mediation layer turns telemetry into symptom states and topology; only this distilled data is sent to the causal reasoning engine. Raw logs, full traces, and bulk metrics stay in your datacenter—only semantic state and, when needed, minimal evidence leave your environment. This design keeps sensitive and high-volume telemetry local and preserves privacy.

2. Ontology: causal and attribute models

The engine’s ontology is the formal model of entities, behaviors, and failure modes used for inference. It is composed of two foundational models: the Causal Model and the Attribute Dependency Model. Together, they define what root causes and symptoms exist and how they relate, forming the semantic backbone of the causal reasoning engine.

  • Causal Model: Causely includes a built-in library of causal knowledge that captures root causes capable of degrading application performance. This model requires no configuration and enables the system to begin identifying root causes as soon as it is deployed.

    • Covers a broad range of entities, including applications, databases, caches, messaging systems, load balancers, DNS, compute, and storage.
    • Encodes how each root cause propagates through the environment and the symptoms it may produce.
    • Is designed to be environment-agnostic and applicable to any modern cloud native architecture.
  • Attribute Dependency Model: The attribute model extends the causal model by capturing how performance-related attributes (for example, latency, throughput, utilization) are interdependent across entities. It also encodes the operational constraints those attributes must satisfy to meet performance goals.

    • Represents attribute dependencies across a wide range of services and infrastructure layers.
    • Supports both predefined and learned functional relationships between attributes.
    • Defines the desired state of the system based on application goals and constraints.
    • Like the causal model, it is fully environment-independent and generalizable across architectures.

3. Topology Graph

Causely continuously discovers and maintains the topology graph of services, infrastructure, and their interconnections via its integrations with your existing telemetry sources. It ingests and reconciles topology from any source, including OpenTelemetry traces, cloud provider APIs and other inventories.

Cloud native environments are highly dynamic, composed of applications, services, databases, caches, messaging systems, load balancers, compute, storage, and more.

Through continuous discovery and ingestion from these sources, Causely determines for each entity:

  • Connectivity: which other entities it communicates with horizontally
  • Layering: which entities it is built upon or supports vertically
  • Composition: the internal components or resources that make up the entity

These relationships are stitched together into a continuously updated, real-time graph that represents the full system topology, regardless of whether data comes from OpenTelemetry, cloud APIs or other integrated sources.

This graph forms the foundation for blast radius analysis, root cause attribution, and cross-service impact modeling.
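
A minimal sketch of how such a graph could support blast radius analysis is shown below. The `TopologyGraph` class, the relation kinds (`calls`, `runs_on`, `composed_of`), and the entity names are hypothetical and chosen only to mirror the connectivity, layering, and composition relationships described above.

```python
from collections import defaultdict, deque


class TopologyGraph:
    """Toy topology graph: every relation means 'source depends on target'."""

    def __init__(self):
        self.edges = defaultdict(set)       # source -> {(kind, target)}
        self.dependents = defaultdict(set)  # target -> {sources depending on it}

    def add(self, source, kind, target):
        self.edges[source].add((kind, target))
        self.dependents[target].add(source)

    def blast_radius(self, entity):
        """Entities that directly or transitively depend on `entity`."""
        seen, queue = set(), deque([entity])
        while queue:
            current = queue.popleft()
            for dependent in self.dependents[current]:
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen


graph = TopologyGraph()
graph.add("frontend", "calls", "checkout")             # connectivity
graph.add("checkout", "calls", "orders-db")            # connectivity
graph.add("checkout", "composed_of", "checkout-pod")   # composition
graph.add("checkout-pod", "runs_on", "node-1")         # layering
graph.add("orders-db", "runs_on", "node-2")            # layering

impacted = graph.blast_radius("node-1")
print(sorted(impacted))
```

In this toy example, a problem on `node-1` reaches the pod, the service composed of it, and the frontend that calls that service, while `orders-db` on `node-2` is unaffected.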

4. Causality Mapping

Causely automatically generates a Bayesian network that models how root causes lead to observable symptoms, based on its built-in Causal Models and the real-time Topology Graph.

This causal mapping encodes probabilistic cause-effect relationships, allowing the system to infer the most likely root cause from a given set of active symptoms. It reflects both the structural dependencies in your environment and learned patterns of failure propagation.

Causely represents this mapping through two core data structures:

  • Causality Graph: A directed acyclic graph (DAG) where nodes represent root causes and symptoms, and edges denote potential causal relationships. Each edge is weighted with a probability, indicating the likelihood that one node (a root cause) leads to another (a symptom).
  • Codebook: A table in which each column corresponds to a root cause and each row to a symptom. Each column is a vector of probabilities that forms the unique signature of its root cause. Each cell holds the probability that the column's root cause produces the row's symptom.

Together, these structures power Causely’s ability to perform real-time probabilistic inference and deliver explainable, high-confidence root cause insights.
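
The relationship between the two structures can be sketched as follows. This example assumes a simplified causality graph in which every edge runs directly from a root cause to a symptom; the cause names, symptom names, probabilities, and the `build_codebook` helper are all hypothetical.

```python
# Hypothetical causality graph: (root cause, symptom) -> edge probability
causality_edges = {
    ("orders-db: connection_pool_exhausted", "orders-db: high_latency"): 0.9,
    ("orders-db: connection_pool_exhausted", "checkout: high_latency"): 0.7,
    ("checkout: memory_leak", "checkout: high_latency"): 0.8,
    ("checkout: memory_leak", "checkout: error_rate_rising"): 0.6,
}


def build_codebook(edges):
    """Return (symptom rows, {root cause: probability column})."""
    causes = sorted({cause for cause, _ in edges})
    symptoms = sorted({symptom for _, symptom in edges})
    # A missing edge means the cause is not expected to produce that symptom.
    columns = {
        cause: [edges.get((cause, symptom), 0.0) for symptom in symptoms]
        for cause in causes
    }
    return symptoms, columns


symptoms, codebook = build_codebook(causality_edges)
for cause, signature in codebook.items():
    print(cause, signature)
```

Each column is the signature of one root cause over the same ordered list of symptoms, which is what makes the columns directly comparable during inference.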

5. Attribute Dependency Graph

Causely generates this graph using its built-in Attribute Dependency Model and the live Topology Graph. The result is a directed acyclic graph (DAG) that models functional dependencies between system attributes.

In this graph:

  • Nodes represent individual attributes, for example CPU usage of a service or queue length of a messaging system.
  • Edges represent dependency relationships, for example an edge from attribute A to attribute B means that B is a function of A.
  • Edge labels define these functions. Some may be explicitly defined in the Attribute Dependency Model, while others are learned automatically from observed behavior in your environment.
  • Nodes representing attributes that must satisfy a constraint are decorated with the constraint the attribute must satisfy.

The Attribute Dependency Graph enables Causely to reason about how changes in one part of the system cascade across others, identify emerging bottlenecks, and validate whether the environment remains within defined performance bounds.
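
The sketch below illustrates the idea with a three-attribute graph. The attribute names, the edge functions, the latency constraint, and the rule that multiple incoming edges are summed are all assumptions made for illustration; they are not taken from Causely's Attribute Dependency Model.

```python
from graphlib import TopologicalSorter

# Hypothetical edges: (A, B) -> function, meaning B is a function of A
edges = {
    ("checkout.request_rate", "checkout.cpu_utilization"): lambda rps: rps / 1000.0,
    ("checkout.cpu_utilization", "checkout.latency_ms"): lambda u: 20.0 / (1.0 - u),
}

# Hypothetical constraint decorating a node: latency must stay at or below 80 ms
constraints = {"checkout.latency_ms": lambda ms: ms <= 80.0}


def evaluate(observed, edges):
    """Propagate observed root attributes through the DAG in dependency order."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        sorter.add(dst, src)  # dst depends on src
    values = dict(observed)
    for node in sorter.static_order():
        incoming = [(s, f) for (s, d), f in edges.items() if d == node]
        if incoming:
            # Assumption: contributions from multiple incoming edges are summed.
            values[node] = sum(f(values[s]) for s, f in incoming)
    return values


values = evaluate({"checkout.request_rate": 800.0}, edges)
violations = [attr for attr, ok in constraints.items() if not ok(values[attr])]
print(values)
print(violations)
```

Here a rise in request rate cascades into CPU utilization and then latency, and the constraint check flags the attribute that has drifted outside its bound.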

6. Analysis

The analysis automatically pinpoints causes in real time based on observed symptoms, using the Codebook described above. No configuration is required: Causely can immediately identify a broad set of issues (100+ causes mapped to symptoms), ranging from application malfunctions to service congestion to infrastructure bottlenecks.
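
One way to picture codebook-based analysis is to score each root cause's signature against the set of currently observed symptoms. The sketch below uses a naive log-likelihood score that assumes a single active cause and independent symptoms; this is a stand-in technique for illustration, not Causely's actual inference algorithm, and the codebook values, names, and `leak` parameter are hypothetical.

```python
import math

# Hypothetical codebook: rows are symptoms, columns are root-cause signatures
symptoms = [
    "checkout: high_latency",
    "checkout: error_rate_rising",
    "orders-db: high_latency",
]
codebook = {
    "checkout: memory_leak": [0.8, 0.6, 0.0],
    "orders-db: connection_pool_exhausted": [0.7, 0.5, 0.9],
}


def score(signature, observed, leak=0.01):
    """Log-likelihood of the observed symptom vector given one root cause."""
    total = 0.0
    for p, seen in zip(signature, observed):
        # Clamp probabilities so unexpected symptoms are unlikely, not impossible.
        p = min(max(p, leak), 1.0 - leak)
        total += math.log(p if seen else 1.0 - p)
    return total


# Observed state: both latency symptoms are active, the error-rate symptom is not.
observed = [True, False, True]
ranked = sorted(codebook, key=lambda cause: score(codebook[cause], observed), reverse=True)
print(ranked[0])
```

In this toy case the database cause ranks first because its signature explains the database latency symptom, which the memory leak signature does not.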

In any given environment, tens of thousands of potential causes can produce hundreds of thousands of symptoms. Causely untangles these relationships, pinpoints the cause that is putting your SLOs at risk, and drives remediation actions before the SLOs are violated, which helps prevent service degradation.

Deployment architecture

For deployment architecture, component structure, and workflow integration, see Architecture.