Terminology

This page contains the terminology used in Causely. The list is not exhaustive; it is a guide to help you understand the key concepts. Terms are grouped by relatedness rather than listed alphabetically.

Managed Object

An object in the managed environment.

Managed Domain

A scoped collection of managed objects.

Causely Domain Manager (CDM)

A Causely instance that manages a managed domain.

Event

An observable anomaly in a managed object.
An event is TRUE if the anomaly is observed/present/active.
Examples
- HighCPUUtilization
  - CPU Utilization > 90%
- Container OOM Killed
  - An external notification
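
As a minimal sketch, an event can be thought of as a named predicate over a managed object's current state. The `Event` type and the `cpu_utilization_pct` metric key below are hypothetical names chosen for illustration and are not part of any Causely API; only the 90% threshold comes from the example above.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Event:
    """An observable anomaly in a managed object (illustrative type only)."""
    name: str
    # Predicate over the object's current metrics; True means the anomaly is present.
    is_active: Callable[[Mapping[str, float]], bool]


# Threshold from the example above: CPU utilization > 90%.
# The metric key "cpu_utilization_pct" is an assumed, hypothetical name.
high_cpu = Event(
    name="HighCPUUtilization",
    is_active=lambda metrics: metrics.get("cpu_utilization_pct", 0.0) > 90.0,
)

print(high_cpu.is_active({"cpu_utilization_pct": 95.0}))  # True: the event is active
print(high_cpu.is_active({"cpu_utilization_pct": 40.0}))  # False: the event is not active
```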

Active/Present Event

An event that is TRUE.

Root Cause (RC)

Root Cause is something that may occur in a managed object.
Root Cause is inferred based on observed events.
Examples
- Database Congested
- Application Malfunction
- Container CPU Congested
- etc.

Symptom

An event that may be caused by a root cause.
Examples
- Container HighCPUUtilization may be caused by Container Congested
- Service HighLatency may be caused by Database Congested
- Service HighErrorRate may be caused by Application Malfunction
- etc.

Local Symptom of Root Cause R

A symptom that is observable in the managed object in which the root cause $R$ occurs.
Examples
- A Container X HighCPUUtilization is a local symptom of Container X Congested
- etc.

Propagated Symptom of Root Cause R

A symptom that may be caused by root cause $R$ and is observed in a managed object related to the managed object in which $R$ occurs.
Examples
- Service Starvation is a propagated symptom of Database Congested: it propagates to the services that access the database

Logs Associated with Root Cause

Causely captures logs from the containers that underlie your services to provide rich context for understanding issues. Each container (within its pod) writes logs—including errors and exceptions—to stdout. When Causely detects a service malfunction or identifies a root cause, it automatically collects and surfaces relevant log lines from the underlying container.

These logs are shown in two places:

- Under the affected service, when the service is exhibiting abnormal behavior (for example, error spikes or degraded performance).
- Alongside a root cause, to highlight the precise errors or stack traces that occurred around the time of failure.

This helps validate the issue and dramatically shortens time to understanding and resolution.

Root Cause Analysis Problem (RCA)

Given a set of symptoms (manifestations), identify the explanation of why they are present by using knowledge about the world.

A Root Cause Analysis Problem $P$ is a 4-tuple $P = \langle R,\ C,\ S,\ S^+ \rangle$ where

$R = \{r_1,\ \ldots,\ r_n\}$ is a finite, non-empty set of root causes.
$S = \{s_1,\ \ldots,\ s_k\}$ is a finite, non-empty set of symptoms.
$C \subseteq R \times S$ is a relation with $domain(C) = R$ and $range(C) = S$, called Causation.
$S^+ \subseteq S$ is a distinguished subset of $S$ whose members are said to be Present.
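
The 4-tuple can be sketched as a small data structure that checks the constraints stated above. The `RCAProblem` class and its field names are hypothetical and only illustrate the definition; the root cause and symptom names are taken from the examples earlier on this page.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class RCAProblem:
    """Illustrative container for P = <R, C, S, S+> (not a Causely API)."""
    root_causes: FrozenSet[str]             # R
    causation: FrozenSet[Tuple[str, str]]   # C, as (root cause, symptom) pairs
    symptoms: FrozenSet[str]                # S
    present: FrozenSet[str]                 # S+

    def __post_init__(self) -> None:
        if not self.root_causes or not self.symptoms:
            raise ValueError("R and S must be non-empty")
        if {r for r, _ in self.causation} != set(self.root_causes):
            raise ValueError("domain(C) must equal R")
        if {s for _, s in self.causation} != set(self.symptoms):
            raise ValueError("range(C) must equal S")
        if not self.present <= self.symptoms:
            raise ValueError("S+ must be a subset of S")


problem = RCAProblem(
    root_causes=frozenset({"DatabaseCongested", "ApplicationMalfunction"}),
    causation=frozenset({
        ("DatabaseCongested", "ServiceHighLatency"),
        ("ApplicationMalfunction", "ServiceHighErrorRate"),
    }),
    symptoms=frozenset({"ServiceHighLatency", "ServiceHighErrorRate"}),
    present=frozenset({"ServiceHighLatency"}),
)
print(sorted(problem.present))  # ['ServiceHighLatency']
```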

Effects and Causes

For any $r_i \in R$ and $s_j \in S$ in $P = \langle R,\ C,\ S,\ S^+ \rangle$,
- $effects(r_i) = \{s_j \mid \langle r_i,\ s_j \rangle \in C\}$, the set of symptoms that may be caused by $r_i$
- $causes(s_j) = \{r_i \mid \langle r_i,\ s_j \rangle \in C\}$, the set of root causes that may cause $s_j$
For any subset $R_i \subseteq R$ and subset $S_j \subseteq S$ in $P = \langle R,\ C,\ S,\ S^+ \rangle$,
- $effects(R_i) = \bigcup_{r_i \in R_i} effects(r_i)$
- $causes(S_j) = \bigcup_{s_j \in S_j} causes(s_j)$
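
The definitions of effects and causes can be sketched over a causation relation stored as a set of (root cause, symptom) pairs. This is only an illustration of the set definitions; the pair marked as hypothetical is an assumption added so that one symptom has two possible causes.

```python
from typing import Iterable, Set, Tuple

Causation = Set[Tuple[str, str]]  # pairs (root cause, symptom)

C: Causation = {
    ("DatabaseCongested", "ServiceHighLatency"),
    ("DatabaseCongested", "ServiceStarvation"),
    ("ApplicationMalfunction", "ServiceHighErrorRate"),
    ("ApplicationMalfunction", "ServiceHighLatency"),  # hypothetical pair, for illustration
}


def effects(root_causes: Iterable[str], causation: Causation) -> Set[str]:
    """Union of the symptoms that any of the given root causes may cause."""
    wanted = set(root_causes)
    return {s for r, s in causation if r in wanted}


def causes(symptoms: Iterable[str], causation: Causation) -> Set[str]:
    """Union of the root causes that may cause any of the given symptoms."""
    wanted = set(symptoms)
    return {r for r, s in causation if s in wanted}


print(sorted(effects({"DatabaseCongested"}, C)))
# ['ServiceHighLatency', 'ServiceStarvation']
print(sorted(causes({"ServiceHighLatency"}, C)))
# ['ApplicationMalfunction', 'DatabaseCongested']
```

A single root cause or symptom is passed as a one-element set, so the same two functions cover both the element and the subset forms of the definitions.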

Closure of Root cause R

The closure of root cause $R$ is $effects(R)$.
The closure of root cause $R$ serves as its signature, a vector of probabilities that uniquely identifies $R$.

Spurious and Missing Symptom

For any $r_i \in R$, $s_j \in S$, and any subset $S_k \subseteq S$ in $P = \langle R,\ C,\ S,\ S^+ \rangle$,
- $spurious(r_i,\ S_k) = \{s_j \in S_k \mid \langle r_i,\ s_j \rangle \notin C\}$, the set of symptoms in $S_k$ not caused by $r_i$
- $missing(r_i,\ S_k) = \{s_j \notin S_k \mid \langle r_i,\ s_j \rangle \in C\}$, the set of symptoms that may be caused by $r_i$ but are not in $S_k$
For any subset $R_i \subseteq R$ and subset $S_k \subseteq S$ in $P = \langle R,\ C,\ S,\ S^+ \rangle$,
- $spurious(R_i,\ S_k) = \bigcup_{r_i \in R_i} spurious(r_i,\ S_k)$
- $missing(R_i,\ S_k) = \bigcup_{r_i \in R_i} missing(r_i,\ S_k)$
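
As a sketch of these two definitions, the functions below compute spurious and missing symptoms for a set of candidate root causes against a set of observed symptoms. The function and variable names are illustrative assumptions; the causation pairs reuse examples from this page.

```python
from typing import Iterable, Set, Tuple

Causation = Set[Tuple[str, str]]  # pairs (root cause, symptom)

C: Causation = {
    ("DatabaseCongested", "ServiceHighLatency"),
    ("DatabaseCongested", "ServiceStarvation"),
    ("ApplicationMalfunction", "ServiceHighErrorRate"),
}


def spurious(root_causes: Iterable[str], s_k: Iterable[str], causation: Causation) -> Set[str]:
    """Union, over each root cause r, of the symptoms in s_k that r does not cause."""
    symptoms = set(s_k)
    result: Set[str] = set()
    for r in root_causes:
        result |= {s for s in symptoms if (r, s) not in causation}
    return result


def missing(root_causes: Iterable[str], s_k: Iterable[str], causation: Causation) -> Set[str]:
    """Union, over each root cause r, of the symptoms r may cause that are not in s_k."""
    symptoms = set(s_k)
    wanted = set(root_causes)
    return {s for r, s in causation if r in wanted and s not in symptoms}


observed = {"ServiceHighLatency", "ServiceHighErrorRate"}
print(sorted(spurious({"DatabaseCongested"}, observed, C)))  # ['ServiceHighErrorRate']
print(sorted(missing({"DatabaseCongested"}, observed, C)))   # ['ServiceStarvation']
```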

Bayesian Network

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies using a directed acyclic graph (DAG). It supports reasoning under uncertainty by calculating the probabilities of events from known information and the relationships between variables. In essence, it models complex causal relationships by showing how the probability of one variable changes depending on the states of related variables.
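
To make the idea concrete, here is a minimal sketch of the smallest possible network: one root cause node with a single edge to one symptom node. All three probabilities are made-up values for illustration, and the calculation is plain Bayes' rule rather than anything specific to Causely.

```python
def posterior(prior: float, p_s_given_r: float, p_s_given_not_r: float) -> float:
    """P(root cause | symptom observed) for a two-node network R -> S."""
    # Total probability of observing the symptom, with or without the root cause.
    evidence = p_s_given_r * prior + p_s_given_not_r * (1.0 - prior)
    return p_s_given_r * prior / evidence


# Assumed, illustrative numbers:
#   P(R) = 0.01, P(S | R) = 0.9, P(S | not R) = 0.05
p = posterior(prior=0.01, p_s_given_r=0.9, p_s_given_not_r=0.05)
print(round(p, 3))  # 0.154
```

Observing the symptom raises the probability of the root cause from 1% to about 15% under these assumed numbers, which is the kind of update a larger network performs across many variables at once.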

Causal Model

CDM is driven by a Causal Model, a knowledge base of root causes.
The Causal Model captures the cause-and-effect associations between root causes and symptoms.
The Causal Model describes how root causes propagate across the managed domain and which symptoms may be observed when each root cause occurs.
CDM is delivered with a built-in Causal Model that captures the common root causes that can occur in cloud native environments.
The Causal Model captures potential root causes in a broad range of entities, such as applications, databases, caches, messaging, load balancers, DNS, compute, storage, etc.
The Causal Model is completely independent of any specific managed domain and is applicable to any cloud native application environment. This enables Causely to automatically pinpoint root causes out of the box as soon as it is deployed in an environment.

Topology

Topology is a directed graph where the nodes are entities in the managed domain and the directed edges represent relationships between the entities.
CDM automatically discovers the entities and their relationships in the managed domain.
Entities are applications, services, databases, caches, messaging, load balancers, compute, storage, etc.
Relationships are Connectivity, Layering, and Composition:
- Connectivity: for each entity, the entities it is connected to and the entities it is communicating with.
- Layering: for each entity, the entities it is layered over and the entities it underlies.
- Composition: for each entity, the entities it is composed of or part of.
CDM automatically stitches all of these relationships together to generate a dependency map, or Topology Graph, of the entire managed domain.
The Topology Graph is continually updated in real-time to reflect the current state of the managed domain.
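
One simple way to picture a topology graph is as a list of labeled, directed edges indexed by source entity and relationship. The entity names and the relationship labels `connects_to`, `layered_over`, and `part_of` below are hypothetical; they only mirror the Connectivity, Layering, and Composition relationships described above.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source entity, relationship, target entity)

# Hypothetical entities and relationships, for illustration only.
edges = [
    ("checkout-service", "connects_to", "orders-db"),           # Connectivity
    ("checkout-service", "layered_over", "checkout-container"),  # Layering
    ("checkout-container", "part_of", "checkout-pod"),           # Composition
]


def build_topology(edge_list: Iterable[Edge]) -> DefaultDict[Tuple[str, str], Set[str]]:
    """Index edges so that (entity, relationship) maps to the related entities."""
    graph: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for source, relationship, target in edge_list:
        graph[(source, relationship)].add(target)
    return graph


topology = build_topology(edges)
print(sorted(topology[("checkout-service", "connects_to")]))  # ['orders-db']
```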

Causality Graph (CG)

Causality Graph is a Directed Acyclic Graph (DAG) where the nodes are root causes and symptoms, and the edges represent causality, i.e., $R \rightarrow S$ means R may cause S.
The edges are labeled with a probability $P$, $0 < P < 1$, representing the likelihood that R may cause S.
CDM automatically generates the Causality Graph by applying the Topology Graph to the Causal Model.
By applying the Topology of the managed domain to the generic Causal Model, CDM generates the causal knowledge that is specific to the managed domain.
The Causality Graph represents all the possible root causes in the managed domain, all the symptoms that may be observed, and the cause and effect relationships between them.
In a managed domain of a few thousand entities, the Causality Graph will incorporate the knowledge of tens of thousands of potential root causes and hundreds of thousands of symptoms, which is well beyond human scale.
The Causality Graph is automatically updated every time the topology changes.
The edge probabilities are learned based on the data in the managed domain.
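
The idea of applying a topology to a generic causal model can be sketched as follows. This is a deliberately simplified illustration and not how CDM is implemented: the rule format, the `accesses` relationship, and the entity names are all assumptions, and edge probabilities are omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology: entity types plus "service accesses database" edges.
entity_types: Dict[str, str] = {
    "orders-db": "Database",
    "checkout-service": "Service",
    "cart-service": "Service",
}
accesses: List[Tuple[str, str]] = [
    ("checkout-service", "orders-db"),
    ("cart-service", "orders-db"),
]

# Hypothetical generic rule: a congested database may cause high latency
# in every service that accesses it. The rule names no specific entity.
causal_model: Dict[str, Dict[str, str]] = {
    "Database": {"root_cause": "Congested", "propagated_symptom": "HighLatency"},
}


def build_causality_edges(
    types: Dict[str, str],
    access_edges: List[Tuple[str, str]],
    model: Dict[str, Dict[str, str]],
) -> Set[Tuple[str, str]]:
    """Instantiate each generic rule for every matching entity in the topology."""
    result: Set[Tuple[str, str]] = set()
    for entity, entity_type in types.items():
        rule = model.get(entity_type)
        if rule is None:
            continue  # no rule in the model for this entity type
        cause = f"{entity}:{rule['root_cause']}"
        for client, target in access_edges:
            if target == entity:
                result.add((cause, f"{client}:{rule['propagated_symptom']}"))
    return result


for cause, symptom in sorted(build_causality_edges(entity_types, accesses, causal_model)):
    print(f"{cause} -> {symptom}")
# orders-db:Congested -> cart-service:HighLatency
# orders-db:Congested -> checkout-service:HighLatency
```

Because the rule is written against entity types rather than specific entities, rerunning the function after the topology changes yields an updated set of edges without any change to the model.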

Codebook

A mapping of all potential root causes $\{r_1,\ \ldots,\ r_n\}$ to the symptoms $\{s_1,\ \ldots,\ s_m\}$ they may cause.
The Codebook is a causality table where
- The columns $r_1,\ \ldots,\ r_n$ represent all the potential root causes,
- The rows $s_1,\ \ldots,\ s_m$ represent all the potential symptoms, and
- A cell $(r_i,\ s_j)$ represents the probability that root cause $r_i$ may cause symptom $s_j$, that is, the likelihood that symptom $s_j$ will be observed/present when root cause $r_i$ occurs.
Each root cause in the Codebook has a signature, a vector of $m$ probabilities, that uniquely identifies the root cause.
Using the Codebook, CDM quickly searches and pinpoints the root causes based on the observed symptoms.
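
A codebook lookup can be sketched as scoring each root cause's signature against the observed symptoms and picking the best match. The scoring rule below (a product of per-symptom likelihoods with a small assumed `noise` probability for symptoms a root cause is not expected to produce) is one possible choice made for illustration; this page does not specify how CDM scores matches, and the probabilities are made-up values.

```python
import math
from typing import Dict, Set, Tuple

# Each key is a codebook column (root cause); its value is that column's signature:
# symptom -> probability that the symptom is present when the root cause occurs.
codebook: Dict[str, Dict[str, float]] = {
    "DatabaseCongested": {"ServiceHighLatency": 0.9, "ServiceStarvation": 0.8},
    "ApplicationMalfunction": {"ServiceHighErrorRate": 0.9},
}


def likelihood(signature: Dict[str, float], observed: Set[str],
               all_symptoms: Set[str], noise: float) -> float:
    """Probability of the observed pattern if this root cause alone were active."""
    return math.prod(
        signature.get(s, noise) if s in observed else 1.0 - signature.get(s, noise)
        for s in all_symptoms
    )


def best_match(book: Dict[str, Dict[str, float]], observed: Set[str],
               noise: float = 0.01) -> Tuple[str, Dict[str, float]]:
    """Return the root cause whose signature best explains the observed symptoms."""
    all_symptoms = {s for signature in book.values() for s in signature}
    scores = {r: likelihood(sig, observed, all_symptoms, noise) for r, sig in book.items()}
    return max(scores, key=scores.get), scores


best, scores = best_match(codebook, {"ServiceHighLatency", "ServiceStarvation"})
print(best)  # DatabaseCongested
```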
