Terminology
This page contains the terminology used in Causely. The list is not exhaustive, but rather a guide to help you understand the key concepts. Terms are not listed in alphabetical order, but rather grouped by their relatedness.
Managed Object
An object in the managed environment.
Managed Domain
A scoped collection of managed objects.
Causely Domain Manager (CDM)
A Causely instance that manages a managed domain.
Event
- An observable anomaly in a managed object.
- An event is TRUE if the anomaly is observed/present/active.
- Examples
- HighCPUUtilization: CPU Utilization > 90%
- Container OOM Killed
- An external notification
Active/Present Event
An event that is TRUE.
Root Cause (RC)
- A Root Cause is a condition that may occur in a managed object.
- A Root Cause is inferred from observed events.
- Examples
- Database Congested
- Application Malfunction
- Container CPU Congested
- etc.
Symptom
- An event that may be caused by a root cause.
- Examples
- Container HighCPUUtilization may be caused by Container Congested
- Service HighLatency may be caused by Database Congested
- Service HighErrorRate may be caused by Application Malfunction
- etc.
Local Symptom of Root Cause R
- A symptom that is observable in the managed object in which the root cause occurs.
- Examples
- A Container X HighCPUUtilization is a local symptom of Container X Congested
- etc.
Propagated Symptom of Root Cause R
- A symptom, caused by the root cause, that is observable in a managed object related to the managed object in which the root cause occurs.
- Examples
- Service Starvation is a propagated symptom of Database Congested propagating to the services that are accessing the database
Logs Associated with Root Cause
Causely captures logs from the containers that underlie your services to provide rich context for understanding issues. Each container (within its pod) writes logs—including errors and exceptions—to stdout. When Causely detects a service malfunction or identifies a root cause, it automatically collects and surfaces relevant log lines from the underlying container.
These logs are shown in two places:
- Under the affected service, when the service is exhibiting abnormal behavior (for example, error spikes, degraded performance).
- Alongside a root cause, to highlight the precise errors or stack traces that occurred around the time of failure.
This helps validate the issue and dramatically shortens time to understanding and resolution.
Root Cause Analysis Problem (RCA)
Given a set of symptoms (manifestations), identify the explanation of why they are present by using knowledge about the world.
A Root Cause Analysis Problem is a 4-tuple $\langle R, S, C, S^{+} \rangle$ where
- $R$ is a finite, non-empty set of root causes.
- $S$ is a finite, non-empty set of symptoms.
- $C \subseteq R \times S$ is a relation with $domain(C) = R$ and $range(C) = S$, called Causation.
- $S^{+}$ is a distinguished subset of $S$ said to be Present.
Effects and Causes
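- For any $r \in R$ and $s \in S$,
- $effects(r) = \{ s \mid (r, s) \in C \}$, the set of symptoms that may be caused by $r$
- $causes(s) = \{ r \mid (r, s) \in C \}$, the set of root causes that may cause $s$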
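- For any subset $R_I \subseteq R$ and subset $S_J \subseteq S$, $effects(R_I) = \bigcup_{r \in R_I} effects(r)$ and $causes(S_J) = \bigcup_{s \in S_J} causes(s)$.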
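As a rough illustration, the Causation relation can be held as a set of (root cause, symptom) pairs, with the sets of effects and causes derived from it. The function names `effects` and `causes` and the sample pairs are assumptions made for this sketch (the pairs reuse examples from this page); this is not how CDM stores its model.

```python
# Causation relation C as a set of (root_cause, symptom) pairs.
# The pairs are illustrative and reuse examples from this page.
C = {
    ("Database Congested", "Service HighLatency"),
    ("Database Congested", "Service Starvation"),
    ("Application Malfunction", "Service HighErrorRate"),
    ("Container Congested", "Container HighCPUUtilization"),
}


def effects(root_causes):
    """Symptoms that may be caused by any of the given root causes."""
    return {s for (r, s) in C if r in root_causes}


def causes(symptoms):
    """Root causes that may cause any of the given symptoms."""
    return {r for (r, s) in C if s in symptoms}


print(sorted(effects({"Database Congested"})))
print(sorted(causes({"Service HighLatency", "Service HighErrorRate"})))
```

Passing a one-element set gives the effects or causes of a single root cause or symptom; passing a larger set gives the union over its members.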
Closure of Root Cause R
- The closure of a root cause $r \in R$ is the vector of probabilities with which $r$ may cause each symptom in $S$.
- The closure of a root cause is a unique signature that identifies $r$.
Spurious and Missing Symptom
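- For any $r \in R$ and any subset $S_J \subseteq S$, where $effects(r)$ is the set of symptoms that $r$ may cause,
- $spurious(r, S_J) = S_J \setminus effects(r)$, the set of symptoms in $S_J$ that $r$ cannot cause
- $missing(r, S_J) = effects(r) \setminus S_J$, the set of symptoms that $r$ may cause but that are not in $S_J$
- For any subset $R_I \subseteq R$ and subset $S_J \subseteq S$, the same definitions apply with $effects(r)$ replaced by $\bigcup_{r \in R_I} effects(r)$.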
Bayesian Network
A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies using a directed acyclic graph (DAG). It supports reasoning under uncertainty by calculating the probabilities of events from known information and the relationships between variables. In effect, it models complex causal relationships by showing how the probability of one variable changes depending on the states of other related variables.
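The smallest possible example is a two-node network in which a root cause points to a symptom. The sketch below applies Bayes' rule to update the probability of the root cause once the symptom is observed. All probability values and variable names are invented for illustration and do not come from Causely.

```python
# Two-node network: DatabaseCongested -> ServiceHighLatency.
# All probabilities below are made-up illustration values.
p_root_cause = 0.05           # prior P(root cause)
p_symptom_given_rc = 0.9      # P(symptom | root cause present)
p_symptom_given_no_rc = 0.1   # P(symptom | root cause absent)


def posterior(prior, likelihood, false_alarm):
    """P(root cause | symptom observed), by Bayes' rule."""
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence


p = posterior(p_root_cause, p_symptom_given_rc, p_symptom_given_no_rc)
print(f"P(root cause | symptom) = {p:.3f}")
```

Even with a strong link between root cause and symptom, the posterior stays well below 1 when the prior is small, which is why several symptoms are usually combined.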
Causal Model
- CDM is driven by a Causal Model, a knowledge base of root causes.
- The Causal Model captures the cause-and-effect associations between root causes and symptoms.
- The Causal Model describes how root causes propagate across the managed domain and which symptoms may be observed when each root cause occurs.
- CDM is delivered with a built-in Causal Model that captures the common root causes that can occur in cloud native environments.
- The Causal Model captures potential root causes in a broad range of entities, such as applications, databases, caches, messaging, load balancers, DNS, compute, and storage.
- The Causal Model is completely independent of any specific managed domain and is applicable to any cloud native application environment. This enables Causely to automatically pinpoint root causes out-of-the-box as soon as it is deployed in an environment.
Topology
- Topology is a directed graph where the nodes are entities in the managed domain and the directed edges represent relationships between the entities.
- CDM automatically discovers the entities and their relationships in the managed domain.
- Entities are applications, services, databases, caches, messaging, load balancers, compute, storage, etc.
- Relationships are Connectivity, Layering, and Composition
- Connectivity: for each entity, the entities it is connected to and the entities it is communicating with.
- Layering: for each entity, the entities it is layered over or underlying.
- Composition: for each entity, the entities it is composed of or part of.
- CDM automatically stitches all of these relationships together to generate a dependency map, or Topology Graph, of the entire managed domain.
- The Topology Graph is continually updated in real-time to reflect the current state of the managed domain.
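A directed graph with typed edges can be sketched in a few lines of Python. The `Topology` class, the relationship labels (`connects_to`, `layered_over`, `part_of`), and the entity names are hypothetical; the sketch only illustrates the data structure and says nothing about how CDM discovers or stores topology.

```python
class Topology:
    """Directed graph whose edges carry a relationship type."""

    def __init__(self):
        # source entity -> set of (relationship, target entity)
        self._edges = {}

    def relate(self, source, relationship, target):
        """Add a directed edge source --relationship--> target."""
        self._edges.setdefault(source, set()).add((relationship, target))

    def related(self, source, relationship):
        """Entities reachable from source over one edge of the given type."""
        return {t for (rel, t) in self._edges.get(source, set()) if rel == relationship}


topology = Topology()
topology.relate("checkout-service", "connects_to", "orders-db")      # Connectivity
topology.relate("checkout-service", "layered_over", "checkout-ctr")  # Layering
topology.relate("checkout-ctr", "part_of", "checkout-pod")           # Composition

print(topology.related("checkout-service", "connects_to"))
```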
Causality Graph (CG)
- The Causality Graph is a Directed Acyclic Graph (DAG) where the nodes are root causes and symptoms and the edges represent causality, i.e., an edge $R \rightarrow S$ means R may cause S.
- The edges are labeled with probabilities, representing the likelihood R may cause S.
- CDM automatically generates the Causality Graph by applying the Topology Graph to the Causal Model.
- By applying the Topology of the managed domain to the generic Causal Model, CDM generates the causal knowledge that is specific to the managed domain.
- The Causality Graph represents all the possible root causes in the managed domain, all the symptoms that may be observed, and the cause and effect relationships between them.
- In a managed domain of a few thousand entities, the Causality Graph will incorporate the knowledge of tens of thousands of potential root causes and hundreds of thousands of symptoms, which is well beyond human scale.
- The Causality Graph is automatically updated every time the topology changes.
- The edge probabilities are learned based on the data in the managed domain.
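The idea of applying a topology to a generic causal model can be sketched as follows. The rule format, the `accessed_by` relationship, the entity names, and the `build_causality_graph` function are all assumptions made for illustration; the actual Causal Model format and generation procedure are not described on this page.

```python
# Generic causal model, independent of any environment (hypothetical format):
# (entity type, root cause) -> list of (relationship to follow, symptom).
# A relationship of None means the symptom is local to the same entity.
causal_model = {
    ("Database", "Congested"): [("accessed_by", "Starvation")],
    ("Container", "Congested"): [(None, "HighCPUUtilization")],
}

# Environment-specific topology (hypothetical entities).
entities = {"orders-db": "Database", "checkout": "Service",
            "cart": "Service", "c1": "Container"}
topology = {("orders-db", "accessed_by"): ["checkout", "cart"]}


def build_causality_graph(causal_model, entities, topology):
    """Instantiate the generic rules for every matching entity."""
    edges = set()
    for name, entity_type in entities.items():
        for (model_type, root_cause), rules in causal_model.items():
            if model_type != entity_type:
                continue
            for relationship, symptom in rules:
                if relationship is None:
                    targets = [name]  # local symptom
                else:
                    targets = topology.get((name, relationship), [])  # propagated
                for target in targets:
                    edges.add(((name, root_cause), (target, symptom)))
    return edges


for cause, symptom in sorted(build_causality_graph(causal_model, entities, topology)):
    print(cause, "->", symptom)
```

Rerunning the function after the topology changes yields an updated edge set, which mirrors the idea that the graph follows the topology.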
Codebook
- A mapping of all potential root causes to the symptoms they may cause.
- The Codebook is a causality table where
- The columns represent all the potential root causes
- The rows represent all the potential symptoms and
- A cell $(i, j)$ represents the probability that root cause $r_j$ may cause symptom $s_i$, that is, the likelihood that symptom $s_i$ will be observed/present when root cause $r_j$ occurs.
- Each root cause in the Codebook has a unique signature, a vector of $m$ probabilities (one per symptom), that uniquely identifies the root cause.
- Using the Codebook, CDM quickly searches and pinpoints the root causes based on the observed symptoms.
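One simple way to picture a codebook lookup is to compare the vector of observed symptoms with each root cause signature and rank the root causes by closeness. This page does not state how CDM performs the search, so the squared-distance ranking below is a stand-in technique, and the symptom list, probabilities, and `rank_root_causes` function are invented for illustration.

```python
# Row order of the codebook (symptoms); illustrative only.
symptoms = [
    "Service HighLatency",
    "Service HighErrorRate",
    "Container HighCPUUtilization",
]

# One signature (column) per root cause: P(symptom | root cause).
# Probabilities are made-up illustration values.
codebook = {
    "Database Congested":      [0.9, 0.2, 0.0],
    "Application Malfunction": [0.1, 0.9, 0.0],
    "Container Congested":     [0.3, 0.0, 0.9],
}


def rank_root_causes(observed):
    """Rank root causes by squared distance between the observed
    0/1 symptom vector and each signature (closest first)."""
    vector = [1.0 if s in observed else 0.0 for s in symptoms]

    def distance(signature):
        return sum((v - p) ** 2 for v, p in zip(vector, signature))

    return sorted(codebook, key=lambda rc: distance(codebook[rc]))


ranking = rank_root_causes({"Service HighLatency"})
print(ranking[0])
```

Because each signature is distinct, a sufficiently complete set of observed symptoms points to a single closest root cause.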