
Terminology

This page defines the terminology used in Causely. The list is not exhaustive; it is a guide to the key concepts. Terms are grouped by relatedness rather than listed alphabetically.

Managed Object

An entity in the managed environment, such as a service, container, or database.

Managed Domain

A scoped collection of managed objects.

Causely Domain Manager (CDM)

A Causely instance that manages a managed domain.

Event

  • An observable anomaly in a managed object.
  • An event is TRUE if the anomaly is observed/present/active.
  • Examples
    • HighCPUUtilization
      • CPU Utilization > 90%
    • Container OOM Killed
      • An external notification
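An event can be thought of as a boolean predicate over a managed object's observed state. The sketch below assumes a hypothetical `ContainerState` record with a `cpu_utilization` field (in percent) and an `oom_killed` flag. It uses the 90% threshold from the example above. It illustrates only the TRUE/FALSE semantics of events, not how Causely evaluates them.

```python
from dataclasses import dataclass

@dataclass
class ContainerState:
    # Hypothetical snapshot of a managed object's observed state.
    name: str
    cpu_utilization: float  # percent, 0-100
    oom_killed: bool        # set when an external OOM notification arrives

def high_cpu_utilization(state: ContainerState) -> bool:
    """HighCPUUtilization is TRUE when CPU utilization exceeds 90%."""
    return state.cpu_utilization > 90.0

def container_oom_killed(state: ContainerState) -> bool:
    """Container OOM Killed is TRUE when an OOM notification was received."""
    return state.oom_killed

def active_events(state: ContainerState) -> set[str]:
    # An event is active (present) exactly when its predicate is TRUE.
    checks = {
        "HighCPUUtilization": high_cpu_utilization,
        "ContainerOOMKilled": container_oom_killed,
    }
    return {name for name, check in checks.items() if check(state)}

print(active_events(ContainerState("web-1", cpu_utilization=95.0, oom_killed=False)))
# prints {'HighCPUUtilization'}
```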

Active/Present Event

An event that is TRUE.

Root Cause (RC)

  • A root cause is a problem that may occur in a managed object.
  • A root cause is inferred from observed events.
  • Examples
    • Database Congested
    • Application Malfunction
    • Container CPU Congested
    • etc.

Symptom

  • An event that may be caused by a root cause.
  • Examples
    • Container HighCPUUtilization may be caused by Container Congested
    • Service HighLatency may be caused by Database Congested
    • Service HighErrorRate may be caused by Application Malfunction
    • etc.

Local Symptom of Root Cause R

  • A symptom of root cause R that is observable in the managed object in which R occurs.
  • Examples
    • HighCPUUtilization in Container X is a local symptom of Container X Congested
    • etc.

Propagated Symptom of Root Cause R

  • A symptom of root cause R that is observed in a managed object related to, but different from, the managed object in which R occurs.
  • Examples
    • Service Starvation in the services that access a database is a propagated symptom of Database Congested
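The local/propagated distinction depends only on where a symptom is observed relative to where its root cause occurs. The following minimal sketch assumes the symptom is already known to be one that the root cause may cause. The `RootCause` and `Symptom` records and the object names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RootCause:
    kind: str            # e.g. "DatabaseCongested"
    managed_object: str  # the object in which the root cause occurs

@dataclass(frozen=True)
class Symptom:
    kind: str            # e.g. "ServiceStarvation"
    managed_object: str  # the object in which the symptom is observed

def classify(symptom: Symptom, cause: RootCause, related: set[str]) -> str:
    """Return 'local', 'propagated', or 'unrelated' for a symptom of `cause`.

    `related` holds the objects related to the root cause's object
    (for example, the services that access a congested database).
    """
    if symptom.managed_object == cause.managed_object:
        return "local"
    if symptom.managed_object in related:
        return "propagated"
    return "unrelated"

db_congested = RootCause("DatabaseCongested", "orders-db")
clients = {"checkout-svc", "billing-svc"}
print(classify(Symptom("ServiceStarvation", "checkout-svc"), db_congested, clients))
# prints propagated
```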

Logs Associated with Root Cause

Causely captures logs from the containers that underlie your services to provide rich context for understanding issues. Each container (within its pod) writes logs, including errors and exceptions, to stdout. When Causely detects a service malfunction or identifies a root cause, it automatically collects and surfaces the relevant log lines from the underlying container.

These logs are shown in two places:

  • Under the affected service, when the service is exhibiting abnormal behavior (for example, error spikes, degraded performance).
  • Alongside a root cause, to highlight the precise errors or stack traces that occurred around the time of failure.

This helps validate the issue and dramatically shortens time to understanding and resolution.

Root Cause Analysis Problem (RCA)

Given a set of symptoms (manifestations), identify the explanation of why they are present, using knowledge about the world.

A Root Cause Analysis Problem $P$ is a 4-tuple $P = \langle R, C, S, S^+ \rangle$, where

  • $R = \{r_1, \ldots, r_n\}$ is a finite, non-empty set of root causes.
  • $S = \{s_1, \ldots, s_m\}$ is a finite, non-empty set of symptoms.
  • $C \subseteq R \times S$ is a relation, called Causation, with $\mathrm{domain}(C) = R$ and $\mathrm{range}(C) = S$.
  • $S^+ \subseteq S$ is a distinguished subset of $S$, the symptoms said to be present.
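The 4-tuple maps directly onto a small data structure. In this sketch the class name `RCAProblem` and the example root causes and symptoms are illustrative, not part of Causely. The constructor enforces the constraints listed above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RCAProblem:
    R: frozenset       # root causes
    C: frozenset       # Causation: pairs (root_cause, symptom)
    S: frozenset       # symptoms
    S_plus: frozenset  # symptoms observed to be present

    def __post_init__(self):
        # R and S must be non-empty (frozensets are always finite).
        if not self.R or not self.S:
            raise ValueError("R and S must be non-empty")
        # C is a subset of R x S with domain(C) = R and range(C) = S.
        if {r for r, _ in self.C} != self.R or {s for _, s in self.C} != self.S:
            raise ValueError("domain(C) must equal R and range(C) must equal S")
        # S+ is a distinguished subset of S.
        if not self.S_plus <= self.S:
            raise ValueError("S+ must be a subset of S")

P = RCAProblem(
    R=frozenset({"DatabaseCongested", "ApplicationMalfunction"}),
    C=frozenset({
        ("DatabaseCongested", "ServiceHighLatency"),
        ("ApplicationMalfunction", "ServiceHighErrorRate"),
    }),
    S=frozenset({"ServiceHighLatency", "ServiceHighErrorRate"}),
    S_plus=frozenset({"ServiceHighLatency"}),
)
```

Using frozensets keeps the tuple immutable, matching the idea that a problem instance is a fixed snapshot of root causes, causation, and observations.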

Effects and Causes

  • For any $r_i \in R$ and $s_j \in S$ in $P = \langle R, C, S, S^+ \rangle$,

    • $\mathrm{effects}(r_i) = \{s_j \mid \langle r_i, s_j \rangle \in C\}$, the set of symptoms that may be caused by $r_i$
    • $\mathrm{causes}(s_j) = \{r_i \mid \langle r_i, s_j \rangle \in C\}$, the set of root causes that may cause $s_j$
  • For any subset $R_i \subseteq R$ and subset $S_j \subseteq S$ in $P = \langle R, C, S, S^+ \rangle$,

    • $\mathrm{effects}(R_i) = \bigcup_{r_i \in R_i} \mathrm{effects}(r_i)$
    • $\mathrm{causes}(S_j) = \bigcup_{s_j \in S_j} \mathrm{causes}(s_j)$
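These definitions translate almost word for word into set comprehensions. The sketch assumes the Causation relation `C` is stored as a set of `(root_cause, symptom)` pairs. The example relation is invented.

```python
def effects(r, C):
    """effects(r): the symptoms that root cause r may cause."""
    return {s for (ri, s) in C if ri == r}

def causes(s, C):
    """causes(s): the root causes that may cause symptom s."""
    return {r for (r, sj) in C if sj == s}

def effects_of_set(Ri, C):
    """effects(R_i): union of effects(r) over r in R_i."""
    return set().union(*(effects(r, C) for r in Ri))

def causes_of_set(Sj, C):
    """causes(S_j): union of causes(s) over s in S_j."""
    return set().union(*(causes(s, C) for s in Sj))

C = {
    ("DatabaseCongested", "ServiceHighLatency"),
    ("DatabaseCongested", "ServiceStarvation"),
    ("ApplicationMalfunction", "ServiceHighErrorRate"),
    ("ApplicationMalfunction", "ServiceHighLatency"),
}
print(sorted(causes("ServiceHighLatency", C)))
# prints ['ApplicationMalfunction', 'DatabaseCongested']
```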

Closure of Root Cause R

  • The closure of root cause $R$ is $\mathrm{effects}(R)$, the set of all symptoms that $R$ may cause.
  • Together with the probability of each of its symptoms, the closure of $R$ forms a signature: a vector of probabilities that uniquely identifies $R$.
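A closure can be written as a signature vector by fixing an order for the symptoms and recording one probability per symptom. This sketch assumes invented probabilities for each `(root_cause, symptom)` pair and assigns 0.0 to symptoms outside the closure.

```python
def signature(r, symptoms, prob):
    """Return r's signature: one probability per symptom, in the order of `symptoms`.

    `prob` maps (root_cause, symptom) pairs in the Causation relation to the
    probability that the root cause produces the symptom. Symptoms outside
    the closure of r get 0.0.
    """
    return tuple(prob.get((r, s), 0.0) for s in symptoms)

symptoms = ["ServiceHighLatency", "ServiceStarvation", "ServiceHighErrorRate"]
prob = {
    ("DatabaseCongested", "ServiceHighLatency"): 0.9,
    ("DatabaseCongested", "ServiceStarvation"): 0.7,
    ("ApplicationMalfunction", "ServiceHighErrorRate"): 0.8,
    ("ApplicationMalfunction", "ServiceHighLatency"): 0.4,
}
print(signature("DatabaseCongested", symptoms, prob))  # prints (0.9, 0.7, 0.0)
```

Two root causes with identical signatures could not be told apart from symptoms alone, which is why the signature must be unique.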

Spurious and Missing Symptoms

  • For any $r_i \in R$, $s_j \in S$, and any subset $S_k \subseteq S$ in $P = \langle R, C, S, S^+ \rangle$,

    • $\mathrm{spurious}(r_i, S_k) = \{s_j \in S_k \mid \langle r_i, s_j \rangle \notin C\}$, the set of symptoms in $S_k$ not caused by $r_i$
    • $\mathrm{missing}(r_i, S_k) = \{s_j \in S \setminus S_k \mid \langle r_i, s_j \rangle \in C\}$, the set of symptoms that $r_i$ may cause but that are not in $S_k$
  • For any subset $R_i \subseteq R$ and subset $S_k \subseteq S$ in $P = \langle R, C, S, S^+ \rangle$,

    • $\mathrm{spurious}(R_i, S_k) = \bigcup_{r_i \in R_i} \mathrm{spurious}(r_i, S_k)$
    • $\mathrm{missing}(R_i, S_k) = \bigcup_{r_i \in R_i} \mathrm{missing}(r_i, S_k)$
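A direct translation of spurious and missing symptoms, assuming the same `(root_cause, symptom)` pair encoding for `C` and an invented example. For the observed set below, Database Congested leaves one observed symptom unexplained (spurious) and predicts one that was not observed (missing).

```python
# Example symptom set S and Causation relation C (invented data).
S = {"ServiceHighLatency", "ServiceStarvation", "ServiceHighErrorRate"}
C = {
    ("DatabaseCongested", "ServiceHighLatency"),
    ("DatabaseCongested", "ServiceStarvation"),
    ("ApplicationMalfunction", "ServiceHighErrorRate"),
}

def spurious(r, Sk, C):
    """Symptoms in S_k that root cause r does not cause."""
    return {s for s in Sk if (r, s) not in C}

def missing(r, Sk, S, C):
    """Symptoms in S but not in S_k that root cause r may cause."""
    return {s for s in S - Sk if (r, s) in C}

def spurious_of_set(Ri, Sk, C):
    # Union over r in R_i, exactly as in the definition above.
    return set().union(*(spurious(r, Sk, C) for r in Ri))

def missing_of_set(Ri, Sk, S, C):
    # Union over r in R_i, exactly as in the definition above.
    return set().union(*(missing(r, Sk, S, C) for r in Ri))

observed = {"ServiceHighLatency", "ServiceHighErrorRate"}
print(spurious("DatabaseCongested", observed, C))    # prints {'ServiceHighErrorRate'}
print(missing("DatabaseCongested", observed, S, C))  # prints {'ServiceStarvation'}
```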

Bayesian Network

A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies as a directed acyclic graph (DAG). It supports reasoning under uncertainty: given known information, it computes the probabilities of events from the relationships between variables. In effect, it models complex causal relationships by showing how the probability of one variable changes with the states of related variables.
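The smallest useful case is a network with one root cause node and an edge to one symptom node. Observing the symptom updates the probability of the root cause through Bayes' rule. All numbers below are invented, and this sketch does not reflect how Causely parameterizes its models.

```python
# Two-node network: DatabaseCongested -> ServiceHighLatency (illustrative numbers).
p_r = 0.01              # prior P(R = true)
p_s_given_r = 0.9       # P(S = true | R = true)
p_s_given_not_r = 0.05  # P(S = true | R = false)

def posterior_r_given_s(p_r, p_s_given_r, p_s_given_not_r):
    """P(R = true | S = true) by Bayes' rule, marginalizing over R."""
    p_s = p_s_given_r * p_r + p_s_given_not_r * (1 - p_r)
    return p_s_given_r * p_r / p_s

print(round(posterior_r_given_s(p_r, p_s_given_r, p_s_given_not_r), 3))  # prints 0.154
```

Observing the symptom raises the probability of the root cause from 1% to about 15%. The prior stays influential because the symptom also occurs without the root cause.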

Causal Model

  • CDM is driven by the Causal Model, a knowledge base of root causes.
  • The Causal Model captures the cause-and-effect associations between root causes and symptoms.
  • The Causal Model describes how root causes propagate across the managed domain and which symptoms may be caused (and observed) when each root cause occurs.
  • CDM ships with a built-in Causal Model that captures the common root causes that can occur in cloud-native environments.
  • The Causal Model captures potential root causes in a broad range of entities
    • such as applications, databases, caches, messaging, load balancers, DNS, compute, and storage.
  • The Causal Model is completely independent of any specific managed domain and applies to any cloud-native application environment. This enables Causely to pinpoint root causes automatically, out of the box, as soon as it is deployed in an environment.
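Domain independence means the rules refer to entity types and relationship kinds, never to concrete entities. The sketch below uses a hypothetical rule shape (`PropagationRule`), a hypothetical `accessed_by` relationship, and invented probabilities. Causely's internal representation of the Causal Model is not described on this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropagationRule:
    # Hypothetical rule shape: a root cause on an entity of `source_type`
    # may cause `symptom` on entities reached by following `relationship`.
    root_cause: str
    source_type: str
    symptom: str
    relationship: str   # "self" marks a local symptom
    probability: float

# A tiny, generic causal model: it names entity types, never concrete entities.
CAUSAL_MODEL = [
    PropagationRule("Congested", "Database", "HighLatency", "accessed_by", 0.8),
    PropagationRule("Congested", "Database", "ServiceStarvation", "accessed_by", 0.7),
    PropagationRule("CPUCongested", "Container", "HighCPUUtilization", "self", 0.95),
]

def rules_for(entity_type):
    """Return the rules that apply to entities of the given type."""
    return [rule for rule in CAUSAL_MODEL if rule.source_type == entity_type]
```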

Topology

  • Topology is a directed graph where the nodes are entities in the managed domain and the directed edges represent relationships between the entities.
  • CDM automatically discovers the entities and their relationships in the managed domain.
  • Entities include applications, services, databases, caches, messaging, load balancers, compute, storage, etc.
  • Relationships are of three types: Connectivity, Layering, and Composition
    • Connectivity: for each entity, the entities it is connected to and the entities it communicates with.
    • Layering: for each entity, the entities it is layered over and the entities it underlies.
    • Composition: for each entity, the entities it is composed of or is part of.
  • CDM automatically stitches all of these relationships together to generate a dependency map, or Topology Graph, of the entire managed domain.
  • The Topology Graph is continually updated in real time to reflect the current state of the managed domain.
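A topology can be modeled as a directed graph whose edges carry one of the three relationship types. The `Topology` class and the entity names below are hypothetical. They illustrate only the data structure, not how CDM discovers or stores it.

```python
from collections import defaultdict

# Relationship types named in the text.
CONNECTIVITY, LAYERING, COMPOSITION = "connectivity", "layering", "composition"

class Topology:
    """A directed graph of entities with typed edges (hypothetical structure)."""

    def __init__(self):
        self.entity_type = {}          # entity name -> entity type
        self.edges = defaultdict(set)  # entity -> {(relationship, neighbor)}

    def add_entity(self, name, etype):
        self.entity_type[name] = etype

    def relate(self, src, relationship, dst):
        # Edge direction: src --relationship--> dst.
        self.edges[src].add((relationship, dst))

    def neighbors(self, name, relationship):
        return {dst for rel, dst in self.edges[name] if rel == relationship}

topo = Topology()
topo.add_entity("checkout-svc", "Service")
topo.add_entity("orders-db", "Database")
topo.add_entity("orders-db-0", "Container")
topo.relate("checkout-svc", CONNECTIVITY, "orders-db")  # service talks to the database
topo.relate("orders-db", LAYERING, "orders-db-0")       # database is layered over a container
print(topo.neighbors("checkout-svc", CONNECTIVITY))     # prints {'orders-db'}
```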

Causality Graph (CG)

  • The Causality Graph is a Directed Acyclic Graph (DAG) where the nodes are root causes and symptoms, and the edges represent causality, i.e., $R \rightarrow S$ means $R$ may cause $S$.
  • The edges are labeled with a probability $P$, $0 < P < 1$, representing the likelihood that $R$ may cause $S$.
  • CDM automatically generates the Causality Graph by applying the Topology Graph to the Causal Model.
  • By applying the Topology of the managed domain to the generic Causal Model, CDM generates the causal knowledge that is specific to the managed domain.
  • The Causality Graph represents all the possible root causes in the managed domain, all the symptoms that may be observed, and the cause and effect relationships between them.
  • In a managed domain of a few thousand entities, the Causality Graph incorporates knowledge of tens of thousands of potential root causes and hundreds of thousands of symptoms, well beyond human scale.
  • The Causality Graph is automatically updated every time the topology changes.
  • The edge probabilities are learned based on the data in the managed domain.
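The generation step of applying a topology to a generic causal model can be sketched as instantiating each rule on every entity of the matching type. The rule tuples, the `accessed_by` relationship, and the entity names are hypothetical. In the real Causality Graph, edge probabilities are learned from data; here they are fixed in the rules.

```python
# Hypothetical generic rules: (root cause, entity type, symptom, relationship, probability).
# "self" marks a local symptom; other relationships mark propagated symptoms.
RULES = [
    ("Congested", "Database", "HighLatency", "accessed_by", 0.8),
    ("CPUCongested", "Container", "HighCPUUtilization", "self", 0.95),
]

def build_causality_graph(entity_type, edges, rules):
    """Instantiate generic rules on a concrete topology.

    entity_type: entity name -> type
    edges: (src, relationship, dst) triples
    Returns edges (root_cause_node, symptom_node, probability), where each
    node is a (kind, entity) pair.
    """
    graph = []
    for entity, etype in entity_type.items():
        for cause, rtype, symptom, rel, p in rules:
            if etype != rtype:
                continue
            if rel == "self":
                targets = [entity]  # local symptom on the same entity
            else:
                # Propagated symptom on entities reached through `rel`.
                targets = [dst for src, r, dst in edges if src == entity and r == rel]
            for target in targets:
                graph.append(((cause, entity), (symptom, target), p))
    return graph

entity_type = {"orders-db": "Database", "checkout-svc": "Service", "web-1": "Container"}
edges = [("orders-db", "accessed_by", "checkout-svc")]
for edge in build_causality_graph(entity_type, edges, RULES):
    print(edge)
```

Because the rules are generic, rerunning this step whenever the topology changes is enough to keep the graph current.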

Codebook

  • A mapping of all potential root causes $\{r_1, \ldots, r_n\}$ to the symptoms $\{s_1, \ldots, s_m\}$ they may cause.
  • The Codebook is a causality table where
    • the columns $r_1, \ldots, r_n$ represent all the potential root causes,
    • the rows $s_1, \ldots, s_m$ represent all the potential symptoms, and
    • a cell $(r_i, s_j)$ holds the probability that root cause $r_i$ causes symptom $s_j$, that is, the likelihood that symptom $s_j$ will be observed (present) when root cause $r_i$ occurs.
  • Each root cause in the Codebook has a signature, a vector of $m$ probabilities (its column in the table), that uniquely identifies it.
  • Using the Codebook, CDM quickly searches for and pinpoints the root causes based on the observed symptoms.
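This page does not say how CDM searches the Codebook. One simple decoding strategy, used here only for illustration, encodes the observed symptoms as a 0/1 vector and picks the root cause whose signature is nearest in Euclidean distance. The Codebook entries below are invented.

```python
import math

SYMPTOMS = ["ServiceHighLatency", "ServiceStarvation", "ServiceHighErrorRate"]

# Codebook: one column (signature) per root cause; probabilities are invented.
CODEBOOK = {
    "DatabaseCongested": (0.9, 0.7, 0.0),
    "ApplicationMalfunction": (0.4, 0.0, 0.8),
}

def decode(observed, codebook=CODEBOOK, symptoms=SYMPTOMS):
    """Return the root cause whose signature is closest to the observations."""
    # 1.0 for each present symptom, 0.0 for each absent one, in Codebook row order.
    vector = [1.0 if s in observed else 0.0 for s in symptoms]
    return min(codebook, key=lambda r: math.dist(vector, codebook[r]))

print(decode({"ServiceHighLatency", "ServiceStarvation"}))  # prints DatabaseCongested
```

Nearest-signature matching tolerates a few spurious or missing symptoms, which unique, well-separated signatures make possible.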