Alert Ingestion and Mapping

Overview

Causely can ingest alerts from your existing monitoring sources and feed them into the causal reasoning engine without replacing your current alerting setup. When an alert fires, Causely identifies the service or entity it belongs to, maps it to a known symptom, and includes it as evidence in causal analysis.

The following alert sources are supported:

Alertmanager: Prometheus Alertmanager webhook receiver
Prometheus: Direct Prometheus integration
Datadog: Datadog monitors and alerts
Incident.io: Incident.io alert events
Dynatrace: Dynatrace problems and alerts

How alerts are mapped automatically

When Causely receives an alert, it attempts to resolve two things: which entity the alert belongs to, and which symptom it represents.

Step 1: Symptom resolution via keyword matching

Causely examines the alert's alertname, title, and description, and matches them against a library of keyword rules. Matching is case-insensitive. The first matching rule wins, so more specific rules (for example, queue-related keywords) are evaluated before broader ones (for example, generic "error").

For example, an alert named HighErrorRate, or one whose description contains "error rate exceeded", is recognized as a high error rate condition. An alert containing "latency" or "timeout" is recognized as a high latency condition. Likewise, alerts mentioning "kafka lag", "consumer lag", "message wait time", "dead letter", "JVM heap", or "GC time" are each matched to the corresponding condition.

Some examples of what Causely looks for in the alert name, title, or description:

Condition	Example keywords
High error rate	"error", "failed", "failure", "exception", "unavailable", "down"
High latency	"latency", "duration", "timeout", "slow", "response time"
Kafka consumer lag	"lag" and ("kafka" / "consumer" / "partition")
Queue wait time	"message wait", "wait time" and "queue"
Dead-letter queue	"dead letter", "deadletter", "dlx"
JVM heap pressure	"heap" and ("java" / "jvm")
Garbage collection	"gc", "garbage collection", "g1 young", "g1 old"
DB connection pool	"db connection", "database connection", "postgres connection"
Redis connection pool	"redis connection", "redis connection pool"
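
To make the "first matching rule wins" behavior concrete, here is a minimal Python sketch of ordered, case-insensitive keyword matching. It is an illustration only, not Causely's implementation: the function names, the rule order, the use of plain substring matching, and the choice to cover only a subset of the table are all assumptions.

```python
from typing import Callable, Optional


def has_any(*keywords: str) -> Callable[[str], bool]:
    """Return a predicate that is true when any keyword occurs in the text."""
    return lambda text: any(k in text for k in keywords)


# Ordered from specific to broad; the first matching rule wins.
# Rule order and the subset of conditions shown are illustrative assumptions.
RULES: list[tuple[str, Callable[[str], bool]]] = [
    ("Kafka consumer lag",
     lambda t: "lag" in t and has_any("kafka", "consumer", "partition")(t)),
    ("Dead-letter queue", has_any("dead letter", "deadletter", "dlx")),
    ("JVM heap pressure", lambda t: "heap" in t and has_any("java", "jvm")(t)),
    ("Garbage collection",
     has_any("gc", "garbage collection", "g1 young", "g1 old")),
    ("High latency",
     has_any("latency", "duration", "timeout", "slow", "response time")),
    ("High error rate",
     has_any("error", "failed", "failure", "exception", "unavailable", "down")),
]


def match_symptom(alertname: str, title: str = "",
                  description: str = "") -> Optional[str]:
    """Return the first condition whose rule matches, or None if none does."""
    # Lowercasing the combined text makes the match case-insensitive.
    text = " ".join((alertname, title, description)).lower()
    for condition, predicate in RULES:
        if predicate(text):
            return condition
    return None
```

In this sketch, an alert named HighKafkaConsumerLag whose description also mentions "errors" resolves to Kafka consumer lag rather than High error rate, because the more specific rule is evaluated first.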

Step 2: Entity resolution via alert labels

A matched symptom still needs an entity to attach to. Causely uses labels in the alert payload to look up the right entity in your topology. Each entity type requires a specific combination of labels that uniquely identifies it—see Required label combinations by entity type below.

If the labels are missing or do not correspond to a known entity in topology, the alert remains unmapped even when the symptom is correctly identified.
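
The two failure modes above (a missing label, or labels that point to nothing in topology) can be sketched in Python as follows. The KNOWN_PODS set is a hypothetical stand-in for the topology, and resolve_pod is an illustrative name, not a Causely API.

```python
from typing import Optional

# Hypothetical stand-in for the topology: the set of known (namespace, pod) pairs.
KNOWN_PODS = {
    ("prod", "payment-service-7d4f8b-xyz"),
    ("prod", "checkout-service-abc"),
}


def resolve_pod(labels: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return the (namespace, pod) key if it identifies a known pod, else None."""
    namespace = labels.get("namespace")
    pod = labels.get("pod")
    if not namespace or not pod:
        return None  # a required label is missing
    key = (namespace, pod)
    # Labels are present, but they must also match an entity in the topology.
    return key if key in KNOWN_PODS else None
```

Both a payload without a namespace label and a payload naming a pod that is not in KNOWN_PODS return None here, which corresponds to the alert remaining unmapped.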

What happens after an alert is mapped

Alert maps to a known entity and symptom

The alert is fully ingested. Causely attaches it as an observed symptom to the entity and includes it in causal reasoning. You will see it reflected in the root cause analysis alongside other signals.

Alert maps to a known entity but no symptom match

If Causely identifies which entity the alert belongs to but does not recognize the alert as a known symptom, the alert is recorded but does not contribute to causal analysis. This happens when the alert name, title, and description match none of the keyword rules, examples of which appear in the table above.

Causely engineers are happy to work with you to add important alerts to the knowledge base so they can participate in causal reasoning. Reach out to your Causely team with the alert details.

Alert does not map to any entity

If Causely cannot resolve an entity from the alert, the alert is not attached to the topology. This is almost always caused by missing or incorrect labels in the alert payload. See Required label combinations by entity type below for the exact labels required for each entity type.
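
The three outcomes described in this section reduce to a small decision on the results of entity and symptom resolution. The Python sketch below summarizes that decision; the Outcome enum and the classify function are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FULLY_MAPPED = "attached as an observed symptom and used in causal reasoning"
    ENTITY_ONLY = "recorded, but does not contribute to causal analysis"
    UNMAPPED = "not attached to the topology"


def classify(entity: Optional[str], symptom: Optional[str]) -> Outcome:
    """Map the results of entity and symptom resolution to an outcome."""
    if entity is None:
        # Without an entity the alert stays unmapped, even if a symptom matched.
        return Outcome.UNMAPPED
    if symptom is None:
        return Outcome.ENTITY_ONLY
    return Outcome.FULLY_MAPPED
```

Note that the entity check comes first: a correctly identified symptom does not help when no entity can be resolved.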

Required label combinations by entity type

For an alert to be attached to an entity, its payload must include the labels that uniquely identify that entity. The exact combination depends on the type of entity the alert is about.

Kubernetes pod

Required: namespace + pod

Use this when the alert is about a specific pod (for example, high JVM heap, GC time, thread contention, or connection pool exhaustion).

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighJVMHeapUsage",
        "namespace": "prod",
        "pod": "payment-service-7d4f8b-xyz",
        "severity": "warning"
      },
      "annotations": {
        "summary": "JVM heap utilization above 80%"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

Kubernetes container

Required: namespace + pod + container

Use this when the alert targets a specific container within a pod (for example, a sidecar or init container).

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "ContainerHighCPU",
        "namespace": "prod",
        "pod": "payment-service-7d4f8b-xyz",
        "container": "payment-app",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Container CPU usage above threshold"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

Kubernetes service

Required: namespace + pod

Causely resolves the owning service from the pod via topology. The optional service label can be included as a hint but is not required for resolution.

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighErrorRate",
        "namespace": "prod",
        "pod": "checkout-service-abc",
        "service": "checkout-service",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on checkout-service",
        "description": "Error rate increased above threshold in the last 5m"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

HTTP path

Required: namespace + pod + a path label

The path label can be any of: uri, url_path, http_path, path, http.route. The path entity (for example, /checkout) must already exist in topology (for example, from ingress or service metrics).

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighPathErrorRate",
        "namespace": "ingress-nginx",
        "pod": "ingress-nginx-controller-956c769c7-lgcxx",
        "uri": "/checkout",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on /checkout"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

gRPC / RPC method

Required: rpc_service + rpc_method (or grpc_service + grpc_method)

Optional: namespace, pod. The RPC method entity (service + method pair) must already exist in topology (for example, from distributed traces).

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighGRPCErrorRate",
        "rpc_service": "user.UserService",
        "rpc_method": "GetUser",
        "namespace": "prod",
        "pod": "user-service-abc",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on user.UserService/GetUser"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

Kafka topic

Required: topic + consumer group label

The consumer group can appear under any of these label names: consumer_group_id, consumer_group, group, or consumergroup. Optionally include namespace and pod when using Micrometer-style instrumentation.

Confluent-style (topic + consumer group ID only):

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighKafkaConsumerLag",
        "topic": "orders",
        "consumer_group_id": "order-processor-group",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Kafka consumer lag above threshold for topic orders"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

Micrometer-style (topic + group + namespace + pod):

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighKafkaConsumerLag",
        "topic": "orders",
        "group": "order-processor-group",
        "namespace": "default",
        "pod": "order-processor-0",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Kafka consumer lag above threshold for topic orders"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
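
Because the consumer group can arrive under several label names, a pre-processing step may want to normalize it before checking a payload. The Python sketch below shows one way to do that. It assumes the topic and consumer group together form the lookup key and that the aliases are tried in the order listed above; neither assumption is confirmed by this page, and kafka_key is a hypothetical helper.

```python
from typing import Optional

# Accepted label names for the consumer group, per the list above.
# The order of precedence when several are present is an assumption.
GROUP_LABELS = ("consumer_group_id", "consumer_group", "group", "consumergroup")


def kafka_key(labels: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (topic, consumer group) if both are present, else None."""
    topic = labels.get("topic")
    # Pick the first non-empty consumer group label among the accepted names.
    group = next((labels[name] for name in GROUP_LABELS if labels.get(name)), None)
    if not topic or not group:
        return None
    return (topic, group)
```

Under these assumptions, the Confluent-style and Micrometer-style payloads shown above both normalize to the same pair, ("orders", "order-processor-group").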

Queue (RabbitMQ or similar)

Required: a label carrying the queue name (for example, queue)

The queue must already exist in topology (for example, discovered from RabbitMQ or OpenTelemetry).

{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "MessageWaitTimeHigh",
        "queue": "work_queue",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Message wait time above threshold on work_queue"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}

Summary

Entity type	Required label combination
Kubernetes pod	`namespace` + `pod`
Kubernetes container	`namespace` + `pod` + `container`
Kubernetes service	`namespace` + `pod` (service resolved from pod via topology)
HTTP path	`namespace` + `pod` + path label (`uri`, `url_path`, `http_path`, `path`, or `http.route`)
gRPC / RPC method	`rpc_service` + `rpc_method` (or `grpc_service` + `grpc_method`)
Kafka topic	`topic` + consumer group (`consumer_group_id`, `consumer_group`, `group`, or `consumergroup`)
Queue	queue name label (for example `queue`)
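
The summary table can also be expressed as data, which is handy for checking alert labels before they are sent. The Python sketch below lists, for a given label set, which entity types have their required labels present. It is a hypothetical pre-flight check, not part of Causely: it does not verify that the entity exists in topology, it assumes the queue name is carried in a label called queue, and it groups pod and service together because they share the same required labels.

```python
# Each entry pairs an entity type with its label groups; at least one label
# from every group must be present. gRPC has two alternative label pairs.
REQUIREMENTS: list[tuple[str, list[tuple[str, ...]]]] = [
    ("Kubernetes container", [("namespace",), ("pod",), ("container",)]),
    ("HTTP path", [("namespace",), ("pod",),
                   ("uri", "url_path", "http_path", "path", "http.route")]),
    ("Kubernetes pod / service", [("namespace",), ("pod",)]),
    ("gRPC / RPC method", [("rpc_service",), ("rpc_method",)]),
    ("gRPC / RPC method", [("grpc_service",), ("grpc_method",)]),
    ("Kafka topic", [("topic",),
                     ("consumer_group_id", "consumer_group", "group", "consumergroup")]),
    ("Queue", [("queue",)]),  # assumes the queue name label is called "queue"
]


def satisfied_entity_types(labels: dict[str, str]) -> list[str]:
    """Return the entity types whose required labels are all present."""
    found: list[str] = []
    for entity_type, groups in REQUIREMENTS:
        complete = all(any(labels.get(name) for name in group) for group in groups)
        if complete and entity_type not in found:
            found.append(entity_type)
    return found
```

For the HTTP path example above (namespace, pod, and uri), this returns both "HTTP path" and "Kubernetes pod / service", since the path labels are a superset of the pod labels. An empty result means the alert would have no entity to attach to.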
