Alert Ingestion and Mapping
Overview
Causely can ingest alerts from your existing monitoring sources and connect them to the causal reasoning engine—without replacing your current alerting setup. When an alert fires, Causely identifies which service or entity it belongs to, maps it to a known symptom, and includes it as evidence in causal analysis.
The following alert sources are supported:
- Alertmanager: Prometheus Alertmanager webhook receiver
- Prometheus: Direct Prometheus integration
- Datadog: Datadog monitors and alerts
- Incident.io: Incident.io alert events
- Dynatrace: Dynatrace problems and alerts
How alerts are mapped automatically
When Causely receives an alert, it attempts to resolve two things: which entity the alert belongs to, and which symptom it represents.
Step 1: Symptom resolution via keyword matching
Causely examines the alert's alertname, title, and description, and matches them against a library of keyword rules. Matching is case-insensitive. The first matching rule wins, so more specific rules (for example, queue-related keywords) are evaluated before broader ones (for example, generic "error").
Here are a few examples of how this works. An alert named HighErrorRate or with a description containing "error rate exceeded" is automatically recognized as a high error rate condition. An alert containing "latency" or "timeout" is recognized as a high latency condition. Similarly, alerts mentioning "kafka lag", "consumer lag", "message wait time", "dead letter", "JVM heap", or "GC time" are each matched to their corresponding condition.
Some examples of what Causely looks for in the alert name, title, or description:
| Condition | Example keywords |
|---|---|
| High error rate | "error", "failed", "failure", "exception", "unavailable", "down" |
| High latency | "latency", "duration", "timeout", "slow", "response time" |
| Kafka consumer lag | "lag" and ("kafka" / "consumer" / "partition") |
| Queue wait time | "message wait", "wait time" and "queue" |
| Dead-letter queue | "dead letter", "deadletter", "dlx" |
| JVM heap pressure | "heap" and ("java" / "jvm") |
| Garbage collection | "gc", "garbage collection", "g1 young", "g1 old" |
| DB connection pool | "db connection", "database connection", "postgres connection" |
| Redis connection pool | "redis connection", "redis connection pool" |
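The first-match-wins behavior described in Step 1 can be sketched in Python. This is a minimal illustration, not Causely's actual implementation: the `RULES` list and the `match_symptom` function are hypothetical names, only a subset of the keywords from the table is included, and plain substring matching is an assumption.

```python
from typing import Callable, Optional

# Hypothetical rule list: (condition, predicate over the lowercased alert text).
# Order matters: specific rules are listed before broad ones such as "error".
RULES: list[tuple[str, Callable[[str], bool]]] = [
    ("Kafka consumer lag",
     lambda t: "lag" in t and any(k in t for k in ("kafka", "consumer", "partition"))),
    ("Dead-letter queue",
     lambda t: any(k in t for k in ("dead letter", "deadletter", "dlx"))),
    ("JVM heap pressure",
     lambda t: "heap" in t and any(k in t for k in ("java", "jvm"))),
    ("High latency",
     lambda t: any(k in t for k in ("latency", "duration", "timeout", "slow", "response time"))),
    ("High error rate",
     lambda t: any(k in t for k in ("error", "failed", "failure", "exception", "unavailable", "down"))),
]


def match_symptom(alertname: str, title: str = "", description: str = "") -> Optional[str]:
    """Return the first condition whose rule matches, or None if no rule matches."""
    # Case-insensitive matching: lowercase the combined text once.
    text = " ".join((alertname, title, description)).lower()
    for condition, predicate in RULES:
        if predicate(text):
            return condition  # first matching rule wins
    return None
```

With this ordering, an alert named `KafkaConsumerLagError` resolves to the Kafka consumer lag condition even though it also contains "error", because the narrower rule is evaluated first.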
Step 2: Entity resolution via alert labels
A matched symptom still needs an entity to attach to. Causely uses labels in the alert payload to look up the right entity in your topology. Each entity type requires a specific combination of labels that uniquely identifies it—see Required label combinations by entity type below.
If the labels are missing or do not correspond to a known entity in topology, the alert remains unmapped even when the symptom is correctly identified.
What happens after an alert is mapped
Alert maps to a known entity and symptom
The alert is fully ingested. Causely attaches it as an observed symptom to the entity and includes it in causal reasoning. You will see it reflected in the root cause analysis alongside other signals.
Alert maps to a known entity but no symptom match
If Causely identifies which entity the alert belongs to but does not recognize the alert as a known symptom, the alert is recorded but does not contribute to causal analysis. This happens when the alert name, title, and description match none of the keyword rules, examples of which are listed in the table above.
Causely engineers are happy to work with you to add important alerts to the knowledge base so they can participate in causal reasoning. Reach out to your Causely team with the alert details.
Alert does not map to any entity
If Causely cannot resolve an entity from the alert, the alert is not attached to the topology. This is almost always caused by missing or incorrect labels in the alert payload. See the next section for the exact labels required for each entity type.
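The three outcomes above reduce to a small decision function. The sketch below is illustrative only: `mapping_outcome` and its return values are hypothetical names, and the entity and symptom arguments stand in for whatever the two resolution steps produce.

```python
from typing import Optional


def mapping_outcome(entity: Optional[str], symptom: Optional[str]) -> str:
    """Classify an alert by which of the two resolution steps succeeded."""
    if entity is None:
        # No entity: the alert stays unmapped, even if the symptom was identified.
        return "unmapped"
    if symptom is None:
        # Entity known, symptom unknown: recorded, but not used in causal analysis.
        return "recorded"
    # Both resolved: attached as an observed symptom and used in causal reasoning.
    return "ingested"
```

Note that the entity check comes first: a correctly identified symptom cannot be used without an entity to attach it to.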
Required label combinations by entity type
For an alert to be attached to an entity, its payload must include the labels that uniquely identify that entity. The exact combination depends on the type of entity the alert is about.
Kubernetes pod
Required: namespace + pod
Use this when the alert is about a specific pod (for example, high JVM heap, GC time, thread contention, connection pool exhaustion).
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighJVMHeapUsage",
        "namespace": "prod",
        "pod": "payment-service-7d4f8b-xyz",
        "severity": "warning"
      },
      "annotations": {
        "summary": "JVM heap utilization above 80%"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
Kubernetes container
Required: namespace + pod + container
Use this when the alert targets a specific container within a pod (for example, a sidecar or init container).
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "ContainerHighCPU",
        "namespace": "prod",
        "pod": "payment-service-7d4f8b-xyz",
        "container": "payment-app",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Container CPU usage above threshold"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
Kubernetes service
Required: namespace + pod
Causely resolves the owning service from the pod via topology. The optional service label can be included as a hint but is not required for resolution.
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighErrorRate",
        "namespace": "prod",
        "pod": "checkout-service-abc",
        "service": "checkout-service",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on checkout-service",
        "description": "Error rate increased above threshold in the last 5m"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
HTTP path
Required: namespace + pod + a path label
The path label can be any of: uri, url_path, http_path, path, http.route. The path entity (for example, /checkout) must already exist in topology (for example, from ingress or service metrics).
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighPathErrorRate",
        "namespace": "ingress-nginx",
        "pod": "ingress-nginx-controller-956c769c7-lgcxx",
        "uri": "/checkout",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on /checkout"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
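Because the path can arrive under several label names, it can help to check an alert's labels before sending it. The following Python sketch is an assumption-laden illustration: `find_path` and `has_http_path_labels` are hypothetical helpers, and the lookup order simply follows the order in which the label names are listed above, since the source does not state which label takes precedence when more than one is present.

```python
from typing import Mapping, Optional

# Label names that may carry the HTTP path, in the order listed in this section.
PATH_LABELS = ("uri", "url_path", "http_path", "path", "http.route")


def find_path(labels: Mapping[str, str]) -> Optional[str]:
    """Return the value of the first non-empty path label, or None."""
    for name in PATH_LABELS:
        value = labels.get(name)
        if value:
            return value
    return None


def has_http_path_labels(labels: Mapping[str, str]) -> bool:
    """True when namespace, pod, and some path label are all present."""
    return bool(labels.get("namespace") and labels.get("pod") and find_path(labels))
```

A passing check only means the labels are present. The path entity must still exist in topology for the alert to be attached.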
gRPC / RPC method
Required: rpc_service + rpc_method (or grpc_service + grpc_method)
Optional: namespace, pod. The RPC method entity (service + method pair) must already exist in topology (for example, from distributed traces).
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighGRPCErrorRate",
        "rpc_service": "user.UserService",
        "rpc_method": "GetUser",
        "namespace": "prod",
        "pod": "user-service-abc",
        "severity": "warning"
      },
      "annotations": {
        "summary": "High error rate on user.UserService/GetUser"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
Kafka topic
Required: topic + consumer group label
The consumer group can appear under any of these label names: consumer_group_id, consumer_group, group, or consumergroup. Optionally include namespace and pod when using Micrometer-style instrumentation.
Confluent-style (topic + consumer group ID only):
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighKafkaConsumerLag",
        "topic": "orders",
        "consumer_group_id": "order-processor-group",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Kafka consumer lag above threshold for topic orders"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
Micrometer-style (topic + group + namespace + pod):
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "HighKafkaConsumerLag",
        "topic": "orders",
        "group": "order-processor-group",
        "namespace": "default",
        "pod": "order-processor-0",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Kafka consumer lag above threshold for topic orders"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
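Both payload styles above carry the same information under different label names. As a rough sketch of how the alternative consumer group labels collapse to one (topic, group) pair, consider the following Python. The `kafka_key` helper is hypothetical, and the precedence among the label names is an assumption, since the source does not say which one is used when several are present.

```python
from typing import Mapping, Optional

# Accepted label names for the consumer group, as listed in this section.
CONSUMER_GROUP_LABELS = ("consumer_group_id", "consumer_group", "group", "consumergroup")


def kafka_key(labels: Mapping[str, str]) -> Optional[tuple[str, str]]:
    """Return (topic, consumer group) if both can be found in the labels, else None."""
    topic = labels.get("topic")
    # Take the first consumer group label that is present and non-empty.
    group = next((labels[n] for n in CONSUMER_GROUP_LABELS if labels.get(n)), None)
    if topic and group:
        return (topic, group)
    return None
```

Under these assumptions, the Confluent-style and Micrometer-style examples both yield `("orders", "order-processor-group")`.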
Queue (RabbitMQ or similar)
Required: a label carrying the queue name (for example, queue)
The queue must already exist in topology (for example, discovered from RabbitMQ or OpenTelemetry).
{
  "alerts": [
    {
      "status": "firing",
      "labels": {
        "alertname": "MessageWaitTimeHigh",
        "queue": "work_queue",
        "severity": "warning"
      },
      "annotations": {
        "summary": "Message wait time above threshold on work_queue"
      },
      "startsAt": "2025-02-04T12:00:00.000Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
Summary
| Entity type | Required label combination |
|---|---|
| Kubernetes pod | namespace + pod |
| Kubernetes container | namespace + pod + container |
| Kubernetes service | namespace + pod (service resolved from pod via topology) |
| HTTP path | namespace + pod + path label (uri, url_path, http_path, path, or http.route) |
| gRPC / RPC method | rpc_service + rpc_method (or grpc_service + grpc_method) |
| Kafka topic | topic + consumer group (consumer_group_id, consumer_group, group, or consumergroup) |
| Queue | queue name label (for example, queue) |
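The summary table can be turned into a pre-flight check that reports which entity types an alert's labels could satisfy. This Python sketch is not part of Causely: `candidate_entity_types` and its helpers are hypothetical, the `queue` label name is the example from the table, and the function checks only label presence. How Causely chooses among several candidates is not described in the source, and the entity must also exist in topology.

```python
from typing import Iterable, Mapping

PATH_LABELS = ("uri", "url_path", "http_path", "path", "http.route")
GROUP_LABELS = ("consumer_group_id", "consumer_group", "group", "consumergroup")


def has_all(labels: Mapping[str, str], names: Iterable[str]) -> bool:
    return all(labels.get(n) for n in names)


def has_any(labels: Mapping[str, str], names: Iterable[str]) -> bool:
    return any(labels.get(n) for n in names)


def candidate_entity_types(labels: Mapping[str, str]) -> list[str]:
    """List the entity types whose required label combination is present."""
    found: list[str] = []
    pod = has_all(labels, ("namespace", "pod"))
    if pod:
        # A pod and its owning service share the same required labels.
        found += ["Kubernetes pod", "Kubernetes service"]
    if pod and labels.get("container"):
        found.append("Kubernetes container")
    if pod and has_any(labels, PATH_LABELS):
        found.append("HTTP path")
    if has_all(labels, ("rpc_service", "rpc_method")) or has_all(labels, ("grpc_service", "grpc_method")):
        found.append("gRPC / RPC method")
    if labels.get("topic") and has_any(labels, GROUP_LABELS):
        found.append("Kafka topic")
    if labels.get("queue"):
        found.append("Queue")
    return found
```

An empty result indicates that the alert would remain unmapped regardless of its symptom, which makes a check like this useful when reviewing alert rules before routing them to Causely.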