Symptom Delay Configuration
Causely automatically detects service symptoms based on various metrics. To avoid reacting to every temporary spike or drop in raw values, it applies activation and deactivation delays before changing a symptom's state. This document explains how to configure the activation delays for your services.
Overview
Symptom delays in Causely:
- Prevent false positives from temporary metric fluctuations
- Keep symptom detection stable when raw values briefly spike or drop
Supported Delay Types
Currently, you can configure the following delay types for service symptoms:
- Error Rate Activation Delay: How long the error rate must stay above its threshold before the error rate symptom activates
- Latency Activation Delay: How long latency must stay above its threshold before the latency symptom activates
Configuration Methods
Using Kubernetes Labels
The recommended way to configure symptom delays is using Kubernetes labels. You can apply these labels to your services:
# Configure error rate activation delay (for example, 5 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-activation-delay=5"
# Configure latency activation delay (for example, 3 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-activation-delay=3"
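When you change a delay later, keep in mind that `kubectl label` refuses to modify a label that already exists unless you pass `--overwrite`, and that a trailing `-` after the key removes the label. The helpers below are a minimal sketch of wrapping those calls. The function names and the `KUBECTL` override are illustrative assumptions, not part of Causely or kubectl.

```shell
# Illustrative helpers (hypothetical names, not part of Causely or kubectl).
# Set KUBECTL=echo to print the arguments instead of calling the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# set_activation_delay NAMESPACE SERVICE KIND MINUTES
# KIND is "error-rate" or "latency", matching the two documented labels.
set_activation_delay() {
  # --overwrite lets the command succeed when the label is already set.
  "$KUBECTL" label svc -n "$1" "$2" "causely.ai/$3-activation-delay=$4" --overwrite
}

# clear_activation_delay NAMESPACE SERVICE KIND
clear_activation_delay() {
  # A trailing "-" after the key removes the label from the service.
  "$KUBECTL" label svc -n "$1" "$2" "causely.ai/$3-activation-delay-"
}

# show_labels NAMESPACE SERVICE
show_labels() {
  "$KUBECTL" get svc -n "$1" "$2" --show-labels
}
```

For example, `set_activation_delay production payment-service latency 2` followed by `show_labels production payment-service` lets you confirm that the label was applied. Removing the label presumably returns the service to the default delay, although that is an assumption here.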
Using Nomad Service Tags
For Nomad services, you can configure symptom delays using service tags in your job specification:
job "example" {
  group "app" {
    service {
      name = "my-service"
      port = 8080
      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=3"
      ]
    }
  }
}
Using Consul Service Metadata
For Consul services, you can configure symptom delays using service metadata:
# Register a service with symptom delay metadata
consul services register \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"
# Update existing service metadata
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"
Delay Values
- Activation Delays: Expressed in minutes (for example, 5 for 5 minutes)
- Minimum Value: 1 minute
- Recommended Range: 1-10 minutes for most use cases
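A small pre-check can catch invalid values before they reach a label, tag, or metadata entry. The function below is a sketch under the assumption that delays must be whole numbers of minutes. The source only shows integer values, so rejecting fractions is a guess, and `is_valid_delay` is a hypothetical name.

```shell
# is_valid_delay VALUE
# Succeeds when VALUE is a whole number of minutes >= 1 (the documented minimum).
# Assumption: fractional values are not accepted; the examples only use integers.
is_valid_delay() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
  esac
  [ "$1" -ge 1 ]
}
```

A script could call `is_valid_delay "$minutes" || echo "invalid delay: $minutes" >&2` before applying the value.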
Default Behavior
If you don't configure any symptom delays, Causely uses the following default values:
- Activation Delay: 10 minutes
- Deactivation Delay: 5 minutes
This means that by default:
- A symptom activates only after the threshold has been exceeded for 10 consecutive minutes
- A symptom deactivates only after the metrics return to normal levels and stay below the threshold for 5 consecutive minutes
How It Works
When you configure symptom delays:
- Activation: Causely activates a symptom only after the threshold has been exceeded continuously for the configured delay period, so a shorter spike does not raise it
- Deactivation: Similarly, Causely deactivates a symptom only after metrics have stayed at normal levels for the deactivation delay period
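The behavior described above can be illustrated with a small simulation. This is only a sketch of the documented semantics (consecutive minutes above or below the threshold), not Causely's actual implementation. The function name `simulate_symptom` and the one-sample-per-minute input are assumptions made for the example.

```shell
# simulate_symptom ACTIVATION_MIN DEACTIVATION_MIN SAMPLE...
# Each SAMPLE is one minute: 1 = threshold exceeded, 0 = normal.
# Prints one character per minute: 1 = symptom active, 0 = inactive.
# Note: plain POSIX sh has no local variables, so these names are global.
simulate_symptom() {
  activation=$1
  deactivation=$2
  shift 2
  active=0   # current symptom state
  streak=0   # consecutive minutes that contradict the current state
  out=""
  for sample in "$@"; do
    if [ "$sample" -ne "$active" ]; then
      streak=$((streak + 1))
    else
      streak=0   # the metric agrees with the state again, so start over
    fi
    if [ "$active" -eq 0 ] && [ "$streak" -ge "$activation" ]; then
      active=1; streak=0
    elif [ "$active" -eq 1 ] && [ "$streak" -ge "$deactivation" ]; then
      active=0; streak=0
    fi
    out="$out$active"
  done
  echo "$out"
}
```

With a 3-minute activation delay and a 2-minute deactivation delay, `simulate_symptom 3 2 1 1 0 1 1 1 1 0 0 1` prints `0000011100`: the two-minute spike at the start is ignored, the symptom activates on the third consecutive minute above the threshold, and it clears after two consecutive normal minutes.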
Best Practices
- Start with Defaults: Begin with Causely's default behavior (10-minute activation, 5-minute deactivation)
- Adjust Based on Service Characteristics:
  - Use shorter delays (1-3 minutes) for critical services that need quick response
  - Use longer delays (5-10 minutes) for services with frequent but harmless spikes
- Monitor Impact: After changing delays, monitor how they affect symptom detection accuracy
- Consider Service Patterns: Account for your service's typical behavior patterns when setting delays
- Document Changes: Keep track of delay changes and their rationale
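On Kubernetes, one way to keep track of which services carry an explicit delay is to list the labels as extra columns. The sketch below relies on the `-L` (`--label-columns`) option of `kubectl get`. The function name and the `KUBECTL` override are illustrative assumptions.

```shell
# Set KUBECTL=echo to print the arguments instead of calling the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# list_delay_overrides
# Lists services in all namespaces with one extra column per Causely delay
# label, so services with an explicit delay stand out from those without one.
list_delay_overrides() {
  "$KUBECTL" get svc --all-namespaces \
    -L causely.ai/error-rate-activation-delay \
    -L causely.ai/latency-activation-delay
}
```

Services without a label show an empty value in the corresponding column.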
Example Use Cases
- Noisy Services: Increase delays for services that frequently have temporary spikes
- Critical Services: Use shorter delays for services where quick detection is essential
- Batch Processing: Configure longer delays for services that handle batch operations with expected temporary load increases
- Development Environments: Use longer delays in non-production environments to reduce noise
Configuration Examples
High-Priority Service (Quick Response)
Kubernetes:
# Quick detection for critical services
kubectl label svc -n production payment-service "causely.ai/error-rate-activation-delay=1"
kubectl label svc -n production payment-service "causely.ai/latency-activation-delay=1"
Nomad:
job "payment-service" {
  group "app" {
    service {
      name = "payment-service"
      port = 8080
      tags = [
        "causely.ai/error-rate-activation-delay=1",
        "causely.ai/latency-activation-delay=1"
      ]
    }
  }
}
Consul:
consul services register \
  -name="payment-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=1" \
  -meta="causely.ai/latency-activation-delay=1"
Batch Processing Service (Stable Detection)
Kubernetes:
# Longer delays for batch processing services
kubectl label svc -n production data-processor "causely.ai/error-rate-activation-delay=5"
kubectl label svc -n production data-processor "causely.ai/latency-activation-delay=5"
Nomad:
job "data-processor" {
  group "batch" {
    service {
      name = "data-processor"
      port = 9090
      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=5"
      ]
    }
  }
}
Consul:
consul services register \
  -name="data-processor" \
  -port=9090 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=5"
Development Environment (Reduced Noise)
Kubernetes:
# Longer delays to reduce noise in development
kubectl label svc -n dev api-service "causely.ai/error-rate-activation-delay=10"
kubectl label svc -n dev api-service "causely.ai/latency-activation-delay=10"
Nomad:
job "api-service" {
  group "dev" {
    service {
      name = "api-service"
      port = 3000
      tags = [
        "causely.ai/error-rate-activation-delay=10",
        "causely.ai/latency-activation-delay=10"
      ]
    }
  }
}
Consul:
consul services register \
  -name="api-service" \
  -port=3000 \
  -meta="causely.ai/error-rate-activation-delay=10" \
  -meta="causely.ai/latency-activation-delay=10"