Thresholds for Service Symptoms
Causely automatically detects service symptoms using thresholds it learns from your services' metrics. However, you may want to customize these thresholds to better match your specific requirements and SLO definitions. This document explains how to configure custom thresholds for your services.
Overview
Thresholds in Causely are used to:
- Define custom boundaries for service symptoms
- Specify SLO violation criteria
- Override automatically learned thresholds
Supported Thresholds
Currently, you can configure the following thresholds:
- Error Rate Threshold: Defines the maximum acceptable error rate for a service
- Latency Threshold: Defines the maximum acceptable latency for a service (in milliseconds)
Configuration Methods
Using Kubernetes Labels
The recommended way to configure thresholds is with Kubernetes labels on your Service objects:
# Configure error rate threshold (for example, 1% error rate)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-threshold=0.01"
# Configure latency threshold (for example, 500ms)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-threshold=500.0"
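To change, inspect, or remove these labels later, standard `kubectl` commands are enough. The sketch below wraps them in shell functions; the function names are hypothetical, and the namespace, Service name, and values are placeholders you pass in. Note that `kubectl label` refuses to change a label that already has a value unless you pass `--overwrite`.

```shell
# Hypothetical helpers around kubectl. Arguments: namespace, Service name,
# and (for set_causely_thresholds) the error-rate and latency values.

# Set or change both thresholds. --overwrite lets kubectl replace a label
# that is already present instead of failing.
set_causely_thresholds() {
  kubectl label svc -n "$1" "$2" --overwrite \
    "causely.ai/error-rate-threshold=$3" \
    "causely.ai/latency-threshold=$4"
}

# List the Service with both threshold labels shown as extra columns.
show_causely_thresholds() {
  kubectl get svc -n "$1" "$2" \
    -L causely.ai/error-rate-threshold \
    -L causely.ai/latency-threshold
}

# Delete both labels; a trailing "-" after a label key removes that label.
clear_causely_thresholds() {
  kubectl label svc -n "$1" "$2" \
    causely.ai/error-rate-threshold- \
    causely.ai/latency-threshold-
}
```

For example, `set_causely_thresholds shop checkout 0.005 250` would set a 0.5% error-rate threshold and a 250 ms latency threshold on a hypothetical `checkout` Service in the `shop` namespace.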
Using Nomad Service Tags
For Nomad services, you can configure thresholds using service tags in your job specification:
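job "example" {
  group "app" {
    # Reserve host port 8080 under the label "http" for the service to reference
    network {
      port "http" {
        static = 8080
      }
    }
    service {
      name = "my-service"
      port = "http"
      tags = [
        "causely.ai/error-rate-threshold=0.01",
        "causely.ai/latency-threshold=500.0",
      ]
    }
    # task blocks omitted for brevity
  }
}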
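Because the tags are part of the job specification, changing a threshold for a Nomad service means editing the job file and submitting it again. A minimal sketch, assuming the job above is saved to a file whose path you pass in; the function name is hypothetical. `nomad job validate` checks the file first, and `nomad job run` submits it only if validation succeeds.

```shell
# Hypothetical helper: validate a Nomad job file, then submit it.
# $1 is the path to the job file (for example, example.nomad.hcl).
# The job is only submitted when validation exits successfully.
apply_nomad_job() {
  nomad job validate "$1" && nomad job run "$1"
}
```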
Using Consul Service Metadata
For Consul services, you can configure thresholds using service metadata:
# Register a service with threshold metadata
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-threshold=0.01" \
  -meta="causely.ai/latency-threshold=500.0"
# Update the metadata by registering again with the same service ID
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-threshold=0.01" \
  -meta="causely.ai/latency-threshold=500.0"
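To confirm what Consul stored, you can query its catalog HTTP API: the `/v1/catalog/service/<name>` endpoint returns each registered instance of a service, including a `ServiceMeta` object. A sketch, assuming `curl` and `jq` are installed and a Consul agent is reachable at `http://127.0.0.1:8500` (Consul's default HTTP address); the function name is hypothetical, and clusters with ACLs enabled also need a token, for example in an `X-Consul-Token` header.

```shell
# Hypothetical helper: print the ServiceMeta of every instance of a service.
# $1 is the service name; $2 optionally overrides the Consul HTTP address.
# The default address assumes a local agent on the standard port 8500.
show_consul_service_meta() {
  curl -fsS "${2:-http://127.0.0.1:8500}/v1/catalog/service/$1" | jq '.[].ServiceMeta'
}
```

Running `show_consul_service_meta my-service` prints one JSON object per instance; check that both `causely.ai/` keys appear with the values you set.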
Threshold Values
- Error Rate Threshold: Expressed as a decimal (for example, 0.01 for 1%)
- Latency Threshold: Expressed in milliseconds (for example, 500.0 for 500 ms)
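If your SLOs are written as percentages, convert them before setting the error-rate threshold. The sketch below is a hypothetical pre-flight check, not Causely's own validation: the function names are illustrative, and it assumes an error rate must be greater than 0 and at most 1, and a latency must be a positive number of milliseconds. It uses only POSIX `sh` and `awk`.

```shell
# Hypothetical helpers; the accepted ranges are assumptions, not Causely rules.

# Convert a percentage (1 means 1%) to the decimal form used by
# causely.ai/error-rate-threshold (1 becomes 0.01).
percent_to_error_rate() {
  awk -v pct="$1" 'BEGIN { printf "%g\n", pct / 100 }'
}

# Succeed only for a plain decimal number greater than 0 and at most 1.
is_valid_error_rate() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v + 0 > 0 && v + 0 <= 1) }'
}

# Succeed only for a plain decimal number of milliseconds greater than 0.
is_valid_latency_ms() {
  awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v + 0 > 0) }'
}
```

For example, `percent_to_error_rate 0.5` prints `0.005`, which `is_valid_error_rate` accepts.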
Best Practices
- Start with Defaults: Begin with Causely's automatically learned thresholds
- Adjust Based on SLOs: Modify thresholds to match your specific SLO requirements
- Monitor Impact: After changing thresholds, monitor how they affect symptom detection
- Document Changes: Keep track of threshold changes and their rationale
Example Use Cases
- Strict SLO Requirements: Set lower thresholds for critical services
- Service-Specific Requirements: Configure different thresholds for different services
- Temporary Adjustments: Modify thresholds during maintenance windows
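When different Kubernetes services need different thresholds, you can keep the values in one place and apply them in a loop. A minimal sketch, assuming a hypothetical whitespace-separated list with one `namespace service error-rate latency-ms` entry per line; blank lines and lines starting with `#` are skipped. The function name, namespaces, and services are placeholders.

```shell
# Hypothetical helper: read "namespace service error-rate latency-ms" lines
# from standard input and label each Kubernetes Service accordingly.
apply_service_thresholds() {
  while read -r ns svc err lat; do
    # Skip blank lines and comment lines.
    case "$ns" in
      '' | \#*) continue ;;
    esac
    # </dev/null keeps kubectl from consuming the rest of the input.
    kubectl label svc -n "$ns" "$svc" --overwrite \
      "causely.ai/error-rate-threshold=$err" \
      "causely.ai/latency-threshold=$lat" </dev/null
  done
}

# Example (hypothetical namespaces and services):
# apply_service_thresholds <<'EOF'
# # critical path: stricter thresholds
# payments checkout 0.001 200
# shop     catalog  0.02  800
# EOF
```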
Suggest Additional Thresholds
We're exploring ways to make threshold configuration more flexible, such as configuring thresholds in the Causely UI and supporting additional metrics like CPU and memory. If there are other thresholds you'd like to configure, please let us know.