Thresholds for Service Symptoms

Causely detects service symptoms by comparing service metrics against thresholds that it learns automatically. You can override these thresholds to better match your own requirements and SLO definitions. This document explains how to configure custom thresholds for your services.

Overview

Thresholds in Causely are used to:

  • Define custom boundaries for service symptoms
  • Specify SLO violation criteria
  • Override automatically learned thresholds

Supported Thresholds

Currently, you can configure the following thresholds:

  • Error Rate Threshold: Defines the maximum acceptable error rate for a service
  • Latency Threshold: Defines the maximum acceptable latency for a service (in milliseconds)

Configuration Methods

Using Kubernetes Labels

The recommended way to configure thresholds is using Kubernetes labels. You can apply these labels to your services:

# Configure error rate threshold (for example, 1% error rate)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-threshold=0.01"

# Configure latency threshold (for example, 500ms)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-threshold=500.0"
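
The label commands above can be wrapped in small shell helpers when thresholds are changed often. The following is a minimal sketch, not part of Causely: the function names set_threshold and clear_threshold and the KUBECTL variable are hypothetical. It relies only on standard kubectl behavior: --overwrite replaces a label that already exists, and a trailing "-" after the key removes the label.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not provided by Causely).
# Set KUBECTL=echo to preview the commands without touching a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Set or update a threshold label on a Service.
# Arguments: namespace, service name, threshold name, value.
set_threshold() {
  local namespace="$1" service="$2" name="$3" value="$4"
  "$KUBECTL" label svc -n "$namespace" "$service" \
    "causely.ai/${name}=${value}" --overwrite
}

# Remove a threshold label from a Service.
# Arguments: namespace, service name, threshold name.
clear_threshold() {
  local namespace="$1" service="$2" name="$3"
  "$KUBECTL" label svc -n "$namespace" "$service" "causely.ai/${name}-"
}
```

For example, set_threshold shop checkout latency-threshold 500.0 labels the checkout service in the shop namespace (both names are placeholders), and kubectl get svc -n shop checkout --show-labels shows the result.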

Using Nomad Service Tags

For Nomad services, you can configure thresholds using service tags in your job specification:

job "example" {
  group "app" {
    service {
      name = "my-service"
      port = 8080

      tags = [
        "causely.ai/error-rate-threshold=0.01",
        "causely.ai/latency-threshold=500.0"
      ]
    }
  }
}

Using Consul Service Metadata

For Consul services, you can configure thresholds using service metadata:

# Register a service with threshold metadata
consul services register \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-threshold=0.01" \
  -meta="causely.ai/latency-threshold=500.0"

# Update existing service metadata
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-threshold=0.01" \
  -meta="causely.ai/latency-threshold=500.0"

Threshold Values

  • Error Rate Threshold: Expressed as a decimal (for example, 0.01 for 1%)
  • Latency Threshold: Expressed in milliseconds (for example, 500.0 for 500 ms)
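
SLO targets are often written as percentages or in seconds, so the values may need converting before they are used as thresholds. The following is a minimal sketch using awk; the helper names are hypothetical and not part of Causely. Treating an availability target as the complement of the error rate is an assumption about how the SLO is defined.

```shell
#!/usr/bin/env bash
# Hypothetical conversion helpers (not provided by Causely).

# Convert a percentage (for example, 1 for 1%) to the decimal form
# used by the error rate threshold.
percent_to_ratio() {
  awk -v p="$1" 'BEGIN { printf "%g\n", p / 100 }'
}

# Convert an availability target in percent (for example, 99.9) to the
# corresponding maximum error rate as a decimal.
# Assumption: availability is defined as 100% minus the error rate.
slo_to_error_rate() {
  awk -v s="$1" 'BEGIN { printf "%g\n", (100 - s) / 100 }'
}

# Convert seconds (for example, 0.5) to milliseconds for the
# latency threshold.
seconds_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 1000 }'
}
```

For example, percent_to_ratio 1 prints 0.01, slo_to_error_rate 99.9 prints 0.001, and seconds_to_ms 0.5 prints 500.0.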

Best Practices

  1. Start with Defaults: Begin with Causely's automatically learned thresholds
  2. Adjust Based on SLOs: Modify thresholds to match your specific SLO requirements
  3. Monitor Impact: After changing thresholds, monitor how they affect symptom detection
  4. Document Changes: Keep track of threshold changes and their rationale

Example Use Cases

  1. Strict SLO Requirements: Set lower thresholds for critical services
  2. Service-Specific Requirements: Configure different thresholds for different services
  3. Temporary Adjustments: Modify thresholds during maintenance windows

Suggest Additional Thresholds

We're exploring ways to make threshold configuration more flexible, including configuration through the Causely UI and support for additional metrics such as CPU and memory. If there are other thresholds you'd like to configure, please let us know.