
Thresholds for Service Symptoms

Causely automatically learns and detects service symptoms based on various metrics. However, you may want to customize these thresholds to better match your specific requirements and SLO definitions. This document explains how to configure custom thresholds for your services.

Overview

Thresholds in Causely are used to:

  • Define custom boundaries for service symptoms
  • Specify SLO violation criteria
  • Override automatically learned thresholds

Supported Thresholds

Currently, you can configure the following thresholds:

  • Error Rate Threshold: Defines the maximum acceptable error rate for a service
  • Latency Threshold: Defines the maximum acceptable latency for a service (in milliseconds)

In addition to overriding thresholds directly, you can configure a minimum value for learned thresholds. This allows Causely to continue learning dynamically while ensuring thresholds do not drop below an acceptable baseline.

Configuration Methods

Using the Causely UI

Using the UI allows you to:

  • Inspect the learned threshold alongside observed metrics (for example, P90 and P99 for latency)
  • Override thresholds to match documented SLOs or performance requirements
  • Set a minimum value that bounds how low a learned threshold can go while preserving adaptive learning
  • Immediately see how a custom threshold compares to real traffic patterns

To configure thresholds in the UI:

  1. Navigate to the service you want to configure.
  2. Select the service's Metrics tab, then select the relevant symptom metric: Request Duration or Request Error Rate.
  3. Click the pencil icon next to the threshold to edit the value.
  4. Save the change to apply the override.

UI-based configuration is best suited for teams that want quick iteration, visibility into learned behavior, and explicit control without modifying service metadata or deployment configuration.

Figure: Edit Request Duration Threshold

Using Kubernetes Labels

The recommended way to configure thresholds is with Kubernetes labels. Apply them to your services as follows:

# Configure error rate threshold (for example, 1% error rate)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-threshold=0.01"

# Configure latency threshold (for example, 500ms)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-threshold=500.0"
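kubectl label refuses to change a label that already exists unless you pass --overwrite, so updating a threshold later needs that flag. The sketch below wraps both labels in one hypothetical helper, set_causely_thresholds (not part of Causely or kubectl), that sets or updates them in a single call. It assumes kubectl is on your PATH and pointed at the right cluster.

```shell
# Hypothetical helper (not part of Causely or kubectl): set or update both
# Causely threshold labels on a Kubernetes Service in one kubectl call.
# Usage: set_causely_thresholds <namespace> <service-name> <error-rate> <latency-ms>
set_causely_thresholds() {
  local namespace="$1" service="$2" error_rate="$3" latency_ms="$4"

  # --overwrite lets kubectl replace labels that are already present,
  # so the same command works for the first setting and for later changes.
  kubectl label svc -n "$namespace" "$service" --overwrite \
    "causely.ai/error-rate-threshold=${error_rate}" \
    "causely.ai/latency-threshold=${latency_ms}"
}

# Example:
#   set_causely_thresholds shop checkout 0.01 500.0
# Show the applied values as extra columns:
#   kubectl get svc -n shop checkout -L causely.ai/error-rate-threshold -L causely.ai/latency-threshold
```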

Using Nomad Service Tags

For Nomad services, you can configure thresholds using service tags in your job specification:

job "example" {
  group "app" {
    service {
      name = "my-service"
      port = 8080

      # HCL list elements must be separated by commas.
      tags = [
        "causely.ai/error-rate-threshold=0.01",
        "causely.ai/latency-threshold=500.0"
      ]
    }
  }
}

Using Consul Service Metadata

For Consul services, you can configure thresholds using service metadata:

# Register a service with threshold metadata
consul services register \
-name="my-service" \
-port=8080 \
-meta="causely.ai/error-rate-threshold=0.01" \
-meta="causely.ai/latency-threshold=500.0"

# Update existing service metadata
consul services register \
-id="my-service-id" \
-name="my-service" \
-port=8080 \
-meta="causely.ai/error-rate-threshold=0.01" \
-meta="causely.ai/latency-threshold=500.0"

Threshold Values

  • Error Rate Threshold: Expressed as a decimal (for example, 0.01 for 1%)
  • Latency Threshold: Expressed in milliseconds (for example, 500.0 for 500 ms)
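Because the error-rate threshold is a fraction rather than a percentage, entering 1 when you mean 1% would be read as 100%. The shell sketch below uses hypothetical helper names (percent_to_error_rate, valid_error_rate, valid_latency_ms; none are part of Causely) to convert a percentage to the expected decimal and to reject bad values before they are written into labels, tags, or metadata. The accepted ranges are assumptions drawn from the formats above: an error rate greater than 0 and at most 1, and a latency greater than 0 ms.

```shell
# Hypothetical pre-flight helpers (not part of Causely): convert and check
# threshold values before writing them into labels, tags, or metadata.

# Convert a percentage to the decimal form described above (1 -> 0.01).
percent_to_error_rate() {
  awk -v p="$1" 'BEGIN { printf "%g\n", p / 100 }'
}

# Succeed only if the value is a plain decimal greater than 0 and at most 1
# (assumed valid range for an error rate expressed as a fraction).
valid_error_rate() {
  awk -v r="$1" 'BEGIN { exit !((r ~ /^[0-9]*\.?[0-9]+$/) && (r + 0 > 0) && (r + 0 <= 1)) }'
}

# Succeed only if the value is a positive number of milliseconds.
valid_latency_ms() {
  awk -v ms="$1" 'BEGIN { exit !((ms ~ /^[0-9]+(\.[0-9]+)?$/) && (ms + 0 > 0)) }'
}

# Example:
#   rate="$(percent_to_error_rate 1)"   # 0.01
#   valid_error_rate "$rate" && valid_latency_ms 500.0 && echo "values look valid"
```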

Best Practices

  1. Start with Defaults: Begin with Causely's automatically learned thresholds
  2. Adjust Based on SLOs: Modify thresholds to match your specific SLO requirements
  3. Monitor Impact: After changing thresholds, monitor how they affect symptom detection
  4. Document Changes: Keep track of threshold changes and their rationale

Example Use Cases

  1. Strict SLO Requirements: Set lower thresholds for critical services
  2. Service-Specific Requirements: Configure different thresholds for different services
  3. Temporary Adjustments: Modify thresholds during maintenance windows
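For the temporary-adjustment case, the override has to be removed again once the maintenance window ends. kubectl deletes a label when the key is followed by a trailing minus sign. The hypothetical helper clear_causely_thresholds below (not part of Causely or kubectl) removes both Kubernetes labels. We assume, without the source confirming it, that Causely then returns to its learned thresholds for that service.

```shell
# Hypothetical helper (not part of Causely or kubectl): remove both Causely
# threshold labels from a Service, for example after a maintenance window.
# A trailing "-" after a key is kubectl's syntax for deleting that label.
# Assumption (not confirmed by Causely docs): without these labels, Causely
# falls back to its automatically learned thresholds.
clear_causely_thresholds() {
  local namespace="$1" service="$2"

  kubectl label svc -n "$namespace" "$service" \
    "causely.ai/error-rate-threshold-" \
    "causely.ai/latency-threshold-"
}

# Example:
#   clear_causely_thresholds shop checkout
```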

Learned Threshold Minimums

You can also configure a minimum value for a learned threshold in the Causely UI.

This minimum value acts as a lower bound, ensuring that thresholds do not fall below an acceptable baseline while still allowing Causely to adapt to changing conditions. This feature is useful for maintaining consistent symptom detection criteria without sacrificing the benefits of dynamic learning.
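The paragraph above describes the minimum as a lower bound on the learned value, which amounts to taking the larger of the two. The sketch below models only that described behavior with a hypothetical effective_threshold shell function; it is an illustration, not Causely's implementation, and the example values are made up.

```shell
# Illustrative model (not Causely's implementation) of a learned threshold
# bounded below by a configured minimum: the effective threshold is the
# larger of the two, so learning can raise it but never push it below the
# baseline.
# Usage: effective_threshold <learned-value> <minimum-value>
effective_threshold() {
  awk -v learned="$1" -v minimum="$2" 'BEGIN {
    l = learned + 0; m = minimum + 0
    # A learned value below the minimum is replaced by the minimum.
    e = (l > m) ? l : m
    printf "%g\n", e
  }'
}

# Example, latency in milliseconds with a 200 ms minimum:
#   effective_threshold 120 200   # prints 200 (learned value below the floor)
#   effective_threshold 350 200   # prints 350 (learned value used as-is)
```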

Suggest Additional Thresholds

We're currently exploring ways to make threshold configuration more flexible in the Causely UI and to support additional metrics, such as CPU and memory. If there are other thresholds you'd like to configure, please let us know.