Symptom Delay Configuration

Causely automatically detects service symptoms based on various metrics. To avoid reacting to every temporary spike or drop in raw values, Causely waits for an activation or deactivation delay before changing a symptom's state. This document explains how to configure these delays for your services.

Overview

Symptom delays in Causely are used to:

  • Prevent false positives from temporary metric fluctuations, so that a single spike or drop in raw values does not raise a symptom
  • Provide more stable symptom detection

Supported Delay Types

Currently, you can configure the following delay types for service symptoms:

  • Error Rate Activation Delay: Defines how long to wait before activating an error rate symptom
  • Latency Activation Delay: Defines how long to wait before activating a latency symptom

Configuration Methods

Using Kubernetes Labels

The recommended way to configure symptom delays is using Kubernetes labels. You can apply these labels to your services:

# Configure error rate activation delay (for example, 5 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-activation-delay=5"

# Configure latency activation delay (for example, 3 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-activation-delay=3"
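
When a delay label is already present, kubectl refuses to change it unless --overwrite is passed, and a trailing dash after the label key removes the label. The sketch below wraps those operations in small helper functions. The function names are hypothetical; only the kubectl subcommands and flags are real. Whether removing a label returns the service to Causely's defaults is an assumption that this page does not confirm.

```shell
# Hypothetical helpers; only the kubectl subcommands and flags are real.

# set_activation_delay <namespace> <service> <error-rate|latency> <minutes>
# --overwrite lets the command succeed when the label is already set.
set_activation_delay() {
  kubectl label svc -n "$1" "$2" "causely.ai/$3-activation-delay=$4" --overwrite
}

# clear_activation_delay <namespace> <service> <error-rate|latency>
# A trailing dash after the label key removes that label.
clear_activation_delay() {
  kubectl label svc -n "$1" "$2" "causely.ai/$3-activation-delay-"
}

# show_labels <namespace> <service>
# Prints the service together with all labels currently set on it.
show_labels() {
  kubectl get svc -n "$1" "$2" --show-labels
}
```

For example, set_activation_delay production payment-service latency 2 changes an existing latency delay to 2 minutes, and show_labels production payment-service confirms the result.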

Using Nomad Service Tags

For Nomad services, you can configure symptom delays using service tags in your job specification:

job "example" {
  group "app" {
    service {
      name = "my-service"
      port = 8080

      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=3"
      ]
    }
  }
}

Using Consul Service Metadata

For Consul services, you can configure symptom delays using service metadata:

# Register a service with symptom delay metadata
consul services register \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"

# Update existing service metadata by registering again with the same service ID
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"
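
To check that the metadata was stored, you can read the service definition back from the local Consul agent. The sketch below uses the agent HTTP endpoint /v1/agent/service/<service-id>, whose JSON response includes a Meta object. The function name is hypothetical, and the default address http://127.0.0.1:8500 assumes an agent running locally with default settings and no ACL token required.

```shell
# show_service_meta <service-id> [agent-address]
# Hypothetical helper: prints the service definition (including "Meta")
# as returned by the Consul agent HTTP API.
show_service_meta() {
  curl -s "${2:-http://127.0.0.1:8500}/v1/agent/service/$1"
}
```

Running show_service_meta my-service-id should print a JSON document in which the Meta field lists the causely.ai keys registered above.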

Delay Values

  • Activation Delays: Expressed in minutes (for example, 5 for 5 minutes)
  • Minimum Value: 1 minute
  • Recommended Range: 1-10 minutes for most use cases
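
The rules above can be checked before a value is applied. The following is a minimal sketch, not part of Causely: check_delay is a hypothetical helper that accepts whole minutes of at least 1 and flags values outside the recommended 1-10 range. Rejecting fractional values is an assumption, since this page only shows whole numbers.

```shell
# check_delay <value>
# Hypothetical validator for an activation delay given in minutes.
# Prints "ok", "outside-recommended", or "invalid"; returns 1 when invalid.
check_delay() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty, negative, fractional, or otherwise non-numeric input
      echo "invalid"
      return 1
      ;;
  esac
  if [ "$1" -lt 1 ]; then
    # Below the documented minimum of 1 minute
    echo "invalid"
    return 1
  fi
  if [ "$1" -gt 10 ]; then
    # Accepted here, but above the recommended 1-10 minute range
    echo "outside-recommended"
    return 0
  fi
  echo "ok"
}
```

For example, check_delay 5 prints ok, check_delay 15 prints outside-recommended, and check_delay 0 prints invalid.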

Default Behavior

If you don't configure any symptom delays, Causely uses the following default values:

  • Activation Delay: 10 minutes
  • Deactivation Delay: 5 minutes

This means that by default:

  • A symptom will only activate after the threshold has been exceeded for 10 consecutive minutes
  • A symptom will deactivate after the metrics return to normal levels and remain continuously below the threshold for 5 minutes

How It Works

When you configure symptom delays:

  1. Activation: Causely activates a symptom only after the threshold has been exceeded continuously for the activation delay; a breach that ends sooner does not raise the symptom
  2. Deactivation: Causely deactivates a symptom only after the metrics have returned to normal and stayed continuously below the threshold for the deactivation delay (5 minutes by default)
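
The two rules can be illustrated with a small simulation. This is a sketch of the behavior described on this page, not Causely's implementation: simulate_symptom is a hypothetical function, and it assumes the metric is evaluated once per minute, which this page does not state. Each input sample is 1 when the threshold is exceeded and 0 otherwise.

```shell
# simulate_symptom <activation_delay> <deactivation_delay> <samples...>
# Each sample represents one minute: 1 = threshold exceeded, 0 = normal.
# Prints one character per minute: A = symptom active, - = inactive.
# Note: plain POSIX sh has no local variables, so these names are global.
simulate_symptom() {
  act=$1
  deact=$2
  shift 2
  active=0   # current symptom state
  over=0     # consecutive minutes above the threshold
  under=0    # consecutive minutes at normal levels
  out=""
  for s in "$@"; do
    if [ "$s" -eq 1 ]; then
      over=$((over + 1))
      under=0
    else
      under=$((under + 1))
      over=0
    fi
    # Activate once the breach has lasted for the full activation delay
    if [ "$active" -eq 0 ] && [ "$over" -ge "$act" ]; then
      active=1
    fi
    # Deactivate once normal levels have lasted for the full deactivation delay
    if [ "$active" -eq 1 ] && [ "$under" -ge "$deact" ]; then
      active=0
    fi
    if [ "$active" -eq 1 ]; then
      out="${out}A"
    else
      out="${out}-"
    fi
  done
  echo "$out"
}

# 3-minute activation delay, 2-minute deactivation delay
simulate_symptom 3 2 1 1 0 1 1 1 1 0 0 0   # prints -----AAA--
```

In this run the first two-minute spike never activates the symptom because it ends before the 3-minute delay elapses. The later breach activates it in its third minute, and the symptom clears after two consecutive normal minutes.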

Best Practices

  1. Start with Defaults: Begin with Causely's default behavior (10-minute activation, 5-minute deactivation)
  2. Adjust Based on Service Characteristics:
    • Use shorter delays (1-3 minutes) for critical services that need quick response
    • Use delays toward the upper end of the recommended range (5-10 minutes, where 10 is the default) for services with frequent but harmless spikes
  3. Monitor Impact: After changing delays, monitor how they affect symptom detection accuracy
  4. Consider Service Patterns: Account for your service's typical behavior patterns when setting delays
  5. Document Changes: Keep track of delay changes and their rationale

Example Use Cases

  1. Noisy Services: Increase delays for services that frequently have temporary spikes
  2. Critical Services: Use shorter delays for services where quick detection is essential
  3. Batch Processing: Configure longer delays for services that handle batch operations with expected temporary load increases
  4. Development Environments: Use longer delays in non-production environments to reduce noise

Configuration Examples

High-Priority Service (Quick Response)

Kubernetes:

# Quick detection for critical services
kubectl label svc -n production payment-service "causely.ai/error-rate-activation-delay=1"
kubectl label svc -n production payment-service "causely.ai/latency-activation-delay=1"

Nomad:

job "payment-service" {
  group "app" {
    service {
      name = "payment-service"
      port = 8080

      tags = [
        "causely.ai/error-rate-activation-delay=1",
        "causely.ai/latency-activation-delay=1"
      ]
    }
  }
}

Consul:

consul services register \
  -name="payment-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=1" \
  -meta="causely.ai/latency-activation-delay=1"

Batch Processing Service (Stable Detection)

Kubernetes:

# 5-minute delays for batch processing services (longer than the critical-service setting, shorter than the 10-minute default)
kubectl label svc -n production data-processor "causely.ai/error-rate-activation-delay=5"
kubectl label svc -n production data-processor "causely.ai/latency-activation-delay=5"

Nomad:

job "data-processor" {
  group "batch" {
    service {
      name = "data-processor"
      port = 9090

      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=5"
      ]
    }
  }
}

Consul:

consul services register \
  -name="data-processor" \
  -port=9090 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=5"

Development Environment (Reduced Noise)

Kubernetes:

# 10-minute delays (the default value, set explicitly) to reduce noise in development
kubectl label svc -n dev api-service "causely.ai/error-rate-activation-delay=10"
kubectl label svc -n dev api-service "causely.ai/latency-activation-delay=10"
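
If every service in a namespace should share the same delays, kubectl's --all flag can label them in one step instead of one command per service. The sketch below is an illustration only: label_namespace_services is a hypothetical helper, and it assumes that applying the same value to both delay types across the whole namespace is what you want.

```shell
# label_namespace_services <namespace> <minutes>
# Hypothetical helper: sets both activation delays on every Service
# in the namespace. --overwrite replaces any value already present.
label_namespace_services() {
  kubectl label svc -n "$1" --all \
    "causely.ai/error-rate-activation-delay=$2" \
    "causely.ai/latency-activation-delay=$2" \
    --overwrite
}
```

For example, label_namespace_services dev 10 applies 10-minute delays to all services in the dev namespace.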

Nomad:

job "api-service" {
  group "dev" {
    service {
      name = "api-service"
      port = 3000

      tags = [
        "causely.ai/error-rate-activation-delay=10",
        "causely.ai/latency-activation-delay=10"
      ]
    }
  }
}

Consul:

consul services register \
  -name="api-service" \
  -port=3000 \
  -meta="causely.ai/error-rate-activation-delay=10" \
  -meta="causely.ai/latency-activation-delay=10"