Symptom Delay

Causely automatically detects service symptoms from metrics such as error rate and latency. To avoid reacting to every temporary spike or drop in raw values, it applies activation and deactivation delays before changing a symptom's state. This document explains how to configure symptom delays for your services.

Overview

Symptom delays in Causely are used to:

Prevent false positives from temporary metric fluctuations
Provide more stable symptom detection
Avoid reacting to every spike or drop in raw values

Supported Delay Types

Currently, you can configure the following delay types for service symptoms:

Error Rate Activation Delay: Defines how long the error rate threshold must be exceeded before the error rate symptom activates
Latency Activation Delay: Defines how long the latency threshold must be exceeded before the latency symptom activates

Configuration Methods

Using Kubernetes Labels

The recommended way to configure symptom delays is using Kubernetes labels. You can apply these labels to your services:

# Configure error rate activation delay (for example, 5 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/error-rate-activation-delay=5"

# Configure latency activation delay (for example, 3 minutes)
kubectl label svc -n <namespace> <service-name> "causely.ai/latency-activation-delay=3"
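If a service already carries one of these labels, `kubectl label` refuses to change it unless `--overwrite` is passed. The sketch below wraps that in a small helper and adds a second one for reading the labels back. Both function names and the `KUBECTL` override variable are hypothetical conveniences, not part of kubectl or Causely.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of kubectl or Causely).
# Set KUBECTL=echo to preview the command instead of running it.

# Usage: set_activation_delay <namespace> <service> <error-rate|latency> <minutes>
set_activation_delay() {
  local namespace=$1 service=$2 kind=$3 minutes=$4
  # --overwrite lets the command succeed when the label is already set.
  "${KUBECTL:-kubectl}" label svc -n "$namespace" "$service" \
    "causely.ai/${kind}-activation-delay=${minutes}" --overwrite
}

# Usage: show_service_labels <namespace> <service>
show_service_labels() {
  local namespace=$1 service=$2
  # --show-labels adds a LABELS column to the normal "get" output.
  "${KUBECTL:-kubectl}" get svc -n "$namespace" "$service" --show-labels
}
```

For example, `set_activation_delay production payment-service error-rate 5` followed by `show_service_labels production payment-service` sets the label and lets you confirm that it was applied.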

Using Nomad Service Tags

For Nomad services, you can configure symptom delays using service tags in your job specification:

job "example" {
  group "app" {
    service {
      name = "my-service"
      port = 8080

      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=3"
      ]
    }
  }
}

Using Consul Service Metadata

For Consul services, you can configure symptom delays using service metadata:

# Register a service with symptom delay metadata
consul services register \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"

# Update an existing service's metadata by re-registering it with the same ID
consul services register \
  -id="my-service-id" \
  -name="my-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=3"

Delay Values

Activation Delays: Expressed in minutes (for example, 5 for 5 minutes)
Minimum Value: 1 minute
Recommended Range: 1-10 minutes for most use cases
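A value can be checked against these rules before it is applied. The function below is a minimal sketch: `is_valid_delay` is a hypothetical name, and it assumes that delays must be whole numbers of minutes, which the minute-based examples suggest but this page does not state outright. Causely may validate values differently.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Causely).
# Succeeds only for whole numbers of minutes that are at least 1.
is_valid_delay() {
  case $1 in
    '' | *[!0-9]*) return 1 ;;  # reject empty input and anything non-numeric
  esac
  [ "$1" -ge 1 ]                # enforce the 1-minute minimum
}
```

A script could call `is_valid_delay "$minutes" || exit 1` before running `kubectl label` so that values such as `0`, `2.5`, or `5m` are caught early.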

Default Behavior

If you don't configure any symptom delays, Causely uses the following default values:

Activation Delay: 10 minutes
Deactivation Delay: 5 minutes

This means that by default:

A symptom will only activate after the threshold has been exceeded for 10 consecutive minutes
A symptom will deactivate after the metrics return to normal levels and remain continuously below the threshold for 5 minutes

How It Works

When you configure symptom delays:

Activation: Causely activates a symptom only after the threshold has been exceeded continuously for the configured delay period, so a brief spike that ends sooner does not trigger it
Deactivation: Similarly, Causely deactivates a symptom only after the metrics have stayed at normal levels for the full deactivation delay period
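The two delays can be pictured with a small simulation. This is an illustrative sketch, not Causely's implementation: `simulate_symptom` is a hypothetical function, samples are assumed to arrive once per minute, and each counter is assumed to reset as soon as the metric crosses the threshold in the other direction.

```shell
#!/usr/bin/env bash
# Illustrative simulation only; this is not how Causely is implemented.
# Usage: simulate_symptom <activation-minutes> <deactivation-minutes> <samples...>
# Each sample is one minute: 1 = threshold exceeded, 0 = within threshold.
# Prints one digit per minute: 1 = symptom active, 0 = symptom inactive.
simulate_symptom() {
  local activation=$1 deactivation=$2
  shift 2
  local active=0 above=0 below=0 out="" sample
  for sample in "$@"; do
    # Count consecutive minutes above or below the threshold.
    if [ "$sample" -eq 1 ]; then
      above=$((above + 1)); below=0
    else
      below=$((below + 1)); above=0
    fi
    # Change state only once the relevant delay has fully elapsed.
    if [ "$active" -eq 0 ] && [ "$above" -ge "$activation" ]; then
      active=1
    elif [ "$active" -eq 1 ] && [ "$below" -ge "$deactivation" ]; then
      active=0
    fi
    out="$out$active"
  done
  printf '%s\n' "$out"
}
```

With a 3-minute activation delay and a 2-minute deactivation delay, `simulate_symptom 3 2 1 1 0 1 1 1 1 0 0 0` prints `0000011100`: the initial two-minute spike is ignored, the symptom activates in the third consecutive minute above the threshold, and it clears after two consecutive minutes back within it.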

Best Practices

Start with Defaults: Begin with Causely's default behavior (10-minute activation, 5-minute deactivation)
Adjust Based on Service Characteristics:
- Use shorter delays (1-3 minutes) for critical services that need quick response
- Use longer delays (5-10 minutes) for services with frequent but harmless spikes
Monitor Impact: After changing delays, monitor how they affect symptom detection accuracy
Consider Service Patterns: Account for your service's typical behavior patterns when setting delays
Document Changes: Keep track of delay changes and their rationale

Example Use Cases

Noisy Services: Increase delays for services that frequently have temporary spikes
Critical Services: Use shorter delays for services where quick detection is essential
Batch Processing: Configure longer delays for services that handle batch operations with expected temporary load increases
Development Environments: Use longer delays in non-production environments to reduce noise

Configuration Examples

High-Priority Service (Quick Response)

Kubernetes:

# Quick detection for critical services
kubectl label svc -n production payment-service "causely.ai/error-rate-activation-delay=1"
kubectl label svc -n production payment-service "causely.ai/latency-activation-delay=1"

Nomad:

job "payment-service" {
  group "app" {
    service {
      name = "payment-service"
      port = 8080

      tags = [
        "causely.ai/error-rate-activation-delay=1",
        "causely.ai/latency-activation-delay=1"
      ]
    }
  }
}

Consul:

consul services register \
  -name="payment-service" \
  -port=8080 \
  -meta="causely.ai/error-rate-activation-delay=1" \
  -meta="causely.ai/latency-activation-delay=1"

Batch Processing Service (Stable Detection)

Kubernetes:

# Longer delays for batch processing services
kubectl label svc -n production data-processor "causely.ai/error-rate-activation-delay=5"
kubectl label svc -n production data-processor "causely.ai/latency-activation-delay=5"

Nomad:

job "data-processor" {
  group "batch" {
    service {
      name = "data-processor"
      port = 9090

      tags = [
        "causely.ai/error-rate-activation-delay=5",
        "causely.ai/latency-activation-delay=5"
      ]
    }
  }
}

Consul:

consul services register \
  -name="data-processor" \
  -port=9090 \
  -meta="causely.ai/error-rate-activation-delay=5" \
  -meta="causely.ai/latency-activation-delay=5"

Development Environment (Reduced Noise)

Kubernetes:

# Longer delays to reduce noise in development
kubectl label svc -n dev api-service "causely.ai/error-rate-activation-delay=10"
kubectl label svc -n dev api-service "causely.ai/latency-activation-delay=10"

Nomad:

job "api-service" {
  group "dev" {
    service {
      name = "api-service"
      port = 3000

      tags = [
        "causely.ai/error-rate-activation-delay=10",
        "causely.ai/latency-activation-delay=10"
      ]
    }
  }
}

Consul:

consul services register \
  -name="api-service" \
  -port=3000 \
  -meta="causely.ai/error-rate-activation-delay=10" \
  -meta="causely.ai/latency-activation-delay=10"
