

Building Custom Agents with Causely

Most custom agents can access telemetry, but struggle to determine what is actually happening, what caused it, and what action is safe.

Causely provides the system intelligence layer needed to interpret telemetry consistently and make reliable decisions.

This applies whether you are building internal incident tooling, AI-driven workflows, or automation pipelines.

When to Build a Custom Agent

Building a custom agent is the right approach when:

  • You have internal workflows that do not map to existing tools
  • You want to integrate reliability decisions into your own systems
  • You are building end-to-end automation (detection → diagnosis → action)
  • You need full control over logic, policies, and execution

How It Works

Custom agents use Causely to move from raw telemetry to structured, system-aware decisions.

Custom agents typically interact with Causely in one of two ways:

1. MCP Integration

Use the Causely MCP Server to provide a standardized interface for agents.

Your agent:

  1. Receives a signal (alert, event, user input)
  2. Queries Causely via MCP
  3. Receives structured outputs (root cause, dependencies, explanation)
  4. Takes action based on those outputs

This is the fastest way to integrate Causely into agent-based systems.
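The four steps above might be sketched as follows. This is a minimal illustration, not the Causely client API: `ToolClient`, `handle_signal`, and `FakeClient` are hypothetical names, the `call(tool, **kwargs)` style mirrors the workflow example later on this page, and the canned responses exist only so the sketch runs without a server.

```python
from typing import Any, Protocol


class ToolClient(Protocol):
    """Anything that can invoke a named MCP tool (hypothetical interface)."""

    def call(self, tool: str, **kwargs: Any) -> Any: ...


def handle_signal(mcp: ToolClient, service_name: str) -> dict[str, Any]:
    # Step 1 happened upstream: an alert named `service_name`.
    # Step 2: query Causely via MCP, resolving the entity first.
    entities = mcp.call("get_entities", query=service_name, entity_types=["Service"])
    if not entities:
        return {"action": "escalate", "reason": f"unknown service {service_name!r}"}

    # Step 3: receive structured outputs.
    root_causes = mcp.call("get_root_causes", impacted_service_ids=[entities[0]["id"]])

    # Step 4: act on those outputs, with a fallback when nothing is found.
    if not root_causes:
        return {"action": "escalate", "reason": "no root cause identified"}
    return {"action": "open_ticket", "root_cause": root_causes[0]["name"]}


class FakeClient:
    """Stand-in returning made-up data; replace with a real MCP client."""

    def call(self, tool: str, **kwargs: Any) -> Any:
        if tool == "get_entities":
            return [{"id": "svc-123"}]
        if tool == "get_root_causes":
            return [{"id": "rc-1", "name": "Memory leak in checkout-service"}]
        raise ValueError(f"unexpected tool: {tool}")


decision = handle_signal(FakeClient(), "checkout-service")
```

Returning a plain decision dictionary keeps the "what to do" separate from the "how to do it", so the execution step can be gated or logged independently.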

2. API Integration

For more control or non-MCP environments, you can interact directly with the Causely API.

Your system:

  1. Sends queries to Causely (for example, root cause, topology, health)
  2. Receives structured, machine-consumable responses
  3. Uses those responses to drive logic and actions
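As a rough sketch of steps 2 and 3, the snippet below parses a machine-consumable response and derives an action from it. The JSON shape, the field names, the severity values, and the action names are all assumptions made for illustration; they are not the documented Causely API schema, and the request itself (step 1) is omitted because the endpoint details are not given here.

```python
import json
from dataclasses import dataclass

# Hypothetical response body; the real Causely API schema may differ.
SAMPLE_RESPONSE = """
{"root_causes": [
  {"id": "rc-1", "name": "Database connection pool exhausted", "severity": "critical"},
  {"id": "rc-2", "name": "Elevated retry rate", "severity": "low"}
]}
"""


@dataclass(frozen=True)
class RootCause:
    id: str
    name: str
    severity: str


def parse_root_causes(body: str) -> list[RootCause]:
    """Turn the raw JSON body into typed records the agent can reason over."""
    payload = json.loads(body)
    return [
        RootCause(rc["id"], rc["name"], rc["severity"])
        for rc in payload.get("root_causes", [])
    ]


def next_action(root_causes: list[RootCause]) -> str:
    """Drive logic from the structured response (assumed action names)."""
    if not root_causes:
        return "escalate_to_human"
    if any(rc.severity == "critical" for rc in root_causes):
        return "page_on_call"
    return "open_ticket"


action = next_action(parse_root_causes(SAMPLE_RESPONSE))
```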

Tool Ordering: Resolve Entities First

Tip: Always call get_entities() before querying metrics, SLOs, topology, symptoms, or slow queries. Most structured tools require an entity ID. get_entities() resolves a service or database name to its ID and returns current health status, type, and labels.
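One way to enforce this ordering is to wrap the lookup in a small helper that every other call goes through. The sketch below assumes get_entities returns a list of matches, each with an "id" key, as in the workflow example on this page. `resolve_entity_id` and `fake_call_tool` are hypothetical names, and failing on zero or multiple matches is a design choice of this sketch, not documented Causely behavior.

```python
from typing import Any, Callable

CallTool = Callable[..., Any]


def resolve_entity_id(call_tool: CallTool, name: str, entity_type: str = "Service") -> str:
    """Resolve a service or database name to exactly one entity ID."""
    matches = call_tool("get_entities", query=name, entity_types=[entity_type])
    if not matches:
        raise LookupError(f"no {entity_type} entity matches {name!r}")
    if len(matches) > 1:
        # Refuse to guess: acting on the wrong entity is worse than stopping.
        raise LookupError(f"{len(matches)} entities match {name!r}; narrow the query")
    return matches[0]["id"]


def fake_call_tool(tool: str, **kwargs: Any) -> Any:
    """Stand-in for a real MCP call, returning made-up data."""
    catalog = {"checkout-service": [{"id": "svc-123", "type": "Service"}]}
    return catalog.get(kwargs["query"], [])


entity_id = resolve_entity_id(fake_call_tool, "checkout-service")
```

The resolved ID can then be passed to tools such as get_metrics, get_slo, or get_topology.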

Ask Causely vs Structured Tools

The MCP server exposes two interaction styles. Choose based on what your agent needs to do with the result.

Ask Causely: natural language in. Best for open-ended exploration and synthesis.

Structured tools: explicit named inputs. Best when your agent needs to act on the result, apply logic, or chain calls.

| Use case | Tool |
| --- | --- |
| "What's wrong with checkout?" (narrative) | get_service_summary |
| "What happened last night?" (summary) | ask_causely |
| Root cause data for automated routing | get_root_causes |
| Metrics for regression detection | get_metrics |
| Entity ID resolution | get_entities |
| SLO status with burn rate | get_slo |
| Blast radius mapping | get_topology |
| Post-deploy regression check | reliability_delta |

Workflow Example: Incident Triage

This example shows a custom agent performing full incident triage using structured MCP tools.

Scenario: An alert fires. The agent needs to identify the root cause, understand impact, and route to the correct team.

# Assumes `mcp` is an already-connected client for the Causely MCP Server.

# Step 1: Resolve the alerted service name to an entity ID
entities = mcp.call("get_entities", query="checkout-service", entity_types=["Service"])
# Returns: a list of matches, each with entity ID, current health status, type, and labels

# Step 2: Get root causes for the affected service
root_causes = mcp.call("get_root_causes", impacted_service_ids=[entities[0]["id"]])
# Returns: root causes with severity, impacted services, and remediation guidance

# Step 3: Map the blast radius for the highest-severity root cause
topology = mcp.call(
    "get_topology",
    entity_id=root_causes[0]["entity_id"],
    mode="dependents",
)
# Returns: upstream services affected by this entity's degradation

# Step 4: Route to the correct team and generate a ticket
ticket = mcp.call(
    "generate_ticket",
    task=f"Investigate root cause: {root_causes[0]['name']}",
)

After the incident resolves:

# Generate postmortem documentation
postmortem = mcp.call("postmortem", root_cause_id=root_causes[0]["id"])
# Returns: structured markdown with timeline, blast radius, contributing factors, action items

What Your Agent Gets from Causely

When integrated, your agent can:

  • Identify the true root cause of issues
  • Understand service dependencies and failure propagation
  • Evaluate blast radius before taking action
  • Work with structured, consistent outputs
  • Provide explainable decisions

This allows your agent to move beyond querying data to making reliable decisions.

Design Considerations

When building custom agents with Causely:

  • Trust boundaries: define what actions can be automated vs require approval
  • Policy enforcement: gate actions based on risk or impact
  • Observability: log decisions and reasoning for auditability
  • Fallbacks: handle cases where no clear root cause is identified
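These considerations can be combined into a single gate that runs before any action executes. The sketch below is one possible shape, not a Causely feature: `ProposedAction`, `gate`, the decision names, and the `MAX_AUTO_BLAST_RADIUS` threshold are all hypothetical, and the dependent count is assumed to come from a topology query made earlier.

```python
import logging
from dataclasses import dataclass
from typing import Optional

audit_log = logging.getLogger("agent.audit")

# Hypothetical policy: actions touching more dependents than this need approval.
MAX_AUTO_BLAST_RADIUS = 2


@dataclass(frozen=True)
class ProposedAction:
    name: str
    root_cause_id: Optional[str]  # None when no clear root cause was identified
    dependents: int               # size of the blast radius


def gate(action: ProposedAction) -> str:
    """Decide whether an action may run automatically."""
    if action.root_cause_id is None:
        decision = "escalate_to_human"   # fallback: no clear root cause
    elif action.dependents > MAX_AUTO_BLAST_RADIUS:
        decision = "require_approval"    # trust boundary / policy enforcement
    else:
        decision = "auto_execute"
    # Observability: record the inputs and the outcome for later audit.
    audit_log.info(
        "action=%s root_cause=%s dependents=%d decision=%s",
        action.name, action.root_cause_id, action.dependents, decision,
    )
    return decision


low_risk = gate(ProposedAction("restart_pod", "rc-1", dependents=1))
high_risk = gate(ProposedAction("failover_database", "rc-1", dependents=7))
unknown = gate(ProposedAction("restart_pod", None, dependents=1))
```

Keeping the gate a pure function of the proposed action makes the policy easy to test and to tighten over time.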

Next Steps