Building Custom Agents with Causely

Most custom agents can access telemetry, but struggle to determine what is actually happening, what caused it, and what action is safe.

Causely provides the system intelligence layer needed to interpret telemetry consistently and make reliable decisions.

This applies whether you are building internal incident tooling, AI-driven workflows, or automation pipelines.

When to Build a Custom Agent

Building a custom agent is the right approach when:

You have internal workflows that do not map to existing tools
You want to integrate reliability decisions into your own systems
You are building end-to-end automation (detection → diagnosis → action)
You need full control over logic, policies, and execution

How It Works

Custom agents use Causely to move from raw telemetry to structured, system-aware decisions.

Custom agents typically interact with Causely in one of two ways:

1. MCP Server (Recommended)

Use the Causely MCP Server to provide a standardized interface for agents.

Your agent:

Receives a signal (alert, event, user input)
Queries Causely via MCP
Receives structured outputs (root cause, dependencies, explanation)
Takes action based on those outputs
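The four steps above can be sketched as a single function. This is a minimal illustration, not the Causely client API: `call_tool` is a hypothetical stand-in for whatever MCP client your agent uses, and the canned response shapes in `fake_call_tool` are assumptions that follow the field names used in the triage example later on this page.

```python
from typing import Any, Callable

# Hypothetical stand-in for an MCP client: any callable taking a tool name and
# keyword arguments. A real agent would forward this to the Causely MCP Server.
ToolCaller = Callable[..., Any]


def handle_signal(call_tool: ToolCaller, service_name: str) -> dict:
    """Run one pass of the signal -> query -> structured output -> action loop."""
    # Query Causely via MCP: resolve the service, then fetch its root causes.
    entities = call_tool("get_entities", query=service_name, entity_types=["Service"])
    if not entities:
        return {"action": "escalate", "reason": f"unknown service: {service_name}"}

    root_causes = call_tool("get_root_causes", impacted_service_ids=[entities[0]["id"]])
    if not root_causes:
        return {"action": "escalate", "reason": "no root cause identified"}

    # Act on the structured output; here the "action" is only a routing decision.
    return {"action": "investigate", "reason": root_causes[0]["name"]}


def fake_call_tool(tool: str, **kwargs: Any) -> Any:
    """Canned responses so the sketch runs without a server (shapes are assumed)."""
    canned = {
        "get_entities": [{"id": "svc-1", "name": "checkout-service"}],
        "get_root_causes": [{"id": "rc-1", "name": "Database connection pool exhausted"}],
    }
    return canned[tool]


decision = handle_signal(fake_call_tool, "checkout-service")
print(decision)
```

Passing the client in as a callable keeps the decision logic testable without a live MCP connection.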

This is the fastest way to integrate Causely into agent-based systems.

2. API Integration

For more control or non-MCP environments, you can interact directly with the Causely API.

Your system:

Sends queries to Causely (for example, root cause, topology, health)
Receives structured, machine-consumable responses
Uses those responses to drive logic and actions

Tool Ordering: Resolve Entities First

Tip: Always call get_entities() before querying metrics, SLOs, topology, symptoms, or slow queries. Most structured tools require an entity ID. get_entities() resolves a service or database name to its ID and returns current health status, type, and labels.
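One way to enforce this ordering is to route every structured call through a small resolver. The sketch below is illustrative: `call_tool` is a hypothetical MCP client wrapper, the `entity_id` parameter name for `get_metrics` is an assumption, and the stub responses are made up so the example runs on its own.

```python
from typing import Any, Callable, Optional

ToolCaller = Callable[..., Any]  # hypothetical MCP client wrapper


def resolve_entity_id(
    call_tool: ToolCaller, name: str, entity_type: str = "Service"
) -> Optional[str]:
    """Resolve a service or database name to its entity ID, or None if no match."""
    matches = call_tool("get_entities", query=name, entity_types=[entity_type])
    return matches[0]["id"] if matches else None


def metrics_for(call_tool: ToolCaller, name: str) -> Any:
    """Fetch metrics only after the name has been resolved to an ID."""
    entity_id = resolve_entity_id(call_tool, name)
    if entity_id is None:
        raise LookupError(f"no entity found for {name!r}")
    # `entity_id` as the parameter name for get_metrics is an assumption.
    return call_tool("get_metrics", entity_id=entity_id)


call_log: list[str] = []


def recording_stub(tool: str, **kwargs: Any) -> Any:
    """Canned responses (assumed shapes) that also record the call order."""
    call_log.append(tool)
    if tool == "get_entities":
        return [{"id": "svc-1"}] if kwargs["query"] == "checkout-service" else []
    return {"entity_id": kwargs["entity_id"], "series": []}


metrics = metrics_for(recording_stub, "checkout-service")
print(call_log)
```

Raising on an unresolved name, rather than passing the raw name through, stops the agent from querying a structured tool with an invalid ID.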

Ask Causely vs Structured Tools

The MCP server exposes two interaction styles. Choose based on what your agent needs to do with the result.

Ask Causely: natural language in. Best for open-ended exploration and synthesis.

Structured tools: explicit named inputs. Best when your agent needs to act on the result, apply logic, or chain calls.

Use case	Tool
"What's wrong with checkout?" (narrative)	`get_service_summary`
"What happened last night?" (summary)	`ask_causely`
Root cause data for automated routing	`get_root_causes`
Metrics for regression detection	`get_metrics`
Entity ID resolution	`get_entities`
SLO status with burn rate	`get_slo`
Blast radius mapping	`get_topology`
Post-deploy regression check	`reliability_delta`
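An agent can encode the table above as a simple lookup and fall back to natural language for anything open-ended. In this sketch the tool names come from the table, while the use-case keys and the `choose_tool` helper are hypothetical labels chosen for illustration.

```python
# Use-case labels are hypothetical; tool names come from the table above.
STRUCTURED_TOOLS = {
    "service_narrative": "get_service_summary",
    "root_cause_routing": "get_root_causes",
    "regression_metrics": "get_metrics",
    "entity_resolution": "get_entities",
    "slo_status": "get_slo",
    "blast_radius": "get_topology",
    "post_deploy_check": "reliability_delta",
}


def choose_tool(use_case: str) -> str:
    """Return the structured tool for a known use case, else fall back to ask_causely."""
    return STRUCTURED_TOOLS.get(use_case, "ask_causely")


print(choose_tool("blast_radius"))
print(choose_tool("what_happened_last_night"))
```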

Workflow Example: Incident Triage Workflow

This example shows a custom agent performing full incident triage using structured MCP tools.

Scenario: An alert fires. The agent needs to identify the root cause, understand impact, and route to the correct team.

# `mcp` is a placeholder for your connected MCP client.

# Step 1: Resolve the alerted service name to an entity ID
entities = mcp.call("get_entities", query="checkout-service", entity_types=["Service"])
# Returns: a list of matches, each with entity ID, current health status, type, and labels

# Step 2: Get root causes for the affected service
root_causes = mcp.call("get_root_causes", impacted_service_ids=[entities[0]["id"]])
# Returns: root causes with severity, impacted services, and remediation guidance

# Step 3: Map the blast radius for the highest-severity root cause
# (assumes root_causes[0] is the highest-severity entry)
topology = mcp.call(
    "get_topology",
    entity_id=root_causes[0]["entity_id"],
    mode="dependents"
)
# Returns: the services that depend on this entity and are affected by its degradation

# Step 4: Route to the correct team and generate a ticket
ticket = mcp.call(
    "generate_ticket",
    task=f"Investigate root cause: {root_causes[0]['name']}"
)
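The steps above take `root_causes[0]` as the highest-severity root cause. If your agent cannot rely on the response order, it can pick the entry explicitly. This is a sketch under assumptions: the `severity` field name follows the "Returns" comment in Step 2, but the severity labels and their ranking here are hypothetical.

```python
from typing import Optional

# Assumed severity labels, ranked from most to least urgent.
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


def highest_severity(root_causes: list) -> Optional[dict]:
    """Return the root cause with the highest severity, or None for an empty list."""
    if not root_causes:
        return None
    return max(
        root_causes,
        # Unknown or missing labels rank below every known severity.
        key=lambda rc: SEVERITY_RANK.get(str(rc.get("severity", "")).lower(), -1),
    )


sample = [
    {"id": "rc-1", "name": "Noisy neighbor", "severity": "medium"},
    {"id": "rc-2", "name": "Database connection pool exhausted", "severity": "Critical"},
    {"id": "rc-3", "name": "Slow DNS lookups"},
]
top = highest_severity(sample)
print(top["id"])
```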

After the incident resolves:

# Generate postmortem documentation
postmortem = mcp.call("postmortem", root_cause_id=root_causes[0]["id"])
# Returns: structured markdown with timeline, blast radius, contributing factors, action items

What Your Agent Gets from Causely

When integrated, your agent can:

Identify the true root cause of issues
Understand service dependencies and failure propagation
Evaluate blast radius before taking action
Work with structured, consistent outputs
Provide explainable decisions

This allows your agent to move beyond querying data to making reliable decisions.

Design Considerations

When building custom agents with Causely:

Trust boundaries: define which actions can be automated and which require approval
Policy enforcement: gate actions based on risk or impact
Observability: log decisions and reasoning for auditability
Fallbacks: handle cases where no clear root cause is identified
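The four considerations above can be combined in one decision gate. The sketch below is illustrative only: the allowlist, the blast-radius threshold, the action names, and the `Decision` type are all hypothetical, and the inputs are assumed to come from earlier `get_root_causes` and `get_topology` calls.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("agent.decisions")

# Hypothetical policy values; tune these to your own risk tolerance.
AUTO_APPROVED_ACTIONS = {"restart_pod", "scale_up"}
MAX_AUTO_BLAST_RADIUS = 3


@dataclass(frozen=True)
class Decision:
    action: str
    mode: str  # "auto", "needs_approval", or "escalate"
    reason: str


def decide(root_cause: Optional[dict], proposed_action: str, dependents: list) -> Decision:
    """Gate a proposed action on trust boundary, blast radius, and fallback rules."""
    if root_cause is None:
        # Fallback: no clear root cause, so hand off to a human.
        decision = Decision("none", "escalate", "no clear root cause identified")
    elif proposed_action not in AUTO_APPROVED_ACTIONS:
        # Trust boundary: only allowlisted actions may run unattended.
        decision = Decision(proposed_action, "needs_approval", "action is outside the allowlist")
    elif len(dependents) > MAX_AUTO_BLAST_RADIUS:
        # Policy enforcement: wide impact requires approval.
        reason = f"blast radius {len(dependents)} exceeds {MAX_AUTO_BLAST_RADIUS}"
        decision = Decision(proposed_action, "needs_approval", reason)
    else:
        decision = Decision(proposed_action, "auto", f"low-impact fix for {root_cause['name']}")
    # Observability: record every decision with its reasoning for later audit.
    logger.info("action=%s mode=%s reason=%s", decision.action, decision.mode, decision.reason)
    return decision


rc = {"id": "rc-1", "name": "Database connection pool exhausted"}
print(decide(rc, "restart_pod", ["cart", "payments"]).mode)
```

Checking the fallback first means a missing root cause always escalates, regardless of which action was proposed.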

Next Steps

Using the MCP Server: full tool reference and key workflows
HolmesGPT: see a reference implementation
API: explore direct API integration
