Custom Agents
Building Custom Agents with Causely
Most custom agents can access telemetry, but struggle to determine what is actually happening, what caused it, and what action is safe.
Causely provides the system intelligence layer that interprets telemetry consistently. Whether you are building internal incident tooling, AI-driven workflows, or automation pipelines, that structured layer lets your agent make reliable decisions and act safely.
When to Build a Custom Agent
Building a custom agent is the right approach when:
- You have internal workflows that do not map to existing tools
- You want to integrate reliability decisions into your own systems
- You are building end-to-end automation (detection → diagnosis → action)
- You need full control over logic, policies, and execution
How It Works
Custom agents use Causely to move from raw telemetry to structured, system-aware decisions.
They typically interact with Causely in one of two ways:
1. MCP Server (Recommended)
Use the Causely MCP Server to provide a standardized interface for agents.
Your agent:
- Receives a signal (alert, event, user input)
- Queries Causely via MCP
- Receives structured outputs (root cause, dependencies, explanation)
- Takes action based on those outputs
This is the fastest way to integrate Causely into agent-based systems.
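The four steps above can be sketched as a single handler. This is a minimal illustration, not the Causely client API: `StubMCPClient`, `handle_signal`, and the response fields (`id`, `name`, `severity`) are assumptions made so the example runs offline. Only the tool names `get_entities` and `get_root_causes` come from this page.

```python
from typing import Any


class StubMCPClient:
    """Hypothetical stand-in for a real MCP client; returns canned data."""

    def call(self, tool: str, **params: Any) -> list[dict[str, Any]]:
        canned = {
            "get_entities": [{"id": "svc-1", "name": "checkout-service"}],
            "get_root_causes": [
                {"id": "rc-1", "name": "Database connection pool exhausted", "severity": "high"}
            ],
        }
        return canned.get(tool, [])


def handle_signal(mcp: StubMCPClient, service_name: str) -> str:
    """Receive a signal, query Causely via MCP, then choose an action."""
    entities = mcp.call("get_entities", query=service_name, entity_types=["Service"])
    if not entities:
        return f"escalate: unknown service {service_name!r}"
    root_causes = mcp.call("get_root_causes", impacted_service_ids=[entities[0]["id"]])
    if not root_causes:
        return f"escalate: no root cause identified for {service_name!r}"
    return f"open ticket: {root_causes[0]['name']}"


print(handle_signal(StubMCPClient(), "checkout-service"))
```

In a real agent the stub would be replaced by whatever MCP client your framework provides, and the returned string by a call into your ticketing or remediation system.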
2. API Integration
For more control or non-MCP environments, you can interact directly with the Causely API.
Your system:
- Sends queries to Causely (for example, root cause, topology, health)
- Receives structured, machine-consumable responses
- Uses those responses to drive logic and actions
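As a rough sketch of that request/response loop, the example below builds an HTTP request and turns a sample response into a decision. Everything specific here is a placeholder and an assumption: the URL, the `query_type`/`params` payload, the bearer-token header, and the `root_causes` response field are invented for illustration and are not the Causely API. Consult the API reference for the real endpoint and schema.

```python
import json
import urllib.request

# Hypothetical placeholder; take the real URL and auth scheme from the API docs.
API_URL = "https://causely.example.invalid/api/query"


def build_request(query_type: str, params: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one query."""
    body = json.dumps({"query_type": query_type, "params": params}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def next_action(response_body: bytes) -> str:
    """Turn a structured response into a decision the rest of the system can act on."""
    root_causes = json.loads(response_body).get("root_causes", [])
    if not root_causes:
        return "escalate_to_human"
    return f"route:{root_causes[0]['name']}"


request = build_request("root_cause", {"service": "checkout-service"}, token="example-token")
sample_response = b'{"root_causes": [{"name": "Database connection pool exhausted"}]}'
print(request.get_method(), next_action(sample_response))
```

Keeping request construction separate from response handling makes the decision logic testable without network access.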
Tool Ordering: Resolve Entities First
Always call get_entities() before querying metrics, SLOs, topology, symptoms, or slow queries. Most structured tools require an entity ID. get_entities() resolves a service or database name to its ID and returns current health status, type, and labels.
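One way to enforce this ordering is a small helper that every other call goes through. The sketch below assumes that `get_entities` returns a list of dictionaries with `id` and `name` keys; the `name` field, the helper `resolve_entity_id`, and the stub `fake_call_tool` are hypothetical and exist only to make the example runnable.

```python
from typing import Any, Callable

ToolCall = Callable[..., list[dict[str, Any]]]


def resolve_entity_id(call_tool: ToolCall, name: str, entity_type: str = "Service") -> str:
    """Resolve a service or database name to exactly one entity ID, or raise."""
    matches = call_tool("get_entities", query=name, entity_types=[entity_type])
    # Prefer an exact name match when the query returns several candidates.
    exact = [m for m in matches if m.get("name") == name]
    candidates = exact or matches
    if len(candidates) != 1:
        raise LookupError(f"expected one {entity_type} named {name!r}, found {len(candidates)}")
    return candidates[0]["id"]


def fake_call_tool(tool: str, **params: Any) -> list[dict[str, Any]]:
    """Stub that stands in for the MCP client so the sketch runs offline."""
    return [
        {"id": "svc-1", "name": "checkout-service"},
        {"id": "svc-2", "name": "checkout-service-canary"},
    ]


print(resolve_entity_id(fake_call_tool, "checkout-service"))
```

Raising on zero or ambiguous matches stops the agent from querying metrics or topology for the wrong entity.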
Ask Causely vs Structured Tools
The MCP server exposes two interaction styles. Choose based on what your agent needs to do with the result.
Ask Causely: natural language in. Best for open-ended exploration and synthesis.
Structured tools: explicit named inputs. Best when your agent needs to act on the result, apply logic, or chain calls.
| Use case | Tool |
|---|---|
| "What's wrong with checkout?" (narrative) | get_service_summary |
| "What happened last night?" (summary) | ask_causely |
| Root cause data for automated routing | get_root_causes |
| Metrics for regression detection | get_metrics |
| Entity ID resolution | get_entities |
| SLO status with burn rate | get_slo |
| Blast radius mapping | get_topology |
| Post-deploy regression check | reliability_delta |
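An agent can encode part of this table as a lookup that falls back to Ask Causely for open-ended requests. The intent keys below (`route_incident`, `detect_regression`, and so on) are hypothetical labels chosen for this sketch; only the tool names come from the table.

```python
# Hypothetical intent labels mapped to the structured tools listed in the table.
STRUCTURED_TOOLS = {
    "route_incident": "get_root_causes",
    "detect_regression": "get_metrics",
    "resolve_entity": "get_entities",
    "check_slo": "get_slo",
    "map_blast_radius": "get_topology",
    "post_deploy_check": "reliability_delta",
}


def choose_tool(intent: str) -> str:
    """Pick a structured tool for known intents; otherwise use natural language."""
    return STRUCTURED_TOOLS.get(intent, "ask_causely")


print(choose_tool("map_blast_radius"))
print(choose_tool("summarize_last_night"))
```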
Workflow Example: Incident Triage
This example shows a custom agent performing full incident triage using structured MCP tools.
Scenario: An alert fires. The agent needs to identify the root cause, understand impact, and route to the correct team.
# Step 1: Resolve the alerted service name to an entity ID
entities = mcp.call("get_entities", query="checkout-service", entity_types=["Service"])
# Returns: entity ID, current health status, type, and labels
# Step 2: Get root causes for the affected service
root_causes = mcp.call("get_root_causes", impacted_service_ids=[entities[0]["id"]])
# Returns: root causes with severity, impacted services, and remediation guidance
# Step 3: Map the blast radius for the highest-severity root cause
topology = mcp.call(
    "get_topology",
    entity_id=root_causes[0]["entity_id"],
    mode="dependents",
)
# Returns: upstream services affected by this entity's degradation
# Step 4: Route to the correct team and generate a ticket
ticket = mcp.call(
    "generate_ticket",
    task=f"Investigate root cause: {root_causes[0]['name']}",
)
After the incident resolves:
# Generate postmortem documentation
postmortem = mcp.call("postmortem", root_cause_id=root_causes[0]["id"])
# Returns: structured markdown with timeline, blast radius, contributing factors, action items
What Your Agent Gets from Causely
When integrated, your agent can:
- Identify the true root cause of issues
- Understand service dependencies and failure propagation
- Evaluate blast radius before taking action
- Work with structured, consistent outputs
- Provide explainable decisions
This allows your agent to move beyond querying data to making reliable decisions.
Design Considerations
When building custom agents with Causely:
- Trust boundaries: define what actions can be automated vs require approval
- Policy enforcement: gate actions based on risk or impact
- Observability: log decisions and reasoning for auditability
- Fallbacks: handle cases where no clear root cause is identified
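These four considerations can be combined in one decision gate. The sketch below is an illustration under stated assumptions: the `severity` field and its `"critical"` value, the `max_auto_blast_radius` threshold, and the action names are all hypothetical and should be replaced with your own policy.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("agent.decisions")


@dataclass
class Decision:
    action: str  # "auto_remediate", "require_approval", or "escalate"
    reason: str


def decide(
    root_cause: Optional[dict],
    dependent_count: int,
    max_auto_blast_radius: int = 3,
) -> Decision:
    """Gate an action on root-cause clarity, severity, and blast radius."""
    if root_cause is None:
        # Fallback: no clear root cause, so hand off to a human.
        decision = Decision("escalate", "no clear root cause identified")
    elif root_cause.get("severity") == "critical" or dependent_count > max_auto_blast_radius:
        # Trust boundary: high-risk changes need explicit approval.
        decision = Decision(
            "require_approval",
            f"severity={root_cause.get('severity')}, dependents={dependent_count}",
        )
    else:
        decision = Decision("auto_remediate", "low severity and small blast radius")
    # Observability: record every decision and its reasoning for later audit.
    logger.info("decision=%s reason=%s", decision.action, decision.reason)
    return decision


print(decide({"severity": "low"}, dependent_count=1).action)
print(decide({"severity": "low"}, dependent_count=10).action)
print(decide(None, dependent_count=0).action)
```

Here `dependent_count` would typically come from the size of a `get_topology` result, so the blast radius is evaluated before any action is taken.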
Next Steps
- Using the MCP Server: full tool reference and key workflows
- HolmesGPT: see a reference implementation
- API: explore direct API integration