Agent Integration

Building Reliable Agents with Causely

Agents fail not because they lack data. They fail because data alone does not explain causality.

An agent with access to metrics, logs, and traces still cannot reliably determine what caused an issue, how far it has spread, or what action is safe to take. That requires a causal model: a structured understanding of how services, dependencies, and failure patterns relate.

Causely provides that model. Agents query Causely through the MCP server and receive structured, deterministic answers, including root causes, blast radius, dependency maps, and remediation guidance, instead of raw signals to interpret.

The Gap in Today's Agent Architectures

Most agent-driven systems run into three core limitations:

Information gap
Agents can retrieve telemetry, but cannot consistently determine what is happening or what matters.

System gap
There is no shared understanding of how services, infrastructure, and dependencies relate to each other.

Execution gap
Agents lack a reliable way to determine which actions are safe and how to coordinate them.

As a result, agents still depend on humans to interpret what they retrieve, and automation breaks down at scale.

Where Causely Fits

Causely provides a system intelligence layer that continuously models how your system behaves: its services, dependencies, and failure propagation.

Instead of reasoning over raw telemetry, agents interact with structured, deterministic system knowledge. Decisions are based on how the system actually behaves, not on correlation or heuristics.
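To make "structured system knowledge" concrete, here is a toy sketch of how a dependency map supports a blast-radius answer. It is not Causely's model or API: the service names, the `DEPENDS_ON` map, and the `blast_radius` helper are all hypothetical, and a real causal model covers far more than static call edges.

```python
from collections import deque

# Hypothetical toy topology: each service maps to the services it calls.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "cart": ["redis"],
    "payments": ["postgres"],
    "search": ["elasticsearch"],
}


def blast_radius(failing: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that transitively depends on `failing`."""
    # Invert the edges so we can walk from a dependency up to its callers.
    callers: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first walk over the inverted graph.
    impacted: set[str] = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


print(sorted(blast_radius("postgres", DEPENDS_ON)))  # ['checkout', 'payments']
```

With an explicit graph, the same question always yields the same answer. That is the property the text calls deterministic, as opposed to inferring impact from correlated metrics.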

Architecture Overview

[Agent (for example, Holmes or a custom agent)]
        |
[Causely (causal model + reasoning engine)]
        |
[Observability + Infrastructure (metrics, traces, logs, alerts)]

  • Agent: orchestrates workflows, queries systems, and takes action
  • Causely: builds and maintains a causal model and provides deterministic reasoning
  • Observability + Infrastructure: provides raw signals and telemetry

What Your Agent Can Do

The Causely MCP server exposes 24 tools across 5 categories. Here is what each category enables:

  • Entity Resolution: Resolve service and database names to IDs, enumerate namespaces and clusters, check current health status. Most workflows start here.

  • Data Retrieval: Retrieve time-series metrics, live logs, alert history, deployment events, configuration files, and slow query analysis for any entity.

  • Health & Diagnosis: Get active symptoms environment-wide, identify root causes with impacted services and remediation guidance, check SLOs, map service topology, and get structured health summaries for services, teams, or individual entities.

  • Reporting & Postmortems: Generate deterministic postmortem drafts and structured engineering tickets from resolved incident data.

  • Reliability & Deployment: Compare resource consumption before and after deployments for a single service or an entire fleet.
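Under the Model Context Protocol, an agent invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds that message, to show its shape. `get_entities` is a tool named in this document, but the `query` argument is an assumption and not Causely's documented schema. The `build_tool_call` helper is hypothetical.

```python
import json
from itertools import count

# Simple incrementing JSON-RPC request IDs for this sketch.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# The "query" argument name is an assumption made for illustration.
request = build_tool_call("get_entities", {"query": "checkout"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library or an MCP-compatible assistant builds and sends these messages for you. Check the tool schemas that the server advertises for the real argument names.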

Integration Paths

Choose based on how much you want to build.

| Option | Best for | What you get |
|---|---|---|
| MCP Server | Any MCP-compatible agent or assistant | Standardized interface to all 24 Causely tools; works with Cursor, Claude Code, VS Code, and others |
| HolmesGPT | Teams already using Holmes | Pre-built agent with Causely MCP configured; no custom integration required |
| Custom Agents | Teams building internal tooling or automation pipelines | Full control over logic, policies, and execution; MCP or direct API |

If you are starting fresh, use the MCP Server. It works with any agent that supports the Model Context Protocol and requires no custom code.

Example Workflow

Scenario: High error rate alert

  1. The agent receives an alert
  2. The agent calls get_entities() to resolve the alerted service name to an entity ID
  3. The agent calls get_root_causes() to identify the source
  4. Causely returns:
    • Root cause service
    • Affected dependencies
    • Explanation of why this is the cause
    • Remediation guidance
  5. The agent:
    • Notifies the correct team
    • Suggests or executes remediation

When This Approach Is Most Valuable

This architecture is most effective when:

  • You operate distributed systems with many interdependent services
  • You already have observability in place
  • You are building or evaluating automated incident workflows

Summary

Causely does not replace your agents or your observability stack.

It provides the system intelligence layer required for agents to interpret telemetry consistently, identify true root causes, and take safe, coordinated action.