
MCP Server Integration

The Causely MCP server gives agents and AI assistants direct access to Causely's causal reasoning engine. 29 tools across 5 categories let your agent move from raw alerts to structured root cause analysis, dependency maps, and reliability reports, without writing custom integrations.

Key Workflows

These are the four workflows agents use most often. Each maps to a specific sequence of MCP tool calls.

Incident Triage

Identify what's broken and how far it has spread.

Try:

  • "What's broken in production right now?"
  • "Checkout is throwing errors and we have Alertmanager alerts firing. What's the actual root cause?"
  • "Three services are alerting at once. Which is the real problem and which are downstream noise?"
  1. get_symptoms(): see all active symptoms across the entire environment (no filters needed)
  2. get_root_causes(): identify all active root causes and impacted services
  3. get_alerts(alert_name_filters=...): drill into a specific alert's cause
  4. get_topology(entity_id=..., mode="dependents"): map which upstream services are affected

Handled automatically by the causely-correlated-incidents skill, or by the causely-alert-triage skill if you're starting from a specific alert.
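For a custom agent, the four steps above can be sketched as MCP `tools/call` requests. This is a minimal sketch, not Causely's client code: the JSON-RPC envelope follows the MCP specification and the tool and argument names come from the steps above, but the value types (for example, that `alert_name_filters` takes a list of names) and the placeholder values are assumptions.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs


def tool_call(name, **arguments):
    """Build an MCP `tools/call` JSON-RPC request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def triage_plan(alert_name, entity_id):
    """Return the triage sequence as ordered requests.

    alert_name and entity_id are placeholders supplied by the caller; in
    practice they come from the results of the earlier calls.
    """
    return [
        tool_call("get_symptoms"),
        tool_call("get_root_causes"),
        # Assumption: alert_name_filters accepts a list of alert names.
        tool_call("get_alerts", alert_name_filters=[alert_name]),
        tool_call("get_topology", entity_id=entity_id, mode="dependents"),
    ]


if __name__ == "__main__":
    for request in triage_plan("checkout-latency", "<entity-id>"):
        print(json.dumps(request))
```

Building the requests as plain data keeps the ordering easy to inspect before anything is sent to the server.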

Quick Service Health

Get a complete health picture for a specific service in two calls, with an optional third call for live logs.

Try:

  • "Is checkout healthy?"
  • "Give me a full health picture for the payments service: status, open issues, SLOs."
  • "Before I page anyone, is there actually a problem with database-service or is this alert noise?"
  • "Are there any concerning errors or warnings in the logs for the frontend service over the last hour?"
  1. get_entities(query="service-name", entity_types=["Service"]): resolve the service name to its entity ID
  2. get_service_summary(service="service-name"): full snapshot: status, active symptoms, root causes, SLOs, metrics, recent events, error logs
  3. get_logs(entity_id=...): retrieve live log output for a running service

Handled automatically by the causely-health-reporting skill.
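The sequence above feeds the entity ID from the first call into the log query. The sketch below shows that chaining under two assumptions: `call_tool(name, arguments)` is a hypothetical caller-supplied function (any MCP client session that can invoke tools could fill this role), and `pick_entity_id` is supplied by the caller because the response shape of get_entities is not documented here.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def service_health(
    call_tool: ToolCaller,
    service: str,
    pick_entity_id: Callable[[Any], str],
    include_logs: bool = False,
) -> Dict[str, Any]:
    """Run the health workflow for one service and collect the raw results."""
    entities = call_tool(
        "get_entities", {"query": service, "entity_types": ["Service"]}
    )
    results = {
        "entities": entities,
        "summary": call_tool("get_service_summary", {"service": service}),
    }
    if include_logs:
        # get_logs needs the entity ID resolved in the first step.
        entity_id = pick_entity_id(entities)
        results["logs"] = call_tool("get_logs", {"entity_id": entity_id})
    return results


if __name__ == "__main__":
    calls = []

    def fake_call_tool(name, arguments):
        calls.append(name)
        return {"id": "entity-123"}  # made-up response shape, illustration only

    service_health(fake_call_tool, "checkout", lambda r: r["id"], include_logs=True)
    print(calls)
```

Injecting `call_tool` keeps the workflow logic independent of whichever MCP client library the agent uses.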

Post-Deploy Validation

Check whether a deployment introduced regressions.

Try:

  • "Did the last deploy to payments cause any regressions?"
  • "We deployed cart service 30 minutes ago. How does it look compared to before?"
  • "Check all services my team owns, did anything degrade after today's deploys?"
  1. reliability_delta(service="service-name"): compare CPU, memory, latency, and error rate before vs after the most recent deployment
  2. fleet_reliability_delta(team="team-name"): batch check across all services for a team, namespace, or explicit list

Handled automatically by the causely-change-impact skill.
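The choice between the two tools above depends only on scope. A small sketch of that decision, assuming the `service` and `team` argument names shown in the steps; the helper name `delta_call` is hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def delta_call(
    service: Optional[str] = None, team: Optional[str] = None
) -> Tuple[str, Dict[str, Any]]:
    """Pick the regression-check tool for a single service or a whole team."""
    if (service is None) == (team is None):
        raise ValueError("pass exactly one of service or team")
    if service is not None:
        return "reliability_delta", {"service": service}
    return "fleet_reliability_delta", {"team": team}


if __name__ == "__main__":
    print(delta_call(service="payments"))
    print(delta_call(team="team-name"))
```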

Post-Incident Reporting

Generate postmortem documentation and action items from a resolved incident.

Try:

  • "The payments outage is resolved. Draft a postmortem."
  • "Write up what happened to checkout this morning: timeline, root cause, and what was affected."
  • "Payments is back up. Draft the postmortem and create a follow-up ticket for the team."
  1. get_root_causes(root_cause_id=...): retrieve full root cause details, timeline, and blast radius
  2. postmortem(root_cause_id=...): generate a structured postmortem draft
  3. generate_ticket(task="..."): create a follow-up engineering ticket for Jira, GitHub Issues, or Linear

Handled automatically by the causely-postmortem skill.
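In the reporting steps above, the same root cause ID is passed to the first two calls. A minimal sketch of that plan as ordered (tool name, arguments) pairs; the helper name and the example values are hypothetical, and how the pairs are sent depends on your MCP client.

```python
from typing import Any, Dict, List, Tuple

ToolStep = Tuple[str, Dict[str, Any]]


def postmortem_steps(root_cause_id: str, ticket_task: str) -> List[ToolStep]:
    """Return the post-incident reporting sequence for one root cause."""
    if not root_cause_id:
        raise ValueError("root_cause_id is required")
    return [
        # Details, timeline, and blast radius for the resolved root cause.
        ("get_root_causes", {"root_cause_id": root_cause_id}),
        # Structured postmortem draft for the same root cause.
        ("postmortem", {"root_cause_id": root_cause_id}),
        # Follow-up ticket described in free text.
        ("generate_ticket", {"task": ticket_task}),
    ]


if __name__ == "__main__":
    for name, arguments in postmortem_steps("<root-cause-id>", "Follow up on payments outage"):
        print(name, arguments)
```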

Skills automate the tool-selection step shown in the workflows above. You describe your situation in natural language; the right specialist activates and runs the correct tool sequence for you. Skills are available for Claude Code, Claude Desktop, and Cursor.

| Situation | Skill | Try |
| --- | --- | --- |
| Incoming alert | causely-alert-triage | "PagerDuty just paged for checkout-latency. What's the actual cause?" |
| Post-deploy validation | causely-change-impact | "Did the last deploy to payments cause any regressions?" |
| Multi-service outage | causely-correlated-incidents | "Three services are alerting at once. What's the real problem?" |
| Health summary / morning standup | causely-health-reporting | "Give me a morning health report for production." |
| Kubernetes investigation | causely-k8s-investigation | "The orders pod keeps OOMKilling. Why?" |
| Postmortem / ticket | causely-postmortem | "Draft a postmortem for the checkout outage that resolved an hour ago." |

See the Skills page for install instructions, full skill detail, and override options.

Choose Your Client

Select your tool for a copy-paste config snippet, config file location, and restart instructions.

| Client | Transport | Config format |
| --- | --- | --- |
| Claude Code | HTTP | .mcp.json (mcpServers) |
| Claude Desktop | stdio via mcp-remote | claude_desktop_config.json (mcpServers) |
| Codex | HTTP | config.toml (mcp_servers) |
| Cursor | HTTP | .cursor/mcp.json (mcpServers) |
| VS Code (GitHub Copilot) | HTTP | .vscode/mcp.json (servers) |

Verify your connection by asking: "Causely: What defects are currently active?"

Other MCP-compatible Clients

The clients above have dedicated setup pages. The following tools also support the Causely MCP server. Point them at https://api.causely.app/mcp using your tool's HTTP MCP config. See Advanced Authentication for credential options.
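For clients that read a JSON config, the entry usually amounts to a named server with a URL. The sketch below renders one such entry. It is an illustration under assumptions: the server name `causely` and the per-server `url` field are not taken from Causely's setup pages, the top-level key differs by client (see the table above), and some clients expect additional fields such as a transport type, so check your client's documentation before copying the output.

```python
import json

CAUSELY_MCP_URL = "https://api.causely.app/mcp"


def http_mcp_config(top_level_key: str = "mcpServers", server_name: str = "causely") -> str:
    """Render a JSON config that registers the Causely MCP endpoint."""
    # Assumption: the client reads a "url" field for each server entry.
    config = {top_level_key: {server_name: {"url": CAUSELY_MCP_URL}}}
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(http_mcp_config())           # clients that use "mcpServers"
    print(http_mcp_config("servers"))  # clients that use "servers"
```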

IDEs and Editors: JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm, GoLand, and others), Windsurf, Zed

CLIs: Kiro CLI, Amp, Atlassian Rovo Dev CLI, and other MCP-compatible CLI tools

Agent Frameworks: HolmesGPT

Authentication

The MCP server validates Frontegg-issued Bearer tokens. For most clients, browser-based OAuth runs automatically and needs no manual setup. For non-interactive setups (automation, CI) or clients that only support stdio, including the stdio/mcp-remote fallback, see Advanced Authentication.
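For a non-interactive setup, the token travels in a standard `Authorization: Bearer` header. The sketch below only builds the request and does not send it. The environment variable name `CAUSELY_TOKEN` is hypothetical, how to obtain the token is covered in Advanced Authentication, and a real MCP session starts with an initialize handshake that is omitted here.

```python
import json
import os
import urllib.request

CAUSELY_MCP_URL = "https://api.causely.app/mcp"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST to the MCP endpoint."""
    return urllib.request.Request(
        CAUSELY_MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # MCP's streamable HTTP transport may answer with JSON or an event stream.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    # CAUSELY_TOKEN is a hypothetical variable name, not one defined by Causely.
    token = os.environ.get("CAUSELY_TOKEN", "<token>")
    req = build_request(token, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
    print(req.get_method(), req.full_url)
```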

Using the Tool Reference

Tip: The reference below is for teams building custom agents that need explicit tool control. If you're using Claude, Cursor, Codex, or any conversational agent, you can skim it for capability awareness. In most cases, you can describe what you want and the agent picks the right tools. One thing worth knowing if you do go programmatic: most structured tools require an entity ID, so get_entities() is usually the right first call.

What Agents Get vs Raw Telemetry

|  | Raw telemetry | Causely MCP |
| --- | --- | --- |
| Root cause identification | Correlation-based, requires analysis | Deterministic causal analysis |
| Dependency awareness | Manual mapping required | Live topology from observed traffic |
| Blast radius | Estimated | Computed from causal graph |
| Structured output | Custom parsing required | Typed tool responses |
| Time to insight | Minutes of analysis | Single tool call |

Full Tool Reference

29 tools across 5 categories. All tools are available to any MCP-compatible agent or assistant.

Entity Resolution

| Tool | When to use |
| --- | --- |
| get_entities | Start here. Resolve a service or database name to its ID; list all entities in a namespace; check current health status |
| name_lookup | Resolve any name, including service, cluster, namespace, root cause name, or symptom name, to an entity ID for use in other tools |
| get_label_values | Enumerate valid label values (team, product, cluster, namespace) before fanning out queries across environments |

Data Retrieval

| Tool | When to use |
| --- | --- |
| get_metrics | Retrieve numeric metric data (p95 latency, error rate, CPU, memory, throughput): the only tool that returns time-series |
| get_logs | Inspect live service logs, or retrieve evidence logs captured at root cause detection time |
| get_alerts | Start triage from an alert name (PagerDuty, Slack, Datadog); distinguish alerts mapped to causal analysis from noise |
| get_events | Correlate symptom onset with deployments, restarts, scaling events, or config changes |
| get_config | Investigate configuration drift; verify deployment manifest matches expectations |
| get_slow_queries | Identify database queries consuming the most execution time; follow up on database root causes |

Health & Diagnosis

| Tool | When to use |
| --- | --- |
| get_symptoms | Call with no filters to see all active symptoms across the entire environment, or filter by a specific entity, namespace, or cluster |
| get_root_causes | Identify active root causes; filter by impacted service, symptom, root cause ID, or a start/end date range; use start/end dates when investigating a specific past time window |
| get_entity_health | Structured health summary for non-Service entities (databases, pods, queues, topics, tables) |
| get_environment_health | Structured health summary for the environment; can be scoped to specific namespaces or services |
| get_slo | Check SLO state, error budget remaining, and burn rate |
| get_topology | Find upstream blast radius (dependents), downstream dependencies, or full data-flow graph |
| get_integration_status | Verify monitoring coverage; check scraper health by cluster |
| get_incident_impact | Given a root cause ID (or an entity ID + root cause name), returns the responsible service and its business context, plus all impacted services and their business context |
| team_health | Health summary for all services owned by a team; degraded and critical services listed first |
| get_service_summary | Comprehensive health snapshot for a single service: status, symptoms, root causes, SLOs, metrics, events, logs |
| investigate_alert | Investigate a resolved alert from get_alerts; maps the alert to its entity and returns the standard get_entity_health result alongside the original alert |
| rank_entities | Rank services, topics, tables, or endpoints by number of dependencies or dependents; a single bulk query instead of looping get_topology |
| get_potential_diagnoses | Active and model-inferred diagnosis hypotheses for a specific entity; includes inactive and causality-only potentials not returned by get_root_causes |
| get_potential_observable_signals | All observable signals on a specific entity (active, inactive, and causality potential state); use before get_signal_potential_diagnoses to find internal signal names |
| get_signal_potential_diagnoses | Reverse lookup: given a symptom, event, or SLO on an entity, return the diagnoses from the causality model that could explain it |
| get_diagnosis_observable_signals | Retrieve the theoretical causality chain for a diagnosis, including downstream symptoms, events, and SLOs it could cause according to the causal model; compare to observed signals from get_symptoms |

Reporting & Postmortems

| Tool | When to use |
| --- | --- |
| postmortem | Generate a deterministic postmortem draft for a resolved incident from Causely data |
| generate_ticket | Create a structured engineering ticket suitable for Jira, GitHub Issues, or Linear |

Reliability & Deployment

| Tool | When to use |
| --- | --- |
| reliability_delta | Post-deploy regression check for a single service: compare resource consumption before/after most recent deployment |
| fleet_reliability_delta | Batch regression check across a team, namespace, or explicit service list (up to 20 services per call) |
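Because fleet_reliability_delta accepts at most 20 services per call, a longer explicit list has to be split into batches. A minimal sketch of that batching; the `services` argument name is an assumption, since this page does not name the parameter that carries the explicit list.

```python
from typing import Any, Dict, Iterator, List, Tuple

MAX_SERVICES_PER_CALL = 20  # limit stated in the tool reference


def fleet_delta_batches(services: List[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield one (tool name, arguments) pair per batch of at most 20 services."""
    for start in range(0, len(services), MAX_SERVICES_PER_CALL):
        batch = services[start:start + MAX_SERVICES_PER_CALL]
        # Assumption: the explicit list is passed as a `services` argument.
        yield "fleet_reliability_delta", {"services": batch}


if __name__ == "__main__":
    names = [f"service-{i}" for i in range(45)]
    for tool, arguments in fleet_delta_batches(names):
        print(tool, len(arguments["services"]))
```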

Feature Demos

Solving Slow Database Queries

Helm Chart Example