MCP Server Integration

The Causely MCP server gives agents and AI assistants direct access to Causely's causal reasoning engine. 25 tools across 5 categories let your agent move from raw alerts to structured root cause analysis, dependency maps, and reliability reports, without writing custom integrations.

Key Workflows

These are the four workflows agents use most often. Each maps to a specific sequence of MCP tool calls.

Incident Triage

Identify what's broken and how far it has spread.

Try:

  • "What's broken in production right now?"
  • "Checkout is throwing errors and we have Alertmanager alerts firing. What's the actual root cause?"
  • "Three services are alerting at once. Which is the real problem and which are downstream noise?"

  1. get_symptoms(): see all active symptoms across the entire environment (no filters needed)
  2. get_root_causes(): identify all active root causes and impacted services
  3. get_alerts(alert_name_filters=...): drill into a specific alert's cause
  4. get_topology(entity_id=..., mode="dependents"): map which upstream services are affected
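The sequence above can be sketched as MCP `tools/call` requests. This is a minimal sketch, not Causely's client code: it only builds JSON-RPC 2.0 payloads in the shape the MCP specification defines. The tool and argument names come from the steps above; the alert name, the entity ID, and the assumption that `alert_name_filters` takes a list are hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids within a session


def tool_call(name, **arguments):
    """Build one MCP `tools/call` request as a JSON-RPC 2.0 payload."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def triage_requests(alert_name, entity_id):
    """Return the four triage calls in the order listed above.

    `alert_name` and `entity_id` are placeholders: in practice they come from
    the firing alert and from an earlier tool result.
    """
    return [
        tool_call("get_symptoms"),     # no filters: whole environment
        tool_call("get_root_causes"),  # active root causes and impacted services
        # Assumption: the filter accepts a list of alert names.
        tool_call("get_alerts", alert_name_filters=[alert_name]),
        tool_call("get_topology", entity_id=entity_id, mode="dependents"),
    ]


if __name__ == "__main__":
    for request in triage_requests("checkout-latency", "entity-123"):
        print(json.dumps(request))
```

In a real agent, the arguments to steps 3 and 4 would be filled in from the results of steps 1 and 2 rather than passed up front.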

Handled automatically by the causely-correlated-incidents skill, or by the causely-alert-triage skill if you're starting from a specific alert.

Quick Service Health

Get a complete health picture for a specific service in a few calls.

Try:

  • "Is checkout healthy?"
  • "Give me a full health picture for the payments service: status, open issues, SLOs."
  • "Before I page anyone, is there actually a problem with database-service or is this alert noise?"
  • "Are there any concerning errors or warnings in the logs for the frontend service over the last hour?"

  1. get_entities(query="service-name", entity_types=["Service"]): resolve the service name to its entity ID
  2. get_service_summary(service="service-name"): get a full snapshot (status, active symptoms, root causes, SLOs, metrics, recent events, error logs)
  3. get_logs(entity_id=...): retrieve live log output for a running service
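The steps above chain together: the entity ID resolved in step 1 feeds step 3. Below is a minimal sketch of that chaining. It assumes a hypothetical `call_tool(name, arguments)` function supplied by your MCP client, and a hypothetical response shape for `get_entities` (a list of objects with `id` and `name` fields). Neither is confirmed by this page, so check the actual tool schemas before relying on them.

```python
from typing import Any, Callable

# Hypothetical signature: your MCP client library supplies the real one.
CallTool = Callable[[str, dict], Any]


def service_health(call_tool: CallTool, service_name: str) -> dict:
    """Run the three-step health check for one service."""
    # Step 1: resolve the name to an entity ID.
    entities = call_tool(
        "get_entities", {"query": service_name, "entity_types": ["Service"]}
    )
    # Assumed shape: [{"id": ..., "name": ...}, ...]. Require an exact name
    # match so "checkout" does not silently resolve to "checkout-worker".
    matches = [e for e in entities if e.get("name") == service_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected one Service named {service_name!r}, found {len(matches)}"
        )
    entity_id = matches[0]["id"]

    # Step 2: full snapshot, addressed by service name.
    summary = call_tool("get_service_summary", {"service": service_name})
    # Step 3: live logs, addressed by the entity ID from step 1.
    logs = call_tool("get_logs", {"entity_id": entity_id})
    return {"entity_id": entity_id, "summary": summary, "logs": logs}
```

Passing `call_tool` in as a parameter keeps the sequence testable with a stub that returns canned data; in production it would wrap your client's tool-call method.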

Handled automatically by the causely-health-reporting skill.

Post-Deploy Validation

Check whether a deployment introduced regressions.

Try:

  • "Did the last deploy to payments cause any regressions?"
  • "We deployed cart service 30 minutes ago. How does it look compared to before?"
  • "Check all services my team owns. Did anything degrade after today's deploys?"

  1. reliability_delta(service="service-name"): compare CPU, memory, latency, and error rate before vs after the most recent deployment
  2. fleet_reliability_delta(team="team-name"): batch check across all services for a team, namespace, or explicit list

Handled automatically by the causely-change-impact skill.

Post-Incident Reporting

Generate postmortem documentation and action items from a resolved incident.

Try:

  • "The payments outage is resolved. Draft a postmortem."
  • "Write up what happened to checkout this morning: timeline, root cause, and what was affected."
  • "Payments is back up. Draft the postmortem and create a follow-up ticket for the team."

  1. get_root_causes(root_cause_id=...): retrieve full root cause details, timeline, and blast radius
  2. postmortem(root_cause_id=...): generate a structured postmortem draft
  3. generate_ticket(task="..."): create a follow-up engineering ticket for Jira, GitHub Issues, or Linear

Handled automatically by the causely-postmortem skill.

Skills automate the tool-selection step shown in the workflows above. You describe your situation in natural language; the right specialist activates and runs the correct tool sequence for you. Skills are available for Claude Code, Claude Desktop, and Cursor.

| Situation | Skill | Try |
|---|---|---|
| Incoming alert | causely-alert-triage | "PagerDuty just paged for checkout-latency. What's the actual cause?" |
| Post-deploy validation | causely-change-impact | "Did the last deploy to payments cause any regressions?" |
| Multi-service outage | causely-correlated-incidents | "Three services are alerting at once. What's the real problem?" |
| Health summary / morning standup | causely-health-reporting | "Give me a morning health report for production." |
| Kubernetes investigation | causely-k8s-investigation | "The orders pod keeps OOMKilling. Why?" |
| Postmortem / ticket | causely-postmortem | "Draft a postmortem for the checkout outage that resolved an hour ago." |

See the Skills page for install instructions, full skill detail, and override options.

Choose Your Client

Select your tool for a copy-paste config snippet, config file location, and restart instructions.

| Client | Transport | Config format |
|---|---|---|
| Claude Code | HTTP | .mcp.json (mcpServers) |
| Claude Desktop | stdio via mcp-remote | claude_desktop_config.json (mcpServers) |
| Codex | HTTP | config.toml (mcp_servers) |
| Cursor | HTTP | .cursor/mcp.json (mcpServers) |
| VS Code (GitHub Copilot) | HTTP | .vscode/mcp.json (servers) |

Verify your connection by asking: "Causely: What defects are currently active?"
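To illustrate how the config files in the table differ, here is a minimal sketch that renders a config for the HTTP clients with JSON configs. The file paths and top-level keys come from the table, and the URL is the Causely MCP endpoint. The server name `causely` and the entry shape `{"type": "http", "url": ...}` are assumptions; the dedicated setup page for each client is the authoritative source for the exact fields.

```python
import json

CAUSELY_MCP_URL = "https://api.causely.app/mcp"

# Config file and top-level key per client, taken from the table above.
# Only HTTP clients with JSON configs are covered; Codex (TOML) and
# Claude Desktop (stdio via mcp-remote) are left out.
JSON_HTTP_CLIENTS = {
    "claude-code": (".mcp.json", "mcpServers"),
    "cursor": (".cursor/mcp.json", "mcpServers"),
    "vscode": (".vscode/mcp.json", "servers"),
}


def render_config(client: str, server_name: str = "causely") -> tuple[str, str]:
    """Return (config path, JSON text) for one client.

    Assumption: each server entry is {"type": "http", "url": ...}. The exact
    entry fields can differ by client and are not specified on this page.
    """
    path, top_level_key = JSON_HTTP_CLIENTS[client]
    config = {top_level_key: {server_name: {"type": "http", "url": CAUSELY_MCP_URL}}}
    return path, json.dumps(config, indent=2)


if __name__ == "__main__":
    path, text = render_config("vscode")
    print(f"# {path}")
    print(text)
```

The point of the sketch is the one structural difference worth noticing: VS Code nests servers under `servers`, while Claude Code and Cursor use `mcpServers`.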

Other MCP-compatible Clients

The clients above have dedicated setup pages. The following tools also support the Causely MCP server: point them at https://api.causely.app/mcp using your tool's HTTP MCP config. See Advanced Authentication for credential options.

IDEs and Editors: JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm, GoLand, and others), Windsurf, Zed

CLIs: Kiro CLI, Amp, Atlassian Rovo Dev CLI, and other MCP-compatible CLI tools

Agent Frameworks: HolmesGPT

Authentication

The MCP server validates Frontegg-issued Bearer tokens. For most clients, browser-based OAuth runs automatically, with no manual setup needed. For non-interactive setups (automation, CI) or clients that only support stdio, including the stdio/mcp-remote fallback, see Advanced Authentication.
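For a non-interactive setup, a Bearer token is sent in the `Authorization` header of each HTTP request. The sketch below builds such a request without sending it, assuming you already have a valid token (how to obtain one is covered in Advanced Authentication). The `CAUSELY_TOKEN` environment variable name is hypothetical, and a real MCP session begins with an `initialize` exchange before any other method is called.

```python
import json
import os
import urllib.request

CAUSELY_MCP_URL = "https://api.causely.app/mcp"


def build_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the MCP endpoint."""
    return urllib.request.Request(
        CAUSELY_MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # MCP's Streamable HTTP transport expects clients to accept both.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    # CAUSELY_TOKEN is a hypothetical variable name, not one defined by Causely.
    token = os.environ.get("CAUSELY_TOKEN", "<token>")
    request = build_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}, token)
    print(request.get_method(), request.full_url)
```

Reading the token from the environment keeps it out of source control and CI logs.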

Using the Tool Reference

Tip: The reference below is for teams building custom agents that need explicit tool control. If you're using Claude, Cursor, Codex, or any conversational agent, you can skim it for capability awareness. In most cases, you can describe what you want and the agent picks the right tools. One thing worth knowing if you do go programmatic: most structured tools require an entity ID, so get_entities() is usually the right first call.

Tool Selection: Ask Causely vs Structured Tools

The MCP server exposes two interaction styles. Choose based on what your agent needs to do with the result.

| Use case | Recommended tool |
|---|---|
| Narrative health summary (“Is checkout healthy?”) | get_service_summary |
| Historical questions (“What happened last night?”) | ask_causely |
| Incident standup summary (“What happened to checkout yesterday?”) | ask_causely |
| SLO overview, error budget, and burn rate (“Are any SLOs at risk?” / “Is the payments SLO burning?”) | get_slo |
| Programmatic root cause output (“What is the root cause of latency on payments?”) | get_root_causes |
| Time-series metric data (“What is the p95 latency for the last hour on payments?”) | get_metrics |
| Entity ID resolution (“Resolve the entity ID for the payments service”) | get_entities |
| Dependency graph (“What services depend on payments?”) | get_topology |
| Post-deploy regression check (“Did the latest payments deploy introduce a regression?”) | reliability_delta |

Ask Causely: natural language in. Best for open-ended exploration and synthesis.

Structured tools: explicit named inputs. Best when your agent needs to act on the result, apply logic, or chain calls.
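A custom agent can encode the table above as a simple lookup. This is a minimal sketch: the tool names are from the table, but the use-case keys are hypothetical labels, and falling back to `ask_causely` for anything unrecognized is a design choice based on its description as the tool for open-ended exploration.

```python
# Use case -> recommended tool, copied from the table above.
# The keys are hypothetical labels chosen for this sketch.
RECOMMENDED_TOOL = {
    "health_summary": "get_service_summary",
    "historical_question": "ask_causely",
    "standup_summary": "ask_causely",
    "slo_overview": "get_slo",
    "root_cause": "get_root_causes",
    "metric_series": "get_metrics",
    "entity_resolution": "get_entities",
    "dependency_graph": "get_topology",
    "post_deploy_check": "reliability_delta",
}


def pick_tool(use_case: str) -> str:
    """Return the recommended tool; unknown use cases go to ask_causely."""
    return RECOMMENDED_TOOL.get(use_case, "ask_causely")


if __name__ == "__main__":
    print(pick_tool("dependency_graph"))
    print(pick_tool("something_open_ended"))
```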

What Agents Get vs Raw Telemetry

|  | Raw telemetry | Causely MCP |
|---|---|---|
| Root cause identification | Correlation-based, requires analysis | Deterministic causal analysis |
| Dependency awareness | Manual mapping required | Live topology from observed traffic |
| Blast radius | Estimated | Computed from causal graph |
| Structured output | Custom parsing required | Typed tool responses |
| Time to insight | Minutes of analysis | Single tool call |

Full Tool Reference

25 tools across 5 categories. All tools are available to any MCP-compatible agent or assistant.

Entity Resolution

| Tool | When to use |
|---|---|
| get_entities | Start here. Resolve a service or database name to its ID; list all entities in a namespace; check current health status |
| get_label_values | Enumerate valid label values (team, product, cluster, namespace) before fanning out queries across environments |
| list_namespaces | Discover Kubernetes namespace names before resolving entities or scanning a namespace |
| list_clusters | Discover cluster names before scoping multi-cluster queries |

Data Retrieval

| Tool | When to use |
|---|---|
| get_metrics | Retrieve numeric metric data (p95 latency, error rate, CPU, memory, throughput): the only tool that returns time-series |
| get_logs | Inspect live service logs, or retrieve evidence logs captured at root cause detection time |
| get_alerts | Start triage from an alert name (PagerDuty, Slack, Datadog); distinguish alerts mapped to causal analysis from noise |
| get_events | Correlate symptom onset with deployments, restarts, scaling events, or config changes |
| get_config | Investigate configuration drift; verify deployment manifest matches expectations |
| get_slow_queries | Identify database queries consuming the most execution time; follow up on database root causes |

Health & Diagnosis

| Tool | When to use |
|---|---|
| ask_causely | Open-ended questions and synthesis: historical summaries, standup recaps, anything where narrative output is more useful than structured data |
| get_symptoms | Call with no filters to see all active symptoms across the entire environment, or filter by entity, namespace, or cluster |
| get_root_causes | Identify all active root causes; filter by impacted service, symptom, or root cause ID |
| get_entity_health | Structured health summary for non-Service entities (databases, pods, queues, topics, tables) |
| get_environment_health | Structured health summary for the environment; can be scoped to specific namespaces or services |
| get_slo | Check SLO state, error budget remaining, and burn rate |
| get_topology | Find upstream blast radius (dependents), downstream dependencies, or full data-flow graph |
| get_integration_status | Verify monitoring coverage; check scraper health by cluster |
| triage | Focused health summary by entity name or root cause ID: no entity ID pre-resolution needed |
| team_health | Health summary for all services owned by a team; degraded and critical services listed first |
| get_service_summary | Comprehensive health snapshot for a single service: status, symptoms, root causes, SLOs, metrics, events, logs |

Reporting & Postmortems

| Tool | When to use |
|---|---|
| postmortem | Generate a deterministic postmortem draft for a resolved incident from Causely data |
| generate_ticket | Create a structured engineering ticket suitable for Jira, GitHub Issues, or Linear |

Reliability & Deployment

| Tool | When to use |
|---|---|
| reliability_delta | Post-deploy regression check for a single service: compare resource consumption before/after most recent deployment |
| fleet_reliability_delta | Batch regression check across a team, namespace, or explicit service list (up to 20 services per call) |
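Because fleet_reliability_delta accepts up to 20 services per call, an explicit list longer than that has to be split across several calls. A minimal sketch of the batching follows; the `services` argument name is an assumption, since this page only shows the `team=` form of the call.

```python
from typing import Iterator

MAX_SERVICES_PER_CALL = 20  # limit stated in the table above


def fleet_batches(
    services: list[str], size: int = MAX_SERVICES_PER_CALL
) -> Iterator[list[str]]:
    """Yield de-duplicated service names in order, at most `size` per batch."""
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(services))  # drop repeats, keep first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


def fleet_delta_arguments(services: list[str]) -> list[dict]:
    """Build one arguments dict per fleet_reliability_delta call.

    Assumption: the explicit list is passed as `services`; check the tool
    schema for the real parameter name.
    """
    return [{"services": batch} for batch in fleet_batches(services)]


if __name__ == "__main__":
    names = [f"svc-{i}" for i in range(45)]
    for arguments in fleet_delta_arguments(names):
        print(len(arguments["services"]), "services in this call")
```

De-duplicating before batching avoids spending part of the 20-service budget on repeated names.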

Feature Demos

Solving Slow Database Queries

Helm Chart Example