MCP Server Integration
The Causely MCP server gives agents and AI assistants direct access to Causely's causal reasoning engine. 25 tools across 5 categories let your agent move from raw alerts to structured root cause analysis, dependency maps, and reliability reports, without writing custom integrations.
Key Workflows
These are the four workflows agents use most often. Each maps to a specific sequence of MCP tool calls.
Incident Triage
Identify what's broken and how far it has spread.
Try:
- "What's broken in production right now?"
- "Checkout is throwing errors and we have Alertmanager alerts firing. What's the actual root cause?"
- "Three services are alerting at once. Which is the real problem and which are downstream noise?"
- `get_symptoms()`: see all active symptoms across the entire environment (no filters needed)
- `get_root_causes()`: identify all active root causes and impacted services
- `get_alerts(alert_name_filters=...)`: drill into a specific alert's cause
- `get_topology(entity_id=..., mode="dependents")`: map which upstream services are affected
Handled automatically by the causely-correlated-incidents skill, or by causely-alert-triage if you're starting from a specific alert.
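For a custom agent, the triage sequence above can be sketched as an ordered plan of MCP tool calls. The snippet below only assembles the call names and arguments an agent would issue; the alert name and entity ID are illustrative placeholders, not real values.

```python
# Ordered MCP tool calls an agent would issue for incident triage.
# Tool names come from the reference below; "checkout-latency" and
# "svc-123" are placeholders for illustration only.
triage_plan = [
    ("get_symptoms", {}),                # all active symptoms, no filters
    ("get_root_causes", {}),             # active root causes and impacted services
    ("get_alerts", {"alert_name_filters": ["checkout-latency"]}),      # drill into one alert
    ("get_topology", {"entity_id": "svc-123", "mode": "dependents"}),  # upstream blast radius
]

for tool, args in triage_plan:
    print(tool, args)
```

The ordering matters: environment-wide symptoms and root causes come first, so the agent can decide which alert and entity are worth drilling into before issuing the scoped calls.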
Quick Service Health
Get a complete health picture for a specific service in two calls.
Try:
- "Is checkout healthy?"
- "Give me a full health picture for the payments service: status, open issues, SLOs."
- "Before I page anyone, is there actually a problem with database-service or is this alert noise?"
- "Are there any concerning errors or warnings in the logs for the frontend service over the last hour?"
- `get_entities(query="service-name", entity_types=["Service"])`: resolve the service name to its entity ID
- `get_service_summary(service="service-name")`: full snapshot: status, active symptoms, root causes, SLOs, metrics, recent events, error logs
- `get_logs(entity_id=...)`: retrieve live log output for a running service
Handled automatically by the causely-health-reporting skill.
Post-Deploy Validation
Check whether a deployment introduced regressions.
Try:
- "Did the last deploy to payments cause any regressions?"
- "We deployed cart service 30 minutes ago. How does it look compared to before?"
- "Check all services my team owns, did anything degrade after today's deploys?"
- `reliability_delta(service="service-name")`: compare CPU, memory, latency, and error rate before vs. after the most recent deployment
- `fleet_reliability_delta(team="team-name")`: batch check across all services for a team, namespace, or explicit list
Handled automatically by the causely-change-impact skill.
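If you consume reliability_delta programmatically, the core decision is a per-metric before/after comparison. A minimal sketch of that logic follows; the response shape and field names here are assumptions for illustration, so check the actual tool output before relying on them.

```python
# Flag metrics that degraded beyond a threshold after a deploy.
# The `delta` shape ({"metric": {"before": x, "after": y}}) is an
# assumed, illustrative structure, not the tool's actual schema.
def find_regressions(delta, threshold=0.10):
    """Return metrics whose post-deploy value grew more than `threshold` (default 10%)."""
    regressions = {}
    for metric, values in delta.items():
        before, after = values["before"], values["after"]
        if before > 0 and (after - before) / before > threshold:
            regressions[metric] = round((after - before) / before, 3)
    return regressions

sample = {
    "p95_latency_ms": {"before": 120.0, "after": 180.0},  # +50%: regression
    "error_rate": {"before": 0.01, "after": 0.01},        # unchanged
    "cpu_cores": {"before": 0.5, "after": 0.52},          # +4%: under threshold
}
print(find_regressions(sample))
```

A relative threshold like this is a reasonable default, but for low-baseline metrics (an error rate near zero, for instance) you may want an absolute floor as well.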
Post-Incident Reporting
Generate postmortem documentation and action items from a resolved incident.
Try:
- "The payments outage is resolved. Draft a postmortem."
- "Write up what happened to checkout this morning: timeline, root cause, and what was affected."
- "Payments is back up. Draft the postmortem and create a follow-up ticket for the team."
- `get_root_causes(root_cause_id=...)`: retrieve full root cause details, timeline, and blast radius
- `postmortem(root_cause_id=...)`: generate a structured postmortem draft
- `generate_ticket(task="...")`: create a follow-up engineering ticket for Jira, GitHub Issues, or Linear
Handled automatically by the causely-postmortem skill.
Skills (Recommended)
Skills automate the tool-selection step shown in the workflows above. You describe your situation in natural language; the right specialist activates and runs the correct tool sequence for you. Skills are available for Claude Code, Claude Desktop, and Cursor.
| Situation | Skill | Try |
|---|---|---|
| Incoming alert | causely-alert-triage | "PagerDuty just paged for checkout-latency. What's the actual cause?" |
| Post-deploy validation | causely-change-impact | "Did the last deploy to payments cause any regressions?" |
| Multi-service outage | causely-correlated-incidents | "Three services are alerting at once. What's the real problem?" |
| Health summary / morning standup | causely-health-reporting | "Give me a morning health report for production." |
| Kubernetes investigation | causely-k8s-investigation | "The orders pod keeps OOMKilling. Why?" |
| Postmortem / ticket | causely-postmortem | "Draft a postmortem for the checkout outage that resolved an hour ago." |
See the Skills page for install instructions, full skill detail, and override options.
Choose Your Client
Select your tool for a copy-paste config snippet, config file location, and restart instructions.
| Client | Transport | Config format |
|---|---|---|
| Claude Code | HTTP | .mcp.json (mcpServers) |
| Claude Desktop | stdio via mcp-remote | claude_desktop_config.json (mcpServers) |
| Codex | HTTP | config.toml (mcp_servers) |
| Cursor | HTTP | .cursor/mcp.json (mcpServers) |
| VS Code (GitHub Copilot) | HTTP | .vscode/mcp.json (servers) |
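As a concrete example, a Claude Code `.mcp.json` entry might look like the following. The server name `causely` and the `type`/`url` keys are assumptions based on common HTTP MCP configs; copy the exact snippet from your client's setup page.

```json
{
  "mcpServers": {
    "causely": {
      "type": "http",
      "url": "https://api.causely.app/mcp"
    }
  }
}
```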
Verify your connection by asking: "Causely: What defects are currently active?"
Other MCP-compatible Clients
The clients above have dedicated setup pages. The following tools also support the Causely MCP server; point them at https://api.causely.app/mcp using your tool's HTTP MCP config. See Advanced Authentication for credential options.
IDEs and Editors: JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm, GoLand, and others), Windsurf, Zed
CLIs: Kiro CLI, Amp, Atlassian Rovo DEV CLI, and other MCP-compatible CLI tools
Agent Frameworks: HolmesGPT
Authentication
The MCP server validates Frontegg-issued Bearer tokens. For most clients, browser-based OAuth runs automatically; no manual setup is needed. For non-interactive setups (automation, CI) or clients that only support stdio, including the stdio/mcp-remote fallback, see Advanced Authentication.
Using the Tool Reference
The reference below is for teams building custom agents that need explicit tool control. If you're using Claude, Cursor, Codex, or any conversational agent, you can skim it for capability awareness. In most cases, you can describe what you want and the agent picks the right tools. One thing worth knowing if you do go programmatic: most structured tools require an entity ID, so get_entities() is usually the right first call.
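Under the hood, every MCP tool invocation is a JSON-RPC 2.0 request with method `tools/call`, per the MCP specification. A sketch of the payload a custom agent would send for the entity-resolution first call; the request id and argument values are placeholders.

```python
import json

# Build a JSON-RPC 2.0 "tools/call" request, the wire format MCP uses
# for tool invocations. Argument values are illustrative placeholders.
def build_tool_call(request_id, tool_name, arguments):
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

req = build_tool_call(1, "get_entities", {"query": "payments", "entity_types": ["Service"]})
print(json.dumps(req, indent=2))
```

In practice you would send this through an MCP client library rather than constructing it by hand, but the shape is useful to know when debugging a custom agent's traffic.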
Tool Selection: Ask Causely vs Structured Tools
The MCP server exposes two interaction styles. Choose based on what your agent needs to do with the result.
| Use case | Recommended tool |
|---|---|
| Narrative health summary (“Is checkout healthy?”) | get_service_summary |
| Historical questions (“What happened last night?”) | ask_causely |
| Incident standup summary (“What happened to checkout yesterday?”) | ask_causely |
| SLO overview, error budget, and burn rate (“Are any SLOs at risk?” / “Is the payments SLO burning?”) | get_slo |
| Programmatic root cause output (“What is the root cause of latency on payments?”) | get_root_causes |
| Time-series metric data (“What is the p95 latency for the last hour on payments?”) | get_metrics |
| Entity ID resolution (“Resolve the entity ID for the payments service”) | get_entities |
| Dependency graph (“What services depend on payments?”) | get_topology |
| Post-deploy regression check (“Did the latest payments deploy introduce a regression?”) | reliability_delta |
- **Ask Causely**: natural language in. Best for open-ended exploration and synthesis.
- **Structured tools**: explicit named inputs. Best when your agent needs to act on the result, apply logic, or chain calls.
What Agents Get vs Raw Telemetry
| Capability | Raw telemetry | Causely MCP |
|---|---|---|
| Root cause identification | Correlation-based, requires analysis | Deterministic causal analysis |
| Dependency awareness | Manual mapping required | Live topology from observed traffic |
| Blast radius | Estimated | Computed from causal graph |
| Structured output | Custom parsing required | Typed tool responses |
| Time to insight | Minutes of analysis | Single tool call |
Full Tool Reference
25 tools across 5 categories. All tools are available to any MCP-compatible agent or assistant.
Entity Resolution
| Tool | When to use |
|---|---|
| get_entities | Start here. Resolve a service or database name to its ID; list all entities in a namespace; check current health status |
| get_label_values | Enumerate valid label values (team, product, cluster, namespace) before fanning out queries across environments |
| list_namespaces | Discover Kubernetes namespace names before resolving entities or scanning a namespace |
| list_clusters | Discover cluster names before scoping multi-cluster queries |
Data Retrieval
| Tool | When to use |
|---|---|
| get_metrics | Retrieve numeric metric data (p95 latency, error rate, CPU, memory, throughput): the only tool that returns time-series data |
| get_logs | Inspect live service logs, or retrieve evidence logs captured at root cause detection time |
| get_alerts | Start triage from an alert name (PagerDuty, Slack, Datadog); distinguish alerts mapped to causal analysis from noise |
| get_events | Correlate symptom onset with deployments, restarts, scaling events, or config changes |
| get_config | Investigate configuration drift; verify a deployment manifest matches expectations |
| get_slow_queries | Identify database queries consuming the most execution time; follow up on database root causes |
Health & Diagnosis
| Tool | When to use |
|---|---|
| ask_causely | Open-ended questions and synthesis: historical summaries, standup recaps, anything where narrative output is more useful than structured data |
| get_symptoms | Call with no filters to see all active symptoms across the entire environment, or filter by entity, namespace, or cluster |
| get_root_causes | Identify all active root causes; filter by impacted service, symptom, or root cause ID |
| get_entity_health | Structured health summary for non-Service entities (databases, pods, queues, topics, tables) |
| get_environment_health | Structured health summary for the environment; can be scoped to specific namespaces or services |
| get_slo | Check SLO state, error budget remaining, and burn rate |
| get_topology | Find upstream blast radius (dependents), downstream dependencies, or the full data-flow graph |
| get_integration_status | Verify monitoring coverage; check scraper health by cluster |
| triage | Focused health summary by entity name or root cause ID; no entity ID pre-resolution needed |
| team_health | Health summary for all services owned by a team; degraded and critical services listed first |
| get_service_summary | Comprehensive health snapshot for a single service: status, symptoms, root causes, SLOs, metrics, events, logs |
Reporting & Postmortems
| Tool | When to use |
|---|---|
| postmortem | Generate a deterministic postmortem draft for a resolved incident from Causely data |
| generate_ticket | Create a structured engineering ticket suitable for Jira, GitHub Issues, or Linear |
Reliability & Deployment
| Tool | When to use |
|---|---|
| reliability_delta | Post-deploy regression check for a single service: compare resource consumption before/after the most recent deployment |
| fleet_reliability_delta | Batch regression check across a team, namespace, or explicit service list (up to 20 services per call) |