MCP Server Integration
The Causely MCP server gives agents and AI assistants direct access to Causely's causal reasoning engine. 29 tools across 5 categories let your agent move from raw alerts to structured root cause analysis, dependency maps, and reliability reports, without writing custom integrations.
Key Workflows
These are the four workflows agents use most often. Each maps to a specific sequence of MCP tool calls.
Incident Triage
Identify what's broken and how far it has spread.
Try:
- "What's broken in production right now?"
- "Checkout is throwing errors and we have Alertmanager alerts firing. What's the actual root cause?"
- "Three services are alerting at once. Which is the real problem and which are downstream noise?"
Tools:
- get_symptoms(): see all active symptoms across the entire environment (no filters needed)
- get_root_causes(): identify all active root causes and impacted services
- get_alerts(alert_name_filters=...): drill into a specific alert's cause
- get_topology(entity_id=..., mode="dependents"): map which upstream services are affected
Handled automatically by the causely-correlated-incidents skill, or by causely-alert-triage if you're starting from a specific alert.
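For teams driving these tools from their own agent code, the triage sequence above can be expressed as MCP `tools/call` requests. The following is a minimal sketch, assuming the standard MCP JSON-RPC envelope. The argument values (`checkout-latency`, `entity-123`) are hypothetical placeholders, and passing `alert_name_filters` as a list is an assumption; check the tool schemas your client reports for the exact parameter types.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs should be unique within a session


def tool_call(name, arguments=None):
    """Build an MCP `tools/call` JSON-RPC request for one tool invocation."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Triage sequence from the workflow above.
# The argument values are hypothetical placeholders, not real identifiers.
triage_requests = [
    tool_call("get_symptoms"),
    tool_call("get_root_causes"),
    tool_call("get_alerts", {"alert_name_filters": ["checkout-latency"]}),
    tool_call("get_topology", {"entity_id": "entity-123", "mode": "dependents"}),
]

if __name__ == "__main__":
    for request in triage_requests:
        print(json.dumps(request))
```

In practice an MCP client library builds these envelopes for you; the sketch only makes the order and shape of the calls explicit.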
Quick Service Health
Get a complete health picture for a specific service in two calls, with an optional third for live logs.
Try:
- "Is checkout healthy?"
- "Give me a full health picture for the payments service: status, open issues, SLOs."
- "Before I page anyone, is there actually a problem with database-service or is this alert noise?"
- "Are there any concerning errors or warnings in the logs for the frontend service over the last hour?"
Tools:
- get_entities(query="service-name", entity_types=["Service"]): resolve the service name to its entity ID
- get_service_summary(service="service-name"): full snapshot of status, active symptoms, root causes, SLOs, metrics, recent events, and error logs
- get_logs(entity_id=...): retrieve live log output for a running service
Handled automatically by the causely-health-reporting skill.
Post-Deploy Validation
Check whether a deployment introduced regressions.
Try:
- "Did the last deploy to payments cause any regressions?"
- "We deployed cart service 30 minutes ago. How does it look compared to before?"
- "Check all services my team owns, did anything degrade after today's deploys?"
Tools:
- reliability_delta(service="service-name"): compare CPU, memory, latency, and error rate before vs after the most recent deployment
- fleet_reliability_delta(team="team-name"): batch check across all services for a team, namespace, or explicit list
Handled automatically by the causely-change-impact skill.
Post-Incident Reporting
Generate postmortem documentation and action items from a resolved incident.
Try:
- "The payments outage is resolved. Draft a postmortem."
- "Write up what happened to checkout this morning: timeline, root cause, and what was affected."
- "Payments is back up. Draft the postmortem and create a follow-up ticket for the team."
Tools:
- get_root_causes(root_cause_id=...): retrieve full root cause details, timeline, and blast radius
- postmortem(root_cause_id=...): generate a structured postmortem draft
- generate_ticket(task="..."): create a follow-up engineering ticket for Jira, GitHub Issues, or Linear
Handled automatically by the causely-postmortem skill.
Skills (Recommended)
Skills automate the tool-selection step shown in the workflows above. You describe your situation in natural language; the right specialist activates and runs the correct tool sequence for you. Skills are available for Claude Code, Claude Desktop, and Cursor.
| Situation | Skill | Try |
|---|---|---|
| Incoming alert | causely-alert-triage | "PagerDuty just paged for checkout-latency. What's the actual cause?" |
| Post-deploy validation | causely-change-impact | "Did the last deploy to payments cause any regressions?" |
| Multi-service outage | causely-correlated-incidents | "Three services are alerting at once. What's the real problem?" |
| Health summary / morning standup | causely-health-reporting | "Give me a morning health report for production." |
| Kubernetes investigation | causely-k8s-investigation | "The orders pod keeps OOMKilling. Why?" |
| Postmortem / ticket | causely-postmortem | "Draft a postmortem for the checkout outage that resolved an hour ago." |
See the Skills page for install instructions, full skill detail, and override options.
Choose Your Client
Select your tool for a copy-paste config snippet, config file location, and restart instructions.
| Client | Transport | Config format |
|---|---|---|
| Claude Code | HTTP | .mcp.json (mcpServers) |
| Claude Desktop | stdio via mcp-remote | claude_desktop_config.json (mcpServers) |
| Codex | HTTP | config.toml (mcp_servers) |
| Cursor | HTTP | .cursor/mcp.json (mcpServers) |
| VS Code (GitHub Copilot) | HTTP | .vscode/mcp.json (servers) |
Verify your connection by asking: "Causely: What defects are currently active?"
Other MCP-compatible Clients
The clients above have dedicated setup pages. The following tools also support the Causely MCP server. Point them at https://api.causely.app/mcp using your tool's HTTP MCP config. See Advanced Authentication for credential options.
IDEs and Editors: JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm, GoLand, and others), Windsurf, Zed
CLIs: Kiro CLI, Amp, Atlassian Rovo Dev CLI, and other MCP-compatible CLI tools
Agent Frameworks: HolmesGPT
Authentication
The MCP server validates Frontegg-issued Bearer tokens. For most clients, browser-based OAuth runs automatically with no manual setup. For non-interactive setups (automation, CI) or clients that only support stdio, including the stdio/mcp-remote fallback, see Advanced Authentication.
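For non-interactive callers, a Bearer token is sent in the `Authorization` header of each HTTP request. The following is a minimal sketch that builds (but does not send) such a request with Python's standard library. The environment variable name `CAUSELY_TOKEN` and the fallback value are hypothetical; how you obtain a token is covered in Advanced Authentication, not here.

```python
import json
import os
import urllib.request

MCP_URL = "https://api.causely.app/mcp"  # endpoint given on this page


def build_mcp_request(payload, token):
    """Build, without sending, an authenticated HTTP POST for the MCP endpoint."""
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # MCP's HTTP transport expects clients to accept both of these.
            "Accept": "application/json, text/event-stream",
        },
    )


# CAUSELY_TOKEN is a hypothetical variable name; the fallback is a dummy value.
token = os.environ.get("CAUSELY_TOKEN", "example-token")
request = build_mcp_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}, token)
```

A real session begins with the MCP `initialize` handshake before any other method, which an MCP client library normally handles for you, so treat this as an illustration of where the token goes rather than a complete client.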
Using the Tool Reference
The reference below is for teams building custom agents that need explicit tool control. If you're using Claude, Cursor, Codex, or any conversational agent, you can skim it for capability awareness. In most cases, you can describe what you want and the agent picks the right tools. One thing worth knowing if you do go programmatic: most structured tools require an entity ID, so get_entities() is usually the right first call.
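The "resolve first, then query" pattern can be sketched as follows. This is an illustration under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever your MCP client uses to invoke a tool, and the response shape of get_entities (a list of dicts with an `id` key) is assumed, not documented here.

```python
from typing import Any, Callable, Dict

# Any function that sends one tool call and returns its parsed result.
CallTool = Callable[[str, Dict[str, Any]], Any]


def logs_for_service(call_tool: CallTool, service_name: str) -> Any:
    """Resolve a service name to an entity ID, then fetch its logs."""
    found = call_tool(
        "get_entities", {"query": service_name, "entity_types": ["Service"]}
    )
    # ASSUMPTION: the result is a list of dicts that each carry an "id" key.
    if not found:
        raise LookupError(f"no Service entity matches {service_name!r}")
    return call_tool("get_logs", {"entity_id": found[0]["id"]})


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Stand-in transport with canned responses, for illustration only."""
    if name == "get_entities":
        return [{"id": "entity-123", "name": arguments["query"]}]
    return {"tool": name, "arguments": arguments}


result = logs_for_service(fake_call_tool, "checkout")
```

Passing the transport in as a parameter keeps the sequencing logic testable without a live connection; swap `fake_call_tool` for your client's real call function.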
What Agents Get vs Raw Telemetry
| | Raw telemetry | Causely MCP |
|---|---|---|
| Root cause identification | Correlation-based, requires analysis | Deterministic causal analysis |
| Dependency awareness | Manual mapping required | Live topology from observed traffic |
| Blast radius | Estimated | Computed from causal graph |
| Structured output | Custom parsing required | Typed tool responses |
| Time to insight | Minutes of analysis | Single tool call |
Full Tool Reference
29 tools across 5 categories. All tools are available to any MCP-compatible agent or assistant.
Entity Resolution
| Tool | When to use |
|---|---|
| get_entities | Start here. Resolve a service or database name to its ID; list all entities in a namespace; check current health status |
| name_lookup | Resolve any name, including service, cluster, namespace, root cause name, or symptom name, to an entity ID for use in other tools |
| get_label_values | Enumerate valid label values (team, product, cluster, namespace) before fanning out queries across environments |
Data Retrieval
| Tool | When to use |
|---|---|
| get_metrics | Retrieve numeric metric data (p95 latency, error rate, CPU, memory, throughput); the only tool that returns time-series data |
| get_logs | Inspect live service logs, or retrieve evidence logs captured at root cause detection time |
| get_alerts | Start triage from an alert name (PagerDuty, Slack, Datadog); distinguish alerts mapped to causal analysis from noise |
| get_events | Correlate symptom onset with deployments, restarts, scaling events, or config changes |
| get_config | Investigate configuration drift; verify deployment manifest matches expectations |
| get_slow_queries | Identify database queries consuming the most execution time; follow up on database root causes |
Health & Diagnosis
| Tool | When to use |
|---|---|
| get_symptoms | Call with no filters to see all active symptoms across the entire environment, or filter by entity, namespace, or cluster |
| get_root_causes | Identify active root causes; filter by impacted service, symptom, or root cause ID, or pass start/end dates to investigate a specific past time window |
| get_entity_health | Structured health summary for non-Service entities (databases, pods, queues, topics, tables) |
| get_environment_health | Structured health summary for the environment; can be scoped to specific namespaces or services |
| get_slo | Check SLO state, error budget remaining, and burn rate |
| get_topology | Find upstream blast radius (dependents), downstream dependencies, or full data-flow graph |
| get_integration_status | Verify monitoring coverage; check scraper health by cluster |
| get_incident_impact | Given a root cause ID (or an entity ID + root cause name), return the responsible service and its business context, plus all impacted services and their business context |
| team_health | Health summary for all services owned by a team; degraded and critical services listed first |
| get_service_summary | Comprehensive health snapshot for a single service: status, symptoms, root causes, SLOs, metrics, events, logs |
| investigate_alert | Investigate a resolved alert from get_alerts; maps the alert to its entity and returns the standard get_entity_health result alongside the original alert |
| rank_entities | Rank services, topics, tables, or endpoints by number of dependencies or dependents; a single bulk query instead of looping over get_topology |
| get_potential_diagnoses | Active and model-inferred diagnosis hypotheses for a specific entity; includes inactive and causality-only potentials not returned by get_root_causes |
| get_potential_observable_signals | All observable signals on a specific entity (active, inactive, and causality potential state); use before get_signal_potential_diagnoses to find internal signal names |
| get_signal_potential_diagnoses | Reverse lookup: given a symptom, event, or SLO on an entity, return the diagnoses from the causality model that could explain it |
| get_diagnosis_observable_signals | Retrieve the theoretical causality chain for a diagnosis, including downstream symptoms, events, and SLOs it could cause according to the causal model; compare to observed signals from get_symptoms |
Reporting & Postmortems
| Tool | When to use |
|---|---|
| postmortem | Generate a deterministic postmortem draft for a resolved incident from Causely data |
| generate_ticket | Create a structured engineering ticket suitable for Jira, GitHub Issues, or Linear |
Reliability & Deployment
| Tool | When to use |
|---|---|
| reliability_delta | Post-deploy regression check for a single service: compare resource consumption before/after most recent deployment |
| fleet_reliability_delta | Batch regression check across a team, namespace, or explicit service list (up to 20 services per call) |
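
Because fleet_reliability_delta accepts at most 20 services per call, a custom agent checking a longer explicit list has to split it into batches. The following is a minimal sketch of that batching. The parameter name `services` for the explicit list is an assumption (only the `team` parameter appears on this page), and the service names are hypothetical.

```python
from typing import Dict, Iterator, List

MAX_SERVICES_PER_CALL = 20  # limit stated for fleet_reliability_delta


def fleet_delta_batches(services: List[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one argument dict per call, each holding at most 20 services."""
    for start in range(0, len(services), MAX_SERVICES_PER_CALL):
        # ASSUMPTION: the explicit-list parameter is named "services".
        yield {"services": services[start:start + MAX_SERVICES_PER_CALL]}


service_names = [f"service-{i}" for i in range(45)]  # hypothetical names
batches = list(fleet_delta_batches(service_names))
print([len(b["services"]) for b in batches])  # prints [20, 20, 5]
```

Each yielded dict would be passed as the arguments of one fleet_reliability_delta call. When the services share a team or namespace, scoping by that label in a single call is likely simpler than batching an explicit list.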