1.0.90

August 29, 2025

Version 1.0.90

Enhanced Impact Understanding with Blast Radius Analysis

Understanding the scope and potential impact of a root cause is critical for effective incident response. Causely now provides comprehensive Blast Radius Analysis that gives you a clearer understanding of both the current and potential impact of every root cause.

Key capabilities:

  • Current Impact Assessment: See exactly which services are currently affected by a root cause and how the issue is propagating through your system
  • Potential Impact Prediction: Understand which additional services could be impacted if the root cause persists or worsens
  • Risk Assessment: Evaluate the broader implications of each root cause to prioritize response efforts effectively
  • Visual Impact Mapping: Clear visualization of impact scope helps teams align on response priorities and resource allocation

This enhanced impact analysis ensures teams can make informed decisions about incident response priorities and resource allocation, leading to more effective incident management.
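
As a rough illustration of the difference between current and potential impact, the sketch below walks a small dependency graph outward from a root cause. It is not Causely's algorithm: the topology, the service names, and the `symptomatic` set are all hypothetical. It only shows the idea that services reachable from the root cause are either already affected or at risk.

```python
from collections import deque


def downstream(graph, start):
    """Return every service reachable from `start` by following dependent edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, ()):
            if dependent not in seen and dependent != start:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical topology: each key maps to the services that depend on it.
dependents = {
    "postgres": ["orders", "inventory"],
    "orders": ["checkout"],
    "inventory": ["checkout"],
    "checkout": ["web"],
}

# Hypothetical set of services that already show symptoms.
symptomatic = {"orders", "checkout"}

reachable = downstream(dependents, "postgres")
current_impact = reachable & symptomatic    # reachable and already degraded
potential_impact = reachable - symptomatic  # reachable but still healthy

print("current:", sorted(current_impact))
print("potential:", sorted(potential_impact))
```

In this toy graph, `orders` and `checkout` form the current impact, while `inventory` and `web` are the services that could be affected if the issue persists.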

Datadog Alert Auto-Mapping for Trusted Signal Integration

Causely now automatically maps Datadog monitors to corresponding service symptoms, enabling you to leverage your existing curated signals for enhanced root cause analysis.

For organizations that have invested in creating reliable Datadog alerts and monitors, this integration provides a powerful way to incorporate those trusted signals into Causely's causal reasoning engine.

Key benefits:

  • Leverage Existing Investments: Utilize your carefully curated Datadog monitors and alerts within Causely's causal analysis
  • Enhanced Signal Quality: Combine Causely's automatic discovery with your domain expertise encoded in Datadog alerts
  • Faster Root Cause Identification: Trusted signals accelerate the path from symptom detection to root cause identification
  • Seamless Integration: Automatic mapping requires no additional configuration—Causely intelligently correlates Datadog alerts with service symptoms

This capability is particularly valuable for teams who have developed sophisticated monitoring strategies in Datadog and want to enhance their root cause analysis capabilities.

Improved Dataflow Visualization and Topic Metrics

We've enhanced Causely's ability to understand and visualize complex asynchronous data flows across your distributed systems:

Enhanced Topic Metrics:

  • Comprehensive Topic Monitoring: Better visibility into message queue performance and throughput
  • Producer-Consumer Relationships: Clear mapping of data flow relationships between services through messaging systems
  • Queue Depth Analysis: Monitor queue depths and identify potential bottlenecks in asynchronous processing

Improved Flow Visualization:

  • Directional Flow Mapping: Enhanced visualization of data flow direction between topics, tables, and services
  • End-to-End Tracing: Follow data flows from producers through topics to consumers with improved clarity
  • Performance Correlation: Better correlation between data flow metrics and service performance impacts

These improvements provide deeper insights into how data moves through your systems, making it easier to identify bottlenecks and performance issues in event-driven architectures.

Aggregated Pod-Level Metrics

Causely now provides enhanced pod-level observability with aggregated metrics that give you better insights into container performance:

Enhanced Pod Visibility:

  • Aggregated Resource Metrics: Combined view of resource utilization across related pods
  • Node-Container Relationships: Improved tracking of how containers relate to their host nodes
  • Performance Correlation: Better correlation between individual pod performance and overall service health
  • Resource Optimization Insights: Clearer understanding of resource usage patterns to inform optimization decisions

This enhancement provides more granular visibility into your containerized workloads while maintaining the broader service-level context that's crucial for effective root cause analysis.

Location-Based Notification Routing

Organizations operating across multiple regions or locations can now route root cause notifications to different destinations based on where entities are located:

Key Features:

  • Geographic Routing: Automatically route notifications based on entity location or region
  • Multi-Region Operations: Support for organizations managing infrastructure across different states, countries, or data centers
  • Customizable Routing Rules: Configure notification destinations based on entity labels, namespaces, or other location indicators
  • Operational Efficiency: Ensure the right teams receive notifications for incidents in their operational domain

This capability is particularly valuable for organizations with distributed operations teams who need location-specific incident routing to ensure fast response times and appropriate escalation paths.
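
To make the routing idea concrete, here is a minimal sketch of first-match, label-based routing. The `Rule` type, the label keys, and the destination names are hypothetical and do not reflect Causely's actual configuration format. The sketch only assumes that each entity carries a set of labels and that each rule names the labels it requires.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical routing rule: send to `destination` when all labels match."""
    destination: str
    match_labels: dict = field(default_factory=dict)


def route(entity_labels, rules, default="#ops-global"):
    """Return the destination of the first rule whose labels all match."""
    for rule in rules:
        if all(entity_labels.get(k) == v for k, v in rule.match_labels.items()):
            return rule.destination
    return default


# Hypothetical rules, evaluated in order from most to least specific.
rules = [
    Rule("#ops-emea-payments", {"region": "eu-west", "namespace": "payments"}),
    Rule("#ops-emea", {"region": "eu-west"}),
    Rule("#ops-us", {"region": "us-east"}),
]

print(route({"region": "eu-west", "namespace": "payments"}, rules))
print(route({"region": "ap-south"}, rules))
```

Ordering the rules from most to least specific keeps the matching logic simple: the first rule that applies wins, and anything unmatched falls through to the default destination.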

Did you know?

Even without an active root cause, you can drill down into a service and get a list of potential root causes that could be impacting it, and a list of symptoms across entities that could be indicative of those root causes. This enables you to assess risk and prioritize response efforts effectively.

Bug Fixes and Minor Improvements

This release includes numerous stability improvements and enhancements across the platform:

  • Database Performance: Improved database headline scanning for better performance and reliability
  • Entity Management: Enhanced handling of entity data request pathways with status tracking
  • Cloud SQL Stability: Improvements to cloud-sql proxy for more stable database connections
  • Memory Management: Enhanced memory noisy neighbor detection with additional condition checks
  • Notification System: Improved notification routing per mediator for better distribution
  • Network Metrics: Enhanced collection of basic network metrics for improved observability
  • Entity Relationships: Better management of node-container relationships in the mediator
  • eBPF Instrumentation: Upgraded to Beyla 2.5 for improved automatic instrumentation capabilities
  • Entity Configuration: New database table and APIs for enhanced entity configuration management
  • Manifestation Handling: Improved handling of manifestations for deleted entities
  • Redis Performance: Enhanced Redis queue performance for better system responsiveness

1.0.89

August 25, 2025

Version 1.0.89

New Features

Docker Host Installation Support

Causely now offers a new streamlined installation option for Docker hosts. This installation method enables telemetry collection and root-cause analysis for containerized services on any Docker-enabled host, complementing our existing deployment options.

Key capabilities:

  • eBPF-based telemetry collection for Docker containers
  • Root cause analysis for services running on Docker hosts
  • Easy setup with automated installation scripts
  • Support for privileged container deployment with host PID access

Learn more about setting up Causely on Docker hosts in our Docker Host Installation guide.

Multi-Database Support for MySQL and PostgreSQL

Causely now supports multiple MySQL and PostgreSQL databases using the same secret configuration. This allows you to monitor multiple databases with the same credentials, reducing the need for separate secret management.

Learn more about setting up multiple MySQL and PostgreSQL databases in our MySQL Configuration guide and PostgreSQL Configuration guide.

Did you know?

Streamline incident response with CauselyBot webhook integration

Causely can automatically forward root cause notifications to your preferred collaboration tools through CauselyBot, our open source webhook service. CauselyBot receives authenticated payloads from Causely and intelligently routes them to platforms like Slack, Microsoft Teams, and OpsGenie.

Key capabilities include:

  • Smart Filtering: Configure custom filters based on severity, entity type, SLO impact, or root cause name to ensure teams only receive relevant notifications
  • Multiple Destinations: Route different types of incidents to appropriate teams—send critical database issues to the DBA team while routing general service degradations to the on-call engineers
  • Secure Authentication: Built-in bearer token validation ensures only authorized notifications reach your systems
  • Easy Deployment: Available as Docker containers or Helm charts for seamless integration into your existing infrastructure

This enables faster incident response by delivering actionable context directly to the tools your teams already use, reducing mean time to resolution and improving collaboration during critical incidents.
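
The sketch below illustrates the two ideas described above, bearer token validation and filter-based routing, in plain Python. It is not CauselyBot's code or configuration schema: the payload fields (`severity`, `entity_type`), the route keys, and the destination names are assumptions made for illustration.

```python
import hmac

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def is_authorized(auth_header, expected_token):
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    scheme, _, token = (auth_header or "").partition(" ")
    return scheme.lower() == "bearer" and hmac.compare_digest(
        token.encode(), expected_token.encode()
    )


def matching_destinations(payload, routes):
    """Return every destination whose filters accept the payload."""
    level = SEVERITY_ORDER.get(payload.get("severity"), -1)
    matched = []
    for r in routes:
        if level < SEVERITY_ORDER[r.get("min_severity", "low")]:
            continue  # payload is not severe enough for this route
        types = r.get("entity_types")
        if types and payload.get("entity_type") not in types:
            continue  # route is restricted to other entity types
        matched.append(r["destination"])
    return matched


# Hypothetical routes: critical database issues go to the DBA team,
# anything of medium severity or above goes to on-call.
routes = [
    {"destination": "dba-team", "min_severity": "critical", "entity_types": {"database"}},
    {"destination": "on-call", "min_severity": "medium"},
]

payload = {"severity": "critical", "entity_type": "database"}
if is_authorized("Bearer s3cret", "s3cret"):
    print(matching_destinations(payload, routes))
```

Using `hmac.compare_digest` rather than `==` avoids leaking timing information about the expected token.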

Bug Fixes and Minor Improvements

This release focuses on stability improvements and bug fixes across the platform:

  • Impact Graph Resilience: Improved impact graph calculation to handle missing services gracefully, resulting in faster loading of the impact graph UI.
  • Database Health Checks: Added comprehensive health checks for cloud-sql-proxy and ping-check database connections for improved monitoring.
  • Redis Span Enhancement: Improved Redis span collection for better distributed tracing coverage.
  • Service Impact Calculation: Improved accuracy of service impact calculations to address scenarios where impact was too broad.
  • Entity Deletion Handling: Improved logic for handling deleted entities in the system. For example, clusters or namespaces that have been removed will no longer be shown in the topology.
  • Copilot Improvements: Various enhancements to the Causely Copilot functionality.

1.0.88

August 11, 2025

Version 1.0.88

Root Cause View—Sort Historical Root Causes by Symptom Count and Duration

Understanding what happened in past incidents is key to preventing them in the future and conducting more effective postmortem analyses. We've made this easier with new sorting options in the Historical Root Causes view:

  • Symptom Count: Focus on root causes that triggered a significant number of symptoms. These often represent issues with a large blast radius that, if they recur, can have substantial impact on the environment.
  • Duration: Pinpoint issues that have been degrading performance for a significant time and need to be addressed to restore stability.

For clarity, the Symptoms, Services Degraded, Duration, Start, and End columns now reflect the occurrence that matches your selected sort order. For example, if you sort by Symptoms, you'll see the occurrence with the highest symptom count; if you sort by Duration, you'll see the occurrence with the longest duration, and so on.

1.0.85

July 29, 2025

Version 1.0.85

Faster Recovery with Resource Contention Remediation

You can now remediate resource contention issues directly from the Causely UI. This helps you resolve incidents faster and restore service performance without breaking your flow.

Supported Root Causes:

The remediation interface provides step-by-step guidance and Kubernetes configuration examples, making it easy to implement fixes with confidence.

Smarter Urgency Detection for Root Causes

We've improved how Causely flags urgent issues. Root causes that lead to SLO violations, or put your SLOs at risk, are now automatically marked as Urgent and sent to your alerting channels (for example, Slack) by default. Stay focused on what matters most.

This enhancement ensures that critical issues requiring immediate attention are automatically prioritized and routed to the right teams, reducing response times and improving incident management workflows.

Customizable SLO Settings

You can now fine-tune SLO targets and burn rate thresholds within Causely to align with your team's reliability goals. This helps improve the precision of urgency detection and alerting.

Key Features:

  • Configure custom SLO targets for different services
  • Set burn rate thresholds to control alert sensitivity
  • Align reliability goals with business requirements
  • Improve alert precision and reduce false positives

This feature is currently in preview. Check out our SLO Configuration documentation for detailed setup instructions.
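
For readers new to burn rates, the sketch below shows the standard calculation: the observed error rate divided by the error budget implied by the SLO target. The function name, the sample numbers, and the threshold are illustrative assumptions, not Causely defaults.

```python
def burn_rate(failed, total, slo_target):
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget is spent exactly at the rate the SLO
    allows; higher values mean the budget runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic, so no budget is consumed
    error_budget = 1.0 - slo_target
    if error_budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (failed / total) / error_budget


# Hypothetical numbers: a 99% target, 50 failed requests out of 1000.
rate = burn_rate(failed=50, total=1000, slo_target=0.99)
threshold = 2.0  # hypothetical alerting threshold
print(round(rate, 2), rate > threshold)
```

Lowering the threshold makes alerting more sensitive, while a stricter SLO target shrinks the error budget and raises the burn rate for the same error count.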

Did you know?

Causely supports multiple tracing sources out of the box. While our agents ship with eBPF-based automatic instrumentation, you can also use Odigos, groundcover, Grafana Beyla, or existing OpenTelemetry data.

Additional service dependencies can be identified from Datadog or Dynatrace data, giving you flexibility to work with your existing observability stack.

Bug Fixes and Minor Improvements

  • Load Balancer Support: Added support for Google Cloud internal load balancers, enabling better visibility into private service endpoints.
  • Controller Discovery: Improved discovery of custom controllers such as GitLab Runner to enhance coverage of user-defined Kubernetes workloads.

1.0.84

July 21, 2025

Version 1.0.84

Root Cause Impact—Now in Architectural Context

Quickly see which services are impacted by a root cause. Causely now provides a direct view of the services being affected, along with a link to the live service-to-service topology graph.

This helps teams align on the scope of the issue, prioritize response, and collaborate on remediation—all from a shared system view.

Easier Alerting Integration Setup and Testing

Streamline your incident response workflows. We now support in-product testing for alerting integrations (like Slack, PagerDuty, and more), so you can confirm everything works before going live.

Causely supports multiple workflow destinations—ensuring your on-call engineers get timely, actionable context in the tools they already use.

Fine-Tuned Symptom Activation—Because Every Second Counts

Control how fast (or slow) Causely infers a root cause. You can now customize the activation and deactivation delays for symptoms on a per-service basis using Kubernetes labels, Nomad service tags, or Consul service metadata.

By default, a condition must persist for 10 minutes before a symptom becomes active—but now you can tighten that window for mission-critical services or extend it for lower-priority ones. See our symptom delay configuration guide for detailed setup instructions and best practices.

This helps you tune responsiveness without sacrificing signal quality.
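
The effect of an activation delay can be sketched as a simple debounce: a symptom becomes active only once its condition has held continuously for the configured window. The class below is an illustrative model, not Causely's implementation. Only the 10-minute default comes from the text above. For brevity, it deactivates immediately and does not model the deactivation delay.

```python
from datetime import datetime, timedelta


class SymptomDebouncer:
    """Hypothetical model of a per-service activation delay."""

    def __init__(self, activation_delay=timedelta(minutes=10)):
        self.activation_delay = activation_delay
        self.condition_since = None  # when the condition first became true
        self.active = False

    def observe(self, condition_met, now):
        """Record one observation and return whether the symptom is active."""
        if not condition_met:
            # Any gap resets the window (deactivation delay not modeled).
            self.condition_since = None
            self.active = False
            return self.active
        if self.condition_since is None:
            self.condition_since = now
        self.active = now - self.condition_since >= self.activation_delay
        return self.active


t0 = datetime(2025, 7, 21, 12, 0)
default = SymptomDebouncer()
critical = SymptomDebouncer(activation_delay=timedelta(minutes=2))

for minutes in (0, 2, 5, 10):
    now = t0 + timedelta(minutes=minutes)
    print(minutes, default.observe(True, now), critical.observe(True, now))
```

With the shorter window, the mission-critical service activates its symptom at the 2-minute mark, while the default service waits the full 10 minutes.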

Understand Past Incidents Faster

Explore historical root causes with more precision. You can now sort by services degraded count, start time, or end time to quickly find the root causes that matter most from previous days or weeks.

Whether you're doing a postmortem or scanning for recurring issues, it's now much easier to answer: "What was going on at that time?"

Did you know?

Causely uses eBPF technology to automatically instrument your applications with zero code changes and minimal overhead. Powered by Grafana Beyla, our eBPF-based instrumentation extracts rich telemetry data from services written in Go, Java, .NET, Node.js, Python, Ruby, Rust, and more, without requiring language-specific agents or application modifications.

This zero-effort integration provides actionable insights into service interactions, latencies, and system performance, and it's enabled by default in all Causely deployments.

Learn more about how Causely leverages eBPF for automatic instrumentation.

1.0.83

July 1, 2025

Version 1.0.83

Improved Root Cause Views

We've redesigned our Root Cause views to help you quickly identify and address the most urgent service-impacting issues. The new interface prioritizes critical root causes based on their impact scope and severity, making incident triage more efficient than ever.

API Documentation Now Available

Our comprehensive GraphQL API documentation is now available! Programmatically access Causely's root cause analysis engine, query defects, and integrate with your CI/CD pipelines.

Smarter Post-Deployment Analysis

Causely now provides enhanced visibility into code change-related root causes with improved clarity around version change events:

  • Immediate Detection: Catch resource usage changes right after a deployment to identify regressions faster
  • Precise Correlation: Version timestamps are now directly correlated with resource metrics like CPU, memory, and latency
  • Before vs. After Insights: Get proactive visibility into what changed pre- and post-deployment
  • Automatic Code Change Attribution: Causely automatically infers if a root cause stems from a code change, helping teams quickly connect symptoms to recent deployments

This feature is particularly valuable for understanding the real performance impact of new releases on your services.

Better Visibility Into Asynchronous Data Flows

Understanding complex message flows across your distributed system just got easier with our improved data flow graphs:

  • End-to-End Tracing: Follow messages from publish to RPC method or HTTP path, even across multiple service hops
  • Topic Filtering: Isolate behavior for specific customers or queues by filtering data flows by topic
  • Causal Integration: These improvements are fully integrated into our causal engine, enhancing root cause accuracy for asynchronous systems

Did you know?

Scopes in Causely allow you to define and manage custom subsets of your environment's topology. As Causely automatically discovers the full topology of your environment, it can present a rich but potentially overwhelming set of entities—services, infrastructure components, and identified problems. Scopes help you focus on the specific subset of data that matters most to your role, responsibilities, or current investigative tasks.

Learn more about scopes and how to use them in our documentation.

Bug Fixes and Minor Improvements

  • Database Performance: Added indices to active actions and increased max DB pool connections for better performance
  • Slack Integration: Enhanced support for users in multiple Slack teams
  • Kubernetes Improvements: Added option to disable entity log collection from Kubernetes and improved pod metadata for network endpoints
  • Alert Management: Added alerts as context in symptoms and implemented continuous sender for alert manager notifications
  • Log Management: Optimized log storage by limiting to 1000 log lines per evidence
  • UI Improvements: Fixed time duration format display and headline database scanning
  • Root Cause Quality: Filtered out root causes with only low probability symptoms and fixed flapping issues due to equal probabilities
  • Monitoring Enhancements: Added support for symptom monitoring from Prometheus and improved topology scraper metrics for SLOs
  • SQL Parsing: Enhanced SQL query parsing for better database performance analysis
  • Email Notifications: Improved weekly email summary delivery system

1.0.81

June 2025

Version 1.0.81

Ask Causely: SQL Query Analysis & Service Graphs

Ask Causely has leveled up. In addition to answering questions about incidents and metrics, it now helps you analyze and troubleshoot your slowest SQL queries and automatically visualizes query topology and service graphs.

Want to enable Ask Causely? Reach out to your Causely team to activate early access.

Grafana Plugin is Now Publicly Available

We're excited to officially launch the Causely plugin for Grafana! Now anyone can bring Causely's root cause analysis directly into their existing dashboards with no extra context-switching required!

Latest Highlights:

  • Root Cause Panel: See urgent issues and causal summaries in place.
  • Root Cause Headlines: Clear, contextual insights without leaving Grafana.
  • Unhealthy Services Panel: Quickly identify degraded services and their downstream impact at a glance.

This marks a major step in bringing Causely's causal intelligence to where your teams already work. With the new plugin, Grafana becomes not just a place for metrics, but a place for action. We'll continue to expand the plugin with even more capabilities in upcoming releases.

Enriching Root Cause Analysis with Trusted Signals from Prometheus, Alertmanager, and Checkly

Causely now ingests alerts from Prometheus, Alertmanager, and Checkly—bringing the signals you already trust into our causal engine. These alerts are automatically mapped to known symptoms in your environment, giving incidents immediate structure, historical continuity, and causal depth.

  • Prometheus + Alertmanager: Pull alerts in real-time and map them to symptoms in your knowledge graph—enhancing situational awareness and accelerating investigations.
  • Checkly: API check failures are now linked to the services they impact and surfaced as active symptoms, giving synthetic monitoring real operational context.
  • Symptom Activation/Deactivation: Alerts can now directly toggle symptom states, powering more dynamic, accurate, and automated RCA workflows.

With these integrations, Causely doesn't just observe alerts—it interprets them in context. You're extending your causal graph with trusted telemetry, making every incident easier to understand, triage, and resolve.

Did you know?

Causely automatically discovers and visualizes your service dependencies—no manual config required. We analyze runtime communication patterns (HTTP, gRPC, SQL, Kafka, etc.) to give you a live, layered map of your architecture—from services to infra to messaging layers—enabling more accurate root cause analysis and impact prediction.

Bug Fixes and Minor Improvements

  • Weekly Email Summaries: Automatically receive a weekly summary of incidents and RCA results to stay aligned.

1.0.79

May 2025

Version 1.0.79

New Landing Page: See What's Most Interesting in Your Environment

The new Causely landing page gives you a high-signal, low-noise view of the Root Cause Headlines in your environment over the last 24 hours.

Whether it's a sudden spike in latency, a critical root cause affecting your services, or key SLO risks, we now highlight it the moment you log in.

This helps teams prioritize actions based on impact.

Ask Causely: Your Incident Copilot in Causely and Slack (Early Access)

Introducing Ask Causely, your LLM-powered assistant built for real-time operational insight. Whether you're in Slack or in the Causely UI, Ask Causely helps you resolve incidents faster and improve service health.

What it can do:

  • Respond to users' natural language questions, like "Which services are currently impacted by active root causes?"
  • Get context-aware answers: root cause + symptoms + suggested next steps
  • Integrated into both Slack and Causely Web UI

Want to enable Ask Causely? Reach out to your Causely team to activate early access.

Simplified Navigation: Focused on What Matters Most

We've rethought the structure of Causely's interface to spotlight our core value: real-time, automated analysis of what's causing service latency and errors.

What's improved:

  • Streamlined layout with fewer distractions
  • Root causes now front and center
  • A new getting-started checklist to help you activate value faster

This refined navigation ensures that your attention goes straight to high-impact issues.

Bug Fixes and Minor Improvements

  • Smarter Root Cause Alerts: We now notify you only for root causes that impact multiple services beyond just SLO violations, reducing noise and helping you prioritize real incidents.
  • Refined Symptom Deactivation Logic: Error symptoms are now tied to real request activity, preventing premature deactivation or activation in idle services.
  • Per-Service Thresholds: Teams can now configure latency and error thresholds for individual services, replacing the default ML-based learned thresholds. This allows for more fine-grained alert tuning and better alignment with service-specific expectations.
  • Splunk OnCall Integration: Causely now supports Splunk OnCall notifications, expanding your ops toolchain with automated incident routing.
  • AWS Discovery & Metrics Improvements: We've added pagination for AWS ALB discovery, improved tag handling, and now set ALB latency directly from observed values for faster, more accurate symptom detection.