v1.0.126
Generic Application Error Log Detection
Causely now detects generic application errors in logs and incorporates them into the causal model via a new Application Error Spike symptom and root cause. Previously, log-based root cause detection relied on predefined error patterns, so errors that did not match a known pattern were not surfaced. When Causely observes more than 2,000 generic errors within a one-hour window, it activates the application error log symptom, which is then used in continuous root cause analysis to help agents and on-call engineers identify and resolve the underlying issue faster. The activation threshold is configurable; see threshold configuration for details.
Learn more about log-based root causes
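The activation rule described above (more than 2,000 generic errors within a one-hour window) can be illustrated with a small sliding-window counter. This is only a sketch of the idea, not Causely's implementation: the class name `ErrorSpikeDetector`, its parameters, and the assumption that error timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque


class ErrorSpikeDetector:
    """Hypothetical sliding-window counter; not Causely's actual code."""

    def __init__(self, threshold: int = 2000, window_seconds: float = 3600.0):
        # Defaults mirror the release note: more than 2,000 errors in one hour.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one generic error and report whether the symptom is active.

        Assumes timestamps (in seconds) arrive in non-decreasing order.
        """
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        # "More than" the threshold, so the comparison is strict.
        return len(self._timestamps) > self.threshold
```

Lowering `threshold` in this sketch plays the role of the configurable threshold mentioned above: a quieter service might warrant a smaller value so that a spike is caught earlier.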
Token Management and Developer Role
Causely now supports rotating and revoking the tokens used with the Causely mediator, and introduces a new Developer role. Developers can add new mediators and manage tokens for those mediators; Administrators retain the ability to manage tokens across all mediators in a tenant. Read Only users cannot add mediators or manage tokens.
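The role rules above can be summarized as a small permission check. The sketch below is illustrative only and does not reflect Causely's API: the `Role` enum, the function names, and the `added_by` parameter are hypothetical. It also makes two assumptions the release note does not state outright: that "those mediators" means mediators the Developer added, and that Administrators can add mediators as well.

```python
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    DEVELOPER = "developer"
    READ_ONLY = "read_only"


def can_add_mediator(role: Role) -> bool:
    # Assumption: Administrators can add mediators too; the note only
    # states this explicitly for Developers.
    return role in (Role.ADMINISTRATOR, Role.DEVELOPER)


def can_manage_tokens(role: Role, user: str, added_by: str) -> bool:
    """Return True if `user` may rotate or revoke a mediator's tokens.

    `added_by` is the (hypothetical) identifier of whoever added the mediator.
    """
    if role is Role.ADMINISTRATOR:
        # Administrators manage tokens across all mediators in the tenant.
        return True
    if role is Role.DEVELOPER:
        # Assumption: Developers manage tokens only for mediators they added.
        return user == added_by
    # Read Only users cannot manage tokens.
    return False
```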
Minor Improvements
- AWS RDS database cluster support: Causely now identifies both the overall RDS cluster and its individual databases, improving visibility into database-layer root causes.
- Symptom activation timing: Improved the symptom activation delay logic to better distinguish bursty anomalies from sustained ones. Learn more
- Generic webhook notifications: Improved support for sending Causely notifications to generic webhooks. Learn more
- Agent-not-deployed false positives: Addressed incorrect "Causely agent is down" symptom activation for nodes where an agent was intentionally not deployed.
- Elasticsearch integration: Addressed an issue affecting the Elasticsearch integration.
- Service logs tab: Clarified that errors and warnings shown in the Service logs tab reflect the last 5 minutes of activity.
- Node malfunction detection: Improved detection of the node malfunction root cause so that nodes removed automatically by autoscaling are handled correctly.