
Automate Remediation for Resource Contention

Causely lets you remediate resource contention issues automatically, directly from the UI. This helps you restore performance faster, shorten time to resolution, and keep services within their SLOs.

When Causely identifies a deterministic Resource Contention root cause, you can trigger automated remediation with one click or apply the guided fix yourself.

Supported Root Causes

Causely supports automated remediation for the following resource-related root causes:

CPU Congested

Automatically adjust CPU limits when services are experiencing CPU saturation.

Frequent Memory Failure

Resolve persistent out-of-memory issues caused by memory leaks or inefficient usage.

Memory Failure

Increase memory allocations to resolve out-of-memory issues.

Ephemeral Storage Noisy Neighbor

Isolate and manage containers that consume excessive ephemeral storage and degrade node stability.

Memory Noisy Neighbor

Isolate and manage containers that consume excessive memory and degrade node stability.

Congested Services

Scale service resources to handle increased load.

What Causely Changes Automatically

When remediation is executed for a supported Resource Contention root cause, Causely applies a deterministic scaling action based on the type of bottleneck identified:

  1. Vertical Scaling (+50%): Increases the affected container’s CPU or memory requests and limits by 50%.
  2. Horizontal Scaling (+1 Replica): Adds one replica to the deployment to increase capacity immediately.

These adjustments are deliberately conservative. Causely applies them only when its causal reasoning model confirms two things: resource contention is the root cause rather than a correlated symptom, and scaling is the correct fix.
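The two scaling rules above come down to simple arithmetic on Kubernetes resource quantities. The following is a minimal sketch of that arithmetic under stated assumptions. It is not Causely's implementation. It handles only the quantity suffixes listed, rounds results up, writes CPU in millicores and memory in MiB, and scales only the bottlenecked resource (CPU or memory) in both requests and limits. The function names are hypothetical.

```python
"""Sketch of the two scaling rules: +50% vertical, +1 replica horizontal.

Assumptions (not Causely's implementation): only the quantity suffixes
below are handled, results are rounded up, CPU is written in millicores
and memory in MiB.
"""
import math
from decimal import Decimal

FACTOR = Decimal("1.5")  # +50%

# Memory suffixes (a subset of Kubernetes quantity units).
MEM_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(q: str) -> Decimal:
    """'500m' -> 500, '1.5' -> 1500."""
    if q.endswith("m"):
        return Decimal(q[:-1])
    return Decimal(q) * 1000


def memory_to_bytes(q: str) -> Decimal:
    # Try two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(MEM_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * MEM_SUFFIXES[suffix]
    return Decimal(q)  # no suffix: plain bytes


def scale_cpu(q: str) -> str:
    # +50%, rounded up to a whole millicore.
    return f"{math.ceil(cpu_to_millicores(q) * FACTOR)}m"


def scale_memory(q: str) -> str:
    # +50%, rounded up to a whole MiB.
    return f"{math.ceil(memory_to_bytes(q) * FACTOR / 2**20)}Mi"


def scale_container_resources(resources: dict, resource: str) -> dict:
    """Vertical scaling: +50% on both the request and the limit of one resource."""
    scale = {"cpu": scale_cpu, "memory": scale_memory}[resource]
    # Copy one level deep so the caller's dict is not mutated.
    out = {s: dict(v) if isinstance(v, dict) else v for s, v in resources.items()}
    for section in ("requests", "limits"):
        if resource in out.get(section, {}):
            out[section][resource] = scale(out[section][resource])
    return out


def scale_replicas(replicas: int) -> int:
    """Horizontal scaling: add exactly one replica."""
    return replicas + 1


if __name__ == "__main__":
    current = {
        "requests": {"cpu": "500m", "memory": "256Mi"},
        "limits": {"cpu": "1", "memory": "512Mi"},
    }
    print(scale_container_resources(current, "cpu"))
    print(scale_container_resources(current, "memory"))
    print(scale_replicas(2))
```

With these inputs, a CPU bottleneck moves the request from 500m to 750m and the limit from 1 core to 1500m. Memory stays unchanged in that case.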

Enabling Automated Remediation (Executor Required)

Automated actions require the executor to be enabled on the mediator running in the cluster where you want remediation performed.

To enable the executor, update your mediator’s causely-values.yaml:

executor:
  enabled: true

After updating the file, redeploy or upgrade your mediator so that it loads the new configuration.

See Using a custom values file for details on applying updated values.
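The executor setting is a nested key, so its indentation matters, and it must coexist with whatever else your values file already contains. The following is a minimal sketch of that idea. It layers the `executor.enabled: true` override onto an existing values mapping and renders the result as block YAML. The merge is a simplified stand-in for how Helm combines values, where maps merge recursively and scalars are replaced. It is neither Helm nor Causely code, and `exampleSetting` is a hypothetical key.

```python
"""Sketch: layering the executor override onto existing values.

Assumptions: the merge below is a simplified stand-in for how Helm
combines values (maps merge recursively, scalars are replaced); it is not
Helm or Causely code, and "exampleSetting" is a hypothetical key.
"""

EXECUTOR_OVERRIDE = {"executor": {"enabled": True}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override merged in; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def executor_enabled(values: dict) -> bool:
    """True only if executor.enabled is the boolean true."""
    executor = values.get("executor")
    return isinstance(executor, dict) and executor.get("enabled") is True


def to_yaml(values: dict, indent: int = 0) -> str:
    """Render nested dicts of scalars as block YAML, two spaces per level."""
    pad = "  " * indent
    lines = []
    for key, value in values.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        elif isinstance(value, bool):
            lines.append(f"{pad}{key}: {str(value).lower()}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    existing = {"executor": {"enabled": False, "exampleSetting": "kept"}}
    merged = deep_merge(existing, EXECUTOR_OVERRIDE)
    print(executor_enabled(merged))
    print(to_yaml(merged))
```

Rendering the override alone produces `executor:` followed by `  enabled: true` on an indented line. Other keys under `executor` survive the merge.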

Using the Remediate Now Interface

In the UI, supported root causes include a Remediate now option that provides:

  • An acknowledgment step showing the impacted deployment
  • An auditable action history tied to the entity whose values were updated

If you prefer to apply the change manually, the Remediation section for the root cause includes YAML examples you can use.
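One way to apply the same two changes by hand is a Kubernetes strategic merge patch on the Deployment. The following is a minimal sketch that builds those patch bodies. It does not reproduce the YAML Causely provides in the Remediation section. The container name `checkout` and the resource values are hypothetical.

```python
"""Sketch: Deployment patch bodies equivalent to the two scaling actions.

Assumptions: these are hand-built Kubernetes strategic merge patches, not
the YAML that Causely generates; "checkout" and the values are hypothetical.
"""
import json


def replica_patch(current_replicas: int) -> dict:
    """Horizontal scaling: set spec.replicas to one more than today."""
    return {"spec": {"replicas": current_replicas + 1}}


def resources_patch(container: str, requests: dict, limits: dict) -> dict:
    """Vertical scaling: new requests/limits for one named container.

    Strategic merge patches match entries in `containers` by `name`, so
    other containers in the pod template are left alone.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "resources": {"requests": requests, "limits": limits},
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(replica_patch(2)))
    print(json.dumps(resources_patch("checkout", {"cpu": "750m"}, {"cpu": "1500m"})))
```

You can pass the printed JSON to `kubectl patch deployment <name> -p '<json>'`, which applies a strategic merge patch by default. Patching `resources` this way changes only the keys you supply. Other requests and limits on that container keep their current values.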


Aligning Configuration with MCP Server

Remediation updates are applied to the running cluster, so they are not reflected in your source configuration. The MCP Server provides a way to commit these adjustments to your codebase for long‑term consistency.

If you want to standardize or persist updated sizing, you can manage configuration through the MCP Server.