Services

Services are self-contained units of functionality within a system that perform specific tasks or provide specific capabilities, often accessible through defined interfaces or APIs. A service can be internal, or it can be a third-party external service that sits outside the core application and infrastructure.

Root Causes

Service

Application Load Balancer


Service

Congested

The service is experiencing congestion, resulting in high latency for clients. This suggests that the system is unable to handle the current load efficiently, causing delays in response times.
Congestion often occurs when the service receives more requests than it has capacity to handle, leading to bottlenecks in processing. This may be due to insufficient resources (for example, CPU, memory, or bandwidth), unoptimized code, or a surge in traffic, such as a sudden increase in demand or a DDoS attack.
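One way to make "high latency" concrete is to compare a tail-latency percentile against a threshold. The following is a minimal sketch and not the detection logic of any particular product; the function names, the choice of the 95th percentile, and the 500 ms threshold are all assumptions made for illustration.

```python
from statistics import quantiles


def p95_latency_ms(samples_ms):
    """Return the 95th-percentile latency of the given samples."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples_ms, n=20, method="inclusive")[-1]


def is_congested(samples_ms, threshold_ms=500.0):
    """Flag congestion when tail latency exceeds the (assumed) threshold."""
    if len(samples_ms) < 2:
        # Too few samples to estimate a percentile; report no congestion.
        return False
    return p95_latency_ms(samples_ms) > threshold_ms
```

A tail percentile is used instead of the mean because a congested service often shows a small share of very slow requests while the average still looks healthy.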


Malfunction

The service is experiencing a high rate of errors, causing disruptions for clients. This can lead to degraded performance, failed requests, or complete service unavailability, significantly affecting the user experience.
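A "high rate of errors" can be expressed as the fraction of failed responses in a window. The sketch below assumes that HTTP 5xx responses count as errors and that 5% is the alert threshold; both are illustrative assumptions, and the function names are hypothetical.

```python
def error_rate(status_codes):
    """Fraction of responses that are server errors (HTTP 5xx, by assumption)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def is_malfunctioning(status_codes, threshold=0.05):
    """Flag a malfunction when the error rate exceeds the (assumed) threshold."""
    return error_rate(status_codes) > threshold
```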


Application Load Balancer

Authentication Misconfiguration

An authentication misconfiguration on an AWS Application Load Balancer (ALB) can disrupt secure traffic routing and cascade into broader configuration problems, ultimately impacting service availability and application performance.
This root cause typically triggers the following symptoms:

  • ELBAuthError.High: A high frequency of authentication errors reported at the load balancer level.
  • Request504Error.High: Increased gateway timeout errors due to delays or failures in processing authentication requests.
  • TargetConnectionError.High: Elevated instances of backend targets failing to establish or maintain connections as expected.

Idle Timeout Misconfiguration

Misconfigured idle timeout settings can lead to unintended connection drops and delays, potentially triggering a high frequency of 504 gateway timeout errors. This misconfiguration may also contribute to broader configuration issues that disrupt seamless connectivity between clients and servers.
Idle timeout misconfigurations typically occur when the duration set for idle connections does not match the application's requirements. As a consequence, the following issues are often observed:

  • Misconfiguration Problem: Broader configuration errors affecting overall system performance.
  • Request504Error.High: An increased rate of 504 gateway timeout errors due to idle connections being terminated before requests are fully processed.
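The mismatch described above can be checked by comparing the configured idle timeout with the longest time the application is expected to take to respond. This is a rough sketch: the attribute list is assumed to have the Key/Value shape returned by the ELBv2 DescribeLoadBalancerAttributes API, where the idle timeout appears under `idle_timeout.timeout_seconds`; `slowest_request_s` is a hypothetical value you would measure yourself.

```python
IDLE_TIMEOUT_KEY = "idle_timeout.timeout_seconds"  # ALB attribute name


def idle_timeout_seconds(attributes):
    """Extract the idle timeout from a list of {"Key", "Value"} attribute dicts."""
    for attr in attributes:
        if attr["Key"] == IDLE_TIMEOUT_KEY:
            return int(attr["Value"])  # attribute values arrive as strings
    return None


def idle_timeout_too_short(attributes, slowest_request_s):
    """True if the idle timeout is shorter than the slowest expected request."""
    timeout = idle_timeout_seconds(attributes)
    if timeout is None:
        # Attribute missing from the input; nothing to compare against.
        return False
    return timeout < slowest_request_s
```

For example, with the attribute set to `"60"` and a report endpoint that can take 90 seconds to respond, `idle_timeout_too_short` returns `True`, which matches the 504 symptom listed above.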

Misconfiguration

AWS Application Load Balancer misconfiguration can cause widespread connectivity issues, leading to a high frequency of 504 gateway timeout errors and 5xx server errors. These issues indicate that the load balancer's settings are not properly optimized for handling traffic efficiently and reliably.
Misconfiguration in an AWS Application Load Balancer occurs when key settings do not align with the application's traffic patterns and operational requirements. This may involve:

  • Timeout Settings: Inaccurate idle or connection timeout values that cause active connections to be dropped prematurely, resulting in 504 gateway timeout errors.
  • Network Configuration: Overly restrictive or improperly defined network policies (such as security groups or network ACLs) that block or delay legitimate traffic, leading to increased 5xx errors.
  • Authentication Parameters: Incorrect authentication configurations that interfere with proper request processing and further contribute to error propagation.

These configuration issues can cause failures to propagate, significantly increasing error rates and degrading overall service performance.
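Because the specific ALB misconfigurations on this page share several symptoms, it can help to compare the symptoms actually observed against the set each root cause is expected to produce. The sketch below copies the symptom names from the bullet lists on this page; ranking by Jaccard similarity is an assumption chosen for illustration, not a description of how any product performs its analysis, and `rank_root_causes` is a hypothetical name.

```python
# Expected symptoms per root cause, taken from the lists on this page.
EXPECTED_SYMPTOMS = {
    "Authentication Misconfiguration": {
        "ELBAuthError.High",
        "Request504Error.High",
        "TargetConnectionError.High",
    },
    "Idle Timeout Misconfiguration": {
        "Misconfiguration Problem",
        "Request504Error.High",
    },
    "Network Policy Misconfiguration": {
        "Misconfiguration Problem",
        "Request504Error.High",
        "TargetConnectionError.High",
    },
}


def rank_root_causes(observed):
    """Rank root causes by Jaccard similarity between observed and expected symptoms."""
    observed = set(observed)
    scores = []
    for cause, expected in EXPECTED_SYMPTOMS.items():
        overlap = len(observed & expected)
        union = len(observed | expected)  # never zero: expected is non-empty
        scores.append((cause, overlap / union))
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Jaccard similarity penalizes both missing and unexpected symptoms, so a cause that explains every observed symptom without predicting extra ones ranks first.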


Network Policy Misconfiguration

AWS Application Load Balancer (ALB) network policy misconfigurations can block or restrict legitimate traffic, leading to widespread configuration issues. These misconfigurations often result in elevated 504 gateway timeout errors and target connection failures, ultimately disrupting service availability.
Network policy misconfigurations typically occur when the rules governing allowed traffic to and from the ALB are improperly defined. As a consequence, the following issues are often observed:

  • Misconfiguration Problem: Broader configuration errors affecting the overall system.
  • Request504Error.High: A high rate of gateway timeout errors due to delayed or blocked request processing.
  • TargetConnectionError.High: Increased instances where backend targets are unable to establish or maintain connections because of network restrictions.
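One common form of this problem is a target security group that does not admit traffic from the load balancer. The sketch below checks for that case only. It assumes ingress rules in the shape of the `IpPermissions` entries returned by the EC2 DescribeSecurityGroups API, looks only at rules that reference a source security group, and ignores CIDR-based rules and network ACLs. The function names and the group IDs in the usage example are hypothetical.

```python
def rule_allows(rule, port, source_group_id):
    """True if one ingress rule admits TCP traffic on `port` from the source group."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":
        port_ok = True  # "-1" means all protocols and all ports
    elif protocol == "tcp":
        port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
    else:
        port_ok = False
    sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
    return port_ok and source_group_id in sources


def target_reachable(ingress_rules, port, alb_group_id):
    """True if any ingress rule lets the ALB's security group reach the target port."""
    return any(rule_allows(rule, port, alb_group_id) for rule in ingress_rules)


# Usage with placeholder identifiers:
rules = [
    {
        "IpProtocol": "tcp",
        "FromPort": 8080,
        "ToPort": 8080,
        "UserIdGroupPairs": [{"GroupId": "sg-alb"}],
    }
]
print(target_reachable(rules, 8080, "sg-alb"))
```

A `False` result for the port the target group uses is consistent with the TargetConnectionError.High symptom listed above, although a `True` result does not rule out restrictions elsewhere, such as network ACLs.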