Infrastructure

Servers or cloud resources to execute tasks, run applications, or perform calculations.

Overview

Compute Spec
Container
Controller
Disk
Network Endpoint
Node
VirtualMachine

Compute Spec

CPU Congested

Integration sources: Infrastructure Scraper

One or more containers in a workload are experiencing CPU congestion: they demand more CPU than they have been allocated. When a container exceeds the CPU quota enforced by Kubernetes or Docker, it is throttled, which can degrade performance, lengthen response times, or crash the application.
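As a rough illustration of how throttling can be observed on a Linux host, the cgroup v2 `cpu.stat` file reports `nr_periods` (enforcement periods elapsed) and `nr_throttled` (periods in which the cgroup hit its quota). The sketch below computes the throttled fraction from that text. The function name and the sample values are hypothetical, and this is not a description of how the Infrastructure Scraper detects the condition.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the fraction of enforcement periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        # No quota configured, or no periods elapsed yet.
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Hypothetical contents of a container's cpu.stat file.
sample = """usage_usec 1200000
user_usec 900000
system_usec 300000
nr_periods 200
nr_throttled 50
throttled_usec 750000
"""
print(throttle_ratio(sample))  # 0.25
```

A ratio that stays high over time suggests the CPU limit is too low for the workload's demand.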

Crash Failure

Integration sources: Infrastructure Scraper

One or more containers of a workload have crashed with a non-zero exit code, indicating abnormal termination. This disrupts the application's functionality, leading to downtime or degraded performance depending on how the workload is designed. The non-zero exit code signifies an error during the execution of the container's process.
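When triaging a crash, it can help to decode the exit code. By common convention, a code above 128 means the process was terminated by a signal whose number is the code minus 128 (for example, 137 corresponds to SIGKILL). The helper below is a minimal sketch of that convention; the function name is hypothetical, and an application is free to use such codes for its own purposes.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a human-readable interpretation of a container exit code."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            # Convention: 128 + N means "killed by signal N".
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # Not a known signal number; fall through.
    return f"exited with error code {code}"


for code in (0, 1, 137, 143):
    print(code, describe_exit_code(code))
```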

Frequent Crash Failure

Integration sources: Infrastructure Scraper

One or more containers of a workload are frequently crashing with a non-zero exit code, indicating abnormal termination. This disrupts the application's functionality, leading to downtime or degraded performance depending on how the workload is designed.

Frequent Memory Failure

Integration sources: Infrastructure Scraper

The application frequently runs out of memory, leading to crashes, performance degradation, or instability. This affects the application's availability and can lead to downtime or poor user experience. The issue is likely due to inefficient memory usage, such as memory leaks, excessive data loading into memory, or improper garbage collection.

Memory Failure

Integration sources: Infrastructure Scraper

Containers running out of memory can lead to service crashes or degraded performance, resulting in errors for end users or failed service requests. This typically occurs when a container's allocated memory is insufficient for the workload it is handling, causing out-of-memory (OOM) errors and potential system instability.

Container

Ephemeral Storage Congested

Integration sources: Infrastructure Scraper

A container experiences ephemeral storage congestion when its ephemeral storage usage becomes critically high, causing operations that depend on temporary storage to fail. Common triggers include excessive logging, inadequate cleanup of temporary files, and unexpected bursts in data processing.

Ephemeral Storage Noisy Neighbor

Integration sources: Infrastructure Scraper

A container acting as a noisy neighbor consumes excessive ephemeral storage, resulting in abnormally high storage usage and contributing to node-level disk pressure that can trigger pod evictions. This issue arises when a container consistently uses more ephemeral storage than expected.

Memory Noisy Neighbor

Integration sources: Infrastructure Scraper

A container acting as a noisy neighbor consumes excessive memory, leading to abnormally high memory usage and contributing to node-level memory pressure that can trigger pod evictions. This issue occurs when a container consistently uses more memory than expected, which adversely impacts both the container and its hosting node.

Controller

FrequentPodEphemeralStorageEvictions

Integration sources: Infrastructure Scraper

A Kubernetes workload is experiencing frequent pod evictions due to ephemeral storage exhaustion. This disrupts application availability and performance, as pods are terminated when they exceed their allocated storage limits or when node-level storage is under pressure.

Image Pull Errors

Integration sources: Infrastructure Scraper

Kubernetes controllers may encounter image pull errors when they cannot download container images from a registry, causing Pods to fail to start or to remain in an ImagePullBackOff state. This disrupts the deployment of applications and can affect service availability.
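The state described above is visible in a Pod's status: a container stuck on an image pull is reported as waiting with reason `ErrImagePull` or `ImagePullBackOff`. The sketch below scans a Pod object (as the dictionary you would get from `kubectl get pod -o json`) for such containers. The function name and the sample Pod are hypothetical, and init containers are ignored for brevity.

```python
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failures(pod: dict) -> list[str]:
    """Return the names of containers that are waiting on a failed image pull."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    failing = []
    for cs in statuses:
        # "waiting" is absent (or None) when the container is running or terminated.
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in IMAGE_PULL_REASONS:
            failing.append(cs.get("name", "<unknown>"))
    return failing


# Hypothetical Pod status with one healthy and one failing container.
pod = {
    "status": {
        "containerStatuses": [
            {"name": "sidecar", "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}}},
            {"name": "web", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
        ]
    }
}
print(image_pull_failures(pod))  # ['web']
```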

Malfunction

Integration sources: Infrastructure Scraper

Multiple pods for a Kubernetes controller are in a "NotReady" state for an extended period, which can lead to service unavailability or degraded performance.

Disk

Congested

Integration sources: Infrastructure Scraper

The disk has reached full capacity, which prevents new data from being written and may cause applications to fail, especially those dependent on free disk space for logs, caching, or temporary files. This can also slow down or halt system operations if critical processes can no longer write to the disk.

Inode Usage Congested

Integration sources: Infrastructure Scraper

The disk is experiencing inode exhaustion, meaning the file system has run out of inodes (metadata structures for file storage), which prevents new files from being created even if there is free disk space. This often causes errors in applications attempting to create files and can disrupt services reliant on file storage.
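Because inode exhaustion is independent of free space, it has to be checked separately. On a POSIX system, `os.statvfs` exposes the total (`f_files`) and free (`f_ffree`) inode counts for a mounted filesystem. The following is a minimal sketch; the function names are hypothetical, and some filesystems report zero total inodes, which is treated here as "no usage to report".

```python
import os


def inode_usage_fraction(total: int, free: int) -> float:
    """Return the fraction of inodes in use, given total and free counts."""
    if total <= 0:
        # Some filesystems do not report inode counts.
        return 0.0
    return (total - free) / total


def inode_usage(path: str = "/") -> float:
    """Return the fraction of inodes in use on the filesystem containing path."""
    st = os.statvfs(path)
    return inode_usage_fraction(st.f_files, st.f_ffree)


print(f"inode usage on /: {inode_usage('/'):.1%}")
```

A value near 100% while the disk still has free space matches the situation described above.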

IOPs Congested

Integration sources: Infrastructure Scraper

The disk is experiencing I/O operations per second (IOPS) congestion, meaning that its total read and write IOPS capacity is fully utilized. This causes slow performance for applications that rely on disk access, leading to delayed data processing, system lags, or even timeouts.
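To make the IOPS figure concrete: on Linux, `/proc/diskstats` exposes cumulative counters of completed reads (4th field) and completed writes (8th field) per device, so IOPS over an interval is the counter difference divided by the elapsed seconds. The sketch below works on two text snapshots. The function names and sample values are hypothetical, and comparing the result with a disk's provisioned limit is left to the reader.

```python
def parse_diskstats(text: str) -> dict[str, tuple[int, int]]:
    """Map device name to (reads completed, writes completed)."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            counters[fields[2]] = (int(fields[3]), int(fields[7]))
    return counters


def iops(before: str, after: str, interval_s: float) -> dict[str, tuple[float, float]]:
    """Return (read IOPS, write IOPS) per device between two snapshots."""
    first, second = parse_diskstats(before), parse_diskstats(after)
    result = {}
    for dev, (r1, w1) in second.items():
        if dev in first:
            r0, w0 = first[dev]
            result[dev] = ((r1 - r0) / interval_s, (w1 - w0) / interval_s)
    return result


# Hypothetical snapshots taken 5 seconds apart.
t0 = "   8       0 sda 1000 0 8000 500 2000 0 16000 900 0 1000 1400"
t1 = "   8       0 sda 1500 0 12000 700 2250 0 18000 1000 0 1300 1700"
print(iops(t0, t1, 5.0))  # {'sda': (100.0, 50.0)}
```

Total IOPS for a device is the sum of the two values.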

Read IOPs Congested

Integration sources: Infrastructure Scraper

The disk is experiencing read IOPS (I/O operations per second) congestion, meaning that its read IOPS capacity is fully utilized. This causes slow performance for applications that rely on disk reads, leading to delayed data processing, system lags, or even timeouts.

Read Throughput Congested

Integration sources: Infrastructure Scraper

The disk is experiencing congestion specifically in read throughput, which slows down data retrieval from the disk and can degrade the performance of applications reliant on high-speed data access.

Write IOPs Congested

Integration sources: Infrastructure Scraper

The disk is experiencing write IOPS (I/O operations per second) congestion, meaning that its write IOPS capacity is fully utilized. This causes slow performance for applications that rely on disk writes, leading to delayed data processing, system lags, or even timeouts.

Write Throughput Congested

Integration sources: Infrastructure Scraper

The disk is experiencing write throughput congestion, leading to slower data write speeds and affecting applications that require high-speed data recording. This issue can cause delays in data availability and reduced performance in write-intensive tasks.

Network Endpoint

Invalid Server Certificate

Integration sources: Service Communication

The network endpoint is serving an invalid server certificate, resulting in a high rate of client request errors due to certificate validation failures. This issue propagates further, increasing the overall request error rate across the system.

Node

Disk Pressure

Integration sources: Infrastructure Scraper

Disk pressure on a Kubernetes node indicates that the node's disk usage is high, potentially causing the eviction of pods, reduced performance, and the inability to schedule new pods. This affects application stability and the node's overall functionality. Disk pressure can arise from insufficient disk space, often caused by log accumulation, container images, temporary files, or application data.

Memory Pressure

Integration sources: Infrastructure Scraper

Memory pressure on a Kubernetes node occurs when available memory falls below critical levels, potentially causing the eviction of pods and instability for applications running on the node. This reduces the node's capacity to run workloads, potentially leading to service disruptions if insufficient resources are available across the cluster.
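Kubernetes surfaces both node conditions described in this section in the Node object's `status.conditions` list, as entries of type `DiskPressure` and `MemoryPressure` whose `status` is the string `"True"` when active. The sketch below extracts the active pressure conditions from a Node object (as returned by `kubectl get node -o json`). The function name and the sample node are hypothetical.

```python
PRESSURE_TYPES = ("DiskPressure", "MemoryPressure")


def pressure_conditions(node: dict) -> list[str]:
    """Return the pressure condition types currently active on a node."""
    conditions = node.get("status", {}).get("conditions", [])
    return [
        c.get("type")
        for c in conditions
        if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
    ]


# Hypothetical node that is low on disk but otherwise healthy.
node = {
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    }
}
print(pressure_conditions(node))  # ['DiskPressure']
```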

VirtualMachine

Conntrack Table Congested

Integration sources: Infrastructure Scraper

The conntrack table on a VM is congested, causing new network connections to fail. This typically results in connectivity issues for applications, degraded performance, or downtime for services dependent on network communication. The conntrack table is responsible for tracking active network connections and has a fixed size, which can be exhausted under high connection load.
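On Linux hosts with the netfilter conntrack module loaded, the current number of tracked connections and the table limit are exposed as `nf_conntrack_count` and `nf_conntrack_max` under `/proc/sys/net/netfilter`. The sketch below computes table utilization from those two values. The function names are hypothetical, and the files are absent if the module is not loaded.

```python
from pathlib import Path


def conntrack_utilization(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table in use."""
    return count / maximum if maximum > 0 else 0.0


def read_conntrack(proc_dir: str = "/proc/sys/net/netfilter") -> tuple[int, int]:
    """Read (current entries, table limit) from the proc filesystem."""
    base = Path(proc_dir)
    count = int((base / "nf_conntrack_count").read_text())
    maximum = int((base / "nf_conntrack_max").read_text())
    return count, maximum


# Hypothetical values: 196608 tracked connections with a limit of 262144.
print(conntrack_utilization(196608, 262144))  # 0.75
```

On a live host, `conntrack_utilization(*read_conntrack())` gives the current figure; values approaching 1.0 mean new connections are at risk of being dropped.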

CPU Congested

Integration sources: Infrastructure Scraper

A Virtual Machine (VM) experiencing CPU congestion can lead to sluggish application performance, delayed response times, or even timeout errors for users and processes. This typically indicates that the VM's CPU is overutilized, potentially due to high resource demands from applications or insufficient CPU allocation.

Disk Read IOPs Congested

Integration sources: Infrastructure Scraper

The total disk read IOPS for a cloud VM are congested because the VM has reached its maximum allowable IOPS limit. This results in throttling, which can slow application performance and lead to delays or errors in read-heavy workloads.

Disk Read Throughput Congested

Integration sources: Infrastructure Scraper

The total disk read throughput for a cloud VM is congested because the VM has reached its maximum allowable read bandwidth. This can lead to slower data transfer rates for read-intensive applications, causing delays in processing and reduced system performance.

Disk Total IOPs Congested

Integration sources: Infrastructure Scraper

The total disk IOPS for a cloud VM are congested because the VM has reached its maximum allowable IOPS limit. This results in throttling, which can slow application performance and lead to delays or errors in read/write-heavy workloads.

Disk Total Throughput Congested

Integration sources: Infrastructure Scraper

The total disk throughput for a cloud VM is congested because the VM has reached its maximum allowable bandwidth. This can lead to slower data transfer rates for read/write-intensive applications, causing delays in processing and reduced system performance.

Disk Write IOPs Congested

Integration sources: Infrastructure Scraper

The total disk write IOPS for a cloud VM are congested because the VM has reached its maximum allowable IOPS limit. This results in throttling, which can slow application performance and lead to delays or errors in write-heavy workloads.

Disk Write Throughput Congested

Integration sources: Infrastructure Scraper

The total disk write throughput for a cloud VM is congested because the VM has reached its maximum allowable write bandwidth. This can lead to slower data transfer rates for write-intensive applications, causing delays in processing and reduced system performance.

Memory Congested

Integration sources: Infrastructure Scraper

Memory congestion in a Virtual Machine (VM) leads to slow system performance, application crashes, or even VM instability as the system struggles to allocate memory for running processes. This typically results in frequent swapping or out-of-memory (OOM) errors, impacting applications and user operations.
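One simple way to gauge how close a Linux VM is to this state is to compare `MemAvailable` with `MemTotal` in `/proc/meminfo`. The sketch below parses that file's text and returns the available fraction. The function name and sample values are hypothetical, and any threshold for calling the VM congested is an assumption left to the reader.

```python
def memory_available_fraction(meminfo_text: str) -> float:
    """Return MemAvailable / MemTotal from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # Values are reported in kB.
    total = values.get("MemTotal", 0)
    if total == 0:
        return 0.0
    return values.get("MemAvailable", 0) / total


# Hypothetical excerpt of /proc/meminfo.
sample = """MemTotal:        8000000 kB
MemFree:          200000 kB
MemAvailable:    1000000 kB
SwapTotal:       2000000 kB
SwapFree:         500000 kB
"""
print(memory_available_fraction(sample))  # 0.125
```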

SNAT Ports Congested

Integration sources: Infrastructure Scraper

The SNAT (Source Network Address Translation) ports on a virtual machine (VM) are congested, leading to outbound network connection failures or degraded performance for services relying on external APIs or resources. This issue primarily impacts VMs that need to establish multiple concurrent connections to the internet or external systems.
