Kubernetes
Overview
Causely integrates natively with Kubernetes and automatically discovers and monitors your container infrastructure. When you install Causely with the standard Helm or CLI installation, Kubernetes monitoring is enabled by default and requires no configuration.
The integration provides visibility across the Kubernetes stack, from cluster-level resources down to individual containers, so you can identify infrastructure issues before they affect your applications.
How It Works
Causely's Kubernetes integration works automatically upon installation:
- Automatic Discovery: The agent discovers the Kubernetes resources in your cluster through the Kubernetes API
- Real-time Monitoring: Continuously monitors resource states, events, and relationships
- Entity Modeling: Creates a comprehensive topology graph showing relationships between clusters, nodes, pods, services, and applications
- Event Processing: Analyzes Kubernetes events to detect issues like pod evictions, scheduling problems, and resource constraints
- Root Cause Analysis: Correlates infrastructure issues with application performance problems
- Auto-remediation: Remediates issues automatically by deploying fixes or scaling resources
This approach provides complete infrastructure visibility without requiring any configuration changes or additional setup beyond the standard Causely installation.
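To make the entity modeling step concrete, here is a minimal sketch of how a topology graph could be derived from Pod objects. It is not Causely's implementation. It assumes pods are available as plain dictionaries shaped like Kubernetes API objects, and it relies only on the standard Pod fields `spec.nodeName` and `metadata.ownerReferences`. The function name `build_topology`, the identifier format, and the sample data are hypothetical.

```python
from collections import defaultdict


def build_topology(pods):
    """Return an adjacency map: parent entity -> set of pods related to it.

    Illustrative only: entity identifiers such as "Node/<name>" are a
    hypothetical convention, not a Causely format.
    """
    graph = defaultdict(set)
    for pod in pods:
        meta = pod["metadata"]
        pod_id = f"Pod/{meta['namespace']}/{meta['name']}"

        # spec.nodeName is set once the pod has been scheduled.
        node = pod.get("spec", {}).get("nodeName")
        if node:
            graph[f"Node/{node}"].add(pod_id)

        # ownerReferences link a pod to its controller (often a ReplicaSet).
        for owner in meta.get("ownerReferences", []):
            owner_id = f"{owner['kind']}/{meta['namespace']}/{owner['name']}"
            graph[owner_id].add(pod_id)
    return graph


# Hypothetical sample data.
sample_pods = [
    {
        "metadata": {
            "name": "cart-7d9f-abcde",
            "namespace": "shop",
            "ownerReferences": [{"kind": "ReplicaSet", "name": "cart-7d9f"}],
        },
        "spec": {"nodeName": "node-a"},
    },
    {
        "metadata": {
            "name": "cart-7d9f-fghij",
            "namespace": "shop",
            "ownerReferences": [{"kind": "ReplicaSet", "name": "cart-7d9f"}],
        },
        "spec": {"nodeName": "node-b"},
    },
]

topology = build_topology(sample_pods)
```

In this sketch, each node and each controller maps to the set of pods it hosts or owns, which is the kind of relationship a topology view is built from.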
Out-of-the-Box Setup
Kubernetes monitoring is enabled by default when you install Causely. Simply follow our standard installation guide:
📦 Install with Helm or 💻 Install with CLI
No additional configuration, secrets, or permissions are required beyond what's included in the standard installation.
What You Get
Infrastructure Topology
- Complete service map showing relationships between applications, services, and infrastructure
- Multi-layer visualization from business applications down to individual containers
- Dependency tracking across namespaces and resource types
Workload Monitoring
- Controller analysis for Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs
- Pod lifecycle tracking including scheduling, running, and termination states
- Container health monitoring with resource usage and state information
- Configuration change detection for container images, resources, environment variables, and volume mounts
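As an illustration of configuration change detection, the sketch below compares two revisions of a container list (as found under `spec.template.spec.containers`) and reports which of the tracked fields differ. It is a simplified assumption of how such a comparison might work, not Causely's detection logic. The function name `diff_containers` and the sample specs are hypothetical.

```python
# Container fields compared in this sketch.
WATCHED_FIELDS = ("image", "resources", "env", "volumeMounts")


def diff_containers(old, new):
    """Return (container, field, old_value, new_value) for each changed field."""
    old_by_name = {c["name"]: c for c in old}
    changes = []
    for container in new:
        previous = old_by_name.get(container["name"])
        if previous is None:
            # A container that did not exist in the old revision.
            changes.append((container["name"], "added", None, None))
            continue
        for field in WATCHED_FIELDS:
            if previous.get(field) != container.get(field):
                changes.append(
                    (container["name"], field, previous.get(field), container.get(field))
                )
    return changes


# Hypothetical revisions of the same workload.
old_spec = [
    {"name": "cart", "image": "cart:1.4", "resources": {"limits": {"memory": "256Mi"}}}
]
new_spec = [
    {
        "name": "cart",
        "image": "cart:1.5",
        "resources": {"limits": {"memory": "256Mi"}},
        "env": [{"name": "LOG_LEVEL", "value": "debug"}],
    }
]

changes = diff_containers(old_spec, new_spec)
```

Here the comparison reports an image change and a newly added environment variable, while the unchanged resource limits produce no entry.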
Resource Management
- Node health monitoring with conditions like memory pressure, disk pressure, and network availability
- Persistent volume tracking with usage and binding information
- Service discovery with automatic endpoint creation and load balancer mapping
- Ingress routing analysis for external traffic patterns
Event Analysis
- Pod eviction detection for memory pressure, disk pressure, and resource constraints
- Scheduling failure analysis for unschedulable pods
- Image pull error tracking for deployment issues
- Configuration change events for version updates and resource modifications
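The following sketch shows one way Kubernetes events could be grouped into the categories above. It is an assumption for illustration, not Causely's event processing. The `reason` values `Evicted` and `FailedScheduling` are commonly emitted by the kubelet and scheduler. The image pull check, which matches on `reason` plus message text, is a heuristic you should verify against the events in your own cluster. The category names and sample events are hypothetical.

```python
from collections import Counter

# Event reasons mapped to the categories listed above (illustrative names).
REASON_TO_CATEGORY = {
    "Evicted": "pod_eviction",
    "FailedScheduling": "scheduling_failure",
}


def categorize(event):
    """Assign a coarse category to a Kubernetes event dictionary."""
    reason = event.get("reason", "")
    if reason in REASON_TO_CATEGORY:
        return REASON_TO_CATEGORY[reason]

    # Heuristic (assumption): image pull problems are identified by message text.
    message = event.get("message", "").lower()
    if reason in ("Failed", "BackOff") and "image" in message and "pull" in message:
        return "image_pull_error"
    return "other"


def summarize(events):
    """Count events per category."""
    return Counter(categorize(e) for e in events)


# Hypothetical sample events.
sample_events = [
    {"reason": "Evicted", "message": "The node was low on resource: memory."},
    {"reason": "FailedScheduling", "message": "0/3 nodes are available: 3 Insufficient cpu."},
    {"reason": "Failed", "message": 'Failed to pull image "registry.example.com/cart:2.0"'},
    {"reason": "BackOff", "message": "Back-off restarting failed container"},
]

summary = summarize(sample_events)
```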
Application Integration
- Service-to-pod mapping for application relationship discovery
- Network endpoint creation for service communication analysis
- Load balancer and ingress integration for external access patterns
- Kafka resource discovery for message queue topology (if using Strimzi operator)
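Service-to-pod mapping follows from how Kubernetes itself selects endpoints: a Service with a `spec.selector` targets the pods in its own namespace whose labels contain every key/value pair in that selector. The sketch below applies that rule to plain dictionaries. The function name `pods_for_service` and the sample objects are hypothetical, and this is an illustration rather than Causely's code.

```python
def pods_for_service(service, pods):
    """Return names of pods selected by the service's equality-based selector."""
    selector = service["spec"].get("selector") or {}
    if not selector:
        # Services without a selector have their endpoints managed separately.
        return []
    namespace = service["metadata"]["namespace"]
    matched = []
    for pod in pods:
        meta = pod["metadata"]
        labels = meta.get("labels", {})
        same_namespace = meta["namespace"] == namespace
        labels_match = all(labels.get(k) == v for k, v in selector.items())
        if same_namespace and labels_match:
            matched.append(meta["name"])
    return matched


# Hypothetical sample objects.
cart_service = {
    "metadata": {"name": "cart", "namespace": "shop"},
    "spec": {"selector": {"app": "cart"}},
}
all_pods = [
    {"metadata": {"name": "cart-1", "namespace": "shop", "labels": {"app": "cart", "tier": "web"}}},
    {"metadata": {"name": "cart-2", "namespace": "staging", "labels": {"app": "cart"}}},
    {"metadata": {"name": "db-1", "namespace": "shop", "labels": {"app": "db"}}},
]

selected = pods_for_service(cart_service, all_pods)
```

Only `cart-1` is selected: `cart-2` has matching labels but lives in another namespace, and `db-1` has a different `app` label.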
Root Cause Detection
The Kubernetes integration enables detection of infrastructure-related root causes including:
Node-Level Issues
- Disk Pressure - Node disk usage triggering pod evictions
- Memory Pressure - Node memory exhaustion causing pod evictions
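Node pressure is reported by Kubernetes in `status.conditions` on the Node object, where the `DiskPressure` and `MemoryPressure` condition types carry a `status` of `"True"`, `"False"`, or `"Unknown"`. As a minimal sketch of reading those signals (not Causely's root cause logic), assuming the node is available as a dictionary and using a hypothetical function name and sample node:

```python
PRESSURE_TYPES = ("DiskPressure", "MemoryPressure")


def pressure_conditions(node):
    """Return the sorted pressure condition types currently reported as True."""
    conditions = node.get("status", {}).get("conditions", [])
    return sorted(
        c["type"]
        for c in conditions
        if c["type"] in PRESSURE_TYPES and c["status"] == "True"
    )


# Hypothetical sample node.
sample_node = {
    "metadata": {"name": "node-a"},
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "True"},
            {"type": "Ready", "status": "True"},
        ]
    },
}

active = pressure_conditions(sample_node)
```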
Controller and Workload Issues
- Controller Malfunction - Multiple pods in NotReady state
- Image Pull Errors - Pods failing to start due to registry issues
- Frequent Pod Ephemeral Storage Evictions - Pods evicted due to storage limits
Container Resource Issues
- CPU Congested - Container CPU throttling and performance degradation
- Memory Failure - Container out-of-memory kills
- Frequent Memory Failure - Repeated memory-related crashes
- Crash Failure - Container crashes with non-zero exit codes
- Frequent Crash Failure - Repeated container crashes
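The signals behind memory and crash failures are visible in each pod's `status.containerStatuses`: `lastState.terminated.reason` is `OOMKilled` for out-of-memory kills, `lastState.terminated.exitCode` records the exit code, and `restartCount` tracks restarts. The sketch below maps those fields onto the names listed above. It is a simplified assumption, not how Causely infers root causes, and the `FREQUENT_RESTARTS` threshold is a hypothetical value chosen for the example.

```python
# Hypothetical threshold for calling a failure "frequent".
FREQUENT_RESTARTS = 3


def classify_container(status):
    """Map a containerStatuses entry to an illustrative failure label, or None."""
    terminated = status.get("lastState", {}).get("terminated")
    if not terminated:
        return None  # The container has not terminated before.

    frequent = status.get("restartCount", 0) >= FREQUENT_RESTARTS
    if terminated.get("reason") == "OOMKilled":
        return "Frequent Memory Failure" if frequent else "Memory Failure"
    if terminated.get("exitCode", 0) != 0:
        return "Frequent Crash Failure" if frequent else "Crash Failure"
    return None


# Hypothetical sample statuses.
oom_status = {
    "name": "cart",
    "restartCount": 1,
    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
}
crash_loop_status = {
    "name": "worker",
    "restartCount": 5,
    "lastState": {"terminated": {"reason": "Error", "exitCode": 1}},
}
healthy_status = {"name": "web", "restartCount": 0, "lastState": {}}

labels = [classify_container(s) for s in (oom_status, crash_loop_status, healthy_status)]
```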
Storage and Noisy Neighbor Issues
- Ephemeral Storage Congested - Container storage usage causing failures
- Ephemeral Storage Noisy Neighbor - Container consuming excessive storage affecting node
- Memory Noisy Neighbor - Container consuming excessive memory affecting node
- Disk Congested - Persistent volumes reaching capacity limits
Service-Level Issues
- Service Congested - Kubernetes services experiencing high latency
- Service Malfunction - Kubernetes services with high error rates
Release-Related Issues
- CPU Congested Caused By Code Changes - Performance degradation after deployments
- Memory Failure Caused By Code Changes - Memory issues introduced by new versions
What Data Is Collected
The Kubernetes integration automatically collects comprehensive metadata and state information, including:
Cluster-Level Resources
- Cluster identity and configuration
- Node specifications and health conditions
- Namespace organization and resource quotas
- Custom resource definitions and operators
Workload Resources
- Pod specifications including containers, volumes, and resource requirements
- Controller configurations for Deployments, StatefulSets, DaemonSets, Jobs, and CronJobs
- Service definitions and endpoint mappings
- Ingress rules and traffic routing configuration
Runtime Information
- Container states and restart counts
- Resource utilization and capacity limits
- Event logs for troubleshooting and analysis
- Configuration changes and version history
Relationships and Dependencies
- Service-to-pod mappings for application topology
- Pod-to-node assignments for infrastructure placement
- Volume bindings and storage relationships
- Network connectivity patterns and endpoints
The Kubernetes integration respects your cluster's RBAC policies and only accesses resources that the Causely service account has permissions to read. All data collection uses read-only Kubernetes API calls.
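If you want to confirm that a role grants read-only access, you can inspect its `rules` and check that every rule uses only the `get`, `list`, and `watch` verbs. The sketch below does this for a role represented as a dictionary. The role shown is a hypothetical example, not the role shipped with Causely, and the function name `non_read_only_rules` is illustrative.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}


def non_read_only_rules(role):
    """Return the rules that grant any verb outside get/list/watch."""
    return [
        rule
        for rule in role.get("rules", [])
        if not set(rule.get("verbs", [])) <= READ_ONLY_VERBS
    ]


# Hypothetical role for illustration.
example_role = {
    "kind": "ClusterRole",
    "metadata": {"name": "example-reader"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods", "nodes"], "verbs": ["get", "list", "watch"]},
        {"apiGroups": ["apps"], "resources": ["deployments"], "verbs": ["get", "patch"]},
    ],
}

flagged = non_read_only_rules(example_role)
```

In this example the second rule is flagged because it includes `patch`. A wildcard verb (`*`) would be flagged as well, since it is not in the read-only set.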