Understanding Kubernetes Resource Requests and Limits
You've deployed your first Kubernetes cluster, and everything seems to work. Pods start, containers run, and your application responds to requests. But then you notice something strange: your cluster has a few nodes that are constantly at 100% CPU while others sit idle. Or maybe your application suddenly becomes unresponsive during traffic spikes. These are classic symptoms of misconfigured resource requests and limits.
Kubernetes resource requests and limits are fundamental concepts that determine how your applications behave in production. They control CPU and memory allocation, affect pod scheduling decisions, and directly impact your cluster's efficiency and cost. Get them wrong, and you'll face unpredictable performance issues or wasted resources. Get them right, and your cluster runs smoothly with optimal resource utilization.
What Are Resource Requests and Limits?
Think of Kubernetes resource requests and limits as a reservation and cap system for your containers. When you define these values, you're telling Kubernetes two things: how much of a resource your pod needs to start (request) and the maximum amount it can use (limit).
Requests are the minimum resources Kubernetes guarantees to allocate to a pod. When scheduling a pod, the scheduler looks at the node's available resources and only places the pod on a node that has enough free resources to meet all the pod's requests. This ensures your application gets the resources it needs to start and run reliably.
Limits are the maximum resources a pod can consume. If a pod tries to use more than its limit, Kubernetes takes action based on the resource type. For CPU, this means throttling the pod's usage. For memory, it triggers an OOM (Out of Memory) kill. Limits prevent a single runaway container from exhausting all resources on a node and taking down other applications.
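A minimal pod spec carrying both values might look like this (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: web
      image: nginx:1.25        # illustrative image
      resources:
        requests:
          memory: "256Mi"      # reserved for scheduling
          cpu: "250m"          # 1/4 of a CPU core
        limits:
          memory: "512Mi"      # exceeding this triggers OOMKill
          cpu: "500m"          # usage above this is throttled
```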
In this example, the container requests 256Mi of memory and 250m (250 millicores) of CPU, and can use up to 512Mi of memory and 500m of CPU. The request values are guarantees, while the limit values are caps.
CPU Resources: Millicores and Quotas
CPU in Kubernetes is a special resource because it's not a fixed amount like memory. CPU is measured in millicores, where 1000m equals one full CPU core. This decimal-based system allows for fine-grained allocation and sharing of CPU resources across many pods.
When you set a CPU request of 250m, you're requesting 1/4 of a CPU core. The scheduler ensures the node has at least 250m of available CPU before placing the pod. If the node has 2 CPU cores (2000m) and no other pods are running, it can accommodate multiple 250m requests.
CPU limits work differently than memory limits. If a pod exceeds its CPU limit, Kubernetes throttles the pod's CPU usage. The pod can still use CPU, but it won't get more than its limit. This is different from memory, where exceeding the limit results in immediate termination.
The enforcement mechanism is the Linux kernel's CFS quota: within each scheduling period (100ms by default), the container may consume CPU time up to its quota, and once that quota is exhausted it is throttled until the next period begins. CPU limits are therefore hard caps on average usage; a container cannot sustain usage above its limit, and heavy throttling often shows up as latency spikes even when the node has idle CPU. The upside of throttling over termination is that the pod keeps running while resource exhaustion is prevented.
Memory Resources: Bytes and OOMKilled
Memory is a fixed resource that's easier to understand than CPU. Memory is measured in bytes, with common units being Mi (mebibytes) and Gi (gibibytes). When you set a memory request, the scheduler reserves that amount of RAM for the pod on its node. When you set a memory limit, exceeding it results in immediate pod termination.
Memory limits are strict. If a pod tries to allocate more memory than its limit, Kubernetes triggers an OOMKilled event and restarts the pod. This is different from CPU, where the pod continues running but is throttled.
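A pod spec with only memory configured might look like this (names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
    - name: app
      image: nginx:1.25      # illustrative image
      resources:
        requests:
          memory: "512Mi"    # reserved at scheduling time
        limits:
          memory: "1Gi"      # hard ceiling; exceeding it means OOMKilled
```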
In this example, the pod can request 512Mi of memory to start, but it cannot use more than 1Gi. If it tries to allocate more, Kubernetes kills the pod and restarts it. The restart count will increase, and your application will experience downtime until it stabilizes.
Memory requests also affect pod scheduling. A pod with a high memory request will only be placed on nodes with enough free RAM. This means your cluster might have nodes that are CPU-bound but have plenty of free memory, or vice versa, depending on your pod configurations.
Why Resource Requests Matter for Scheduling
Resource requests are critical for pod scheduling decisions. The Kubernetes scheduler uses requests to determine which nodes can accommodate a pod. This is the primary reason requests exist — to ensure pods have the resources they need to run reliably.
When the scheduler evaluates a pod, it calculates the total requests of all existing pods on a node and compares them to the node's capacity. If the node has enough free resources to meet the pod's requests, the scheduler considers the node. If not, the scheduler looks for another node.
This process happens for every pod you create. With 250m CPU requests, a node with 2 CPU cores (2000m) can accommodate at most 8 such pods (2000m / 250m = 8); the 9th pod stays Pending until a node has enough free CPU. In practice the ceiling is a bit lower, because the kubelet reserves part of each node's capacity for system daemons, so scheduling works against the node's allocatable resources rather than its raw capacity.
Without proper requests, pods might be scheduled on nodes that don't have enough resources to run them. This leads to performance issues, crashes, and unpredictable behavior. The scheduler's request-based scheduling is the foundation of reliable pod placement.
Why Resource Limits Matter for Cluster Stability
Resource limits protect your cluster from resource exhaustion. Without limits, a single runaway container could consume all available CPU or memory on a node, taking down other applications and potentially the entire cluster.
Resource configuration also determines how Kubernetes handles CPU contention. CPU requests are translated into cgroup CPU weights, so when multiple pods compete for a saturated node, each receives CPU time roughly in proportion to its request: a pod with a 500m CPU request will get about twice the CPU of a pod with a 250m request, all else being equal.
Limits are especially important for multi-tenant clusters where multiple teams share the same Kubernetes cluster. Without limits, one team's runaway application could impact other teams' applications, leading to conflicts and poor performance.
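You can spot these terminations with kubectl describe; the relevant fragment of its output looks roughly like the excerpt below (pod name illustrative, output abridged):

```shell
kubectl describe pod web-app
#   ...
#   Last State:     Terminated
#     Reason:       OOMKilled
#     Exit Code:    137     # 128 + SIGKILL(9)
#   Restart Count:  3
```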
The OOMKilled event shows when a pod exceeded its memory limit. The pod is automatically restarted by Kubernetes, but this causes downtime and increases the restart count. Monitoring these events helps you identify pods with poorly configured memory limits.
Default Resource Limits and Best Practices
Many organizations set default resource limits for all pods to prevent resource exhaustion. This is a security and stability measure that ensures no single pod can consume excessive resources.
You can set defaults and caps using ResourceQuota and LimitRange objects, or via admission controllers. A ResourceQuota caps the aggregate requests and limits of a namespace, a LimitRange injects default requests and limits into containers that omit them, and admission webhooks can enforce policy cluster-wide.
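A namespace-level quota along these lines (the object name is illustrative) caps aggregate usage:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"       # total CPU requests across all pods
    requests.memory: 20Gi    # total memory requests
    limits.cpu: "20"         # total CPU limits
    limits.memory: 40Gi      # total memory limits
```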
This resource quota ensures that in the production namespace, pods cannot request more than 10 CPU cores or 20Gi of memory total, and cannot use more than 20 CPU cores or 40Gi of memory total.
Best practices for default limits include:
- Set conservative defaults that prevent runaway resource usage
- Use different limits for different namespaces or teams
- Monitor resource usage to adjust limits as needed
- Document the rationale behind default limit values
Practical Walkthrough: Setting Resource Requests and Limits
Let's walk through a practical example of setting resource requests and limits for a web application. We'll use a simple Node.js application that serves HTTP requests.
First, create a deployment manifest with resource requests and limits:
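A sketch of such a manifest follows; the deployment name, label, replica count, port, and image are illustrative placeholders for your own application:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: node:20-alpine   # illustrative; use your app's image
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
```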
This deployment sets resource requests and limits for the web application. The application requests 256Mi of memory and 250m of CPU, and can use up to 512Mi of memory and 500m of CPU.
After applying the deployment, verify that the pods are scheduled correctly:
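Assuming the manifest above was saved as deployment.yaml and the pods carry the app=web-app label, the checks could look like this:

```shell
kubectl apply -f deployment.yaml            # create the deployment
kubectl get pods -l app=web-app -o wide     # confirm pods are Running and see their nodes
kubectl describe node <node-name>           # inspect allocated requests/limits on a node
```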
You should see the pods running with their resource requests and limits. The scheduler has placed the pods on nodes with enough free resources to meet the requests.
Now, let's simulate a memory leak by modifying the application to allocate more memory over time. This will demonstrate how memory limits protect the cluster:
After deploying the leaky version, monitor the pods and their events:
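A few commands that surface the resulting terminations (pod name is a placeholder):

```shell
kubectl get pods -w                           # watch the restart count climb
kubectl describe pod <pod-name>               # Last State shows Reason: OOMKilled
kubectl get events --sort-by=.lastTimestamp   # recent cluster events, newest last
```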
You'll see an OOMKilled event when the pod exceeds its memory limit. Kubernetes will automatically restart the pod, but the restart count will increase. This demonstrates how memory limits protect the cluster from runaway memory usage.
Common Mistakes and How to Avoid Them
1. Setting Requests Equal to Limits
Many developers set requests and limits to the same value. This gives the pod the Guaranteed QoS class, which is sometimes deliberate for critical workloads, but done by habit it removes all headroom: the pod can never burst above its request, which can cause throttling and performance issues during traffic spikes, and it forces the scheduler to reserve the full limit even when typical usage is far lower.
Instead, set requests lower than limits. The request should be the minimum resources the pod needs to run reliably, while the limit should be a higher value that allows for bursts. This gives the pod flexibility while still protecting the cluster.
2. Ignoring Requests and Only Setting Limits
Some developers only set limits and leave requests unset. When a limit is specified without a request, Kubernetes defaults the request to the limit, so you end up reserving the full limit on every node. And if neither is set, the request is effectively zero: the pod runs as BestEffort, can land on nodes without the resources it actually needs, and is first in line for eviction under pressure. Either way, the scheduler is not working from the values you intended.
Always set both requests and limits. If you're unsure about the appropriate values, start with conservative requests and limits, then adjust based on monitoring and performance data.
3. Using Fixed Resource Values Without Monitoring
Setting resource values without monitoring is like setting a thermostat without a thermometer. You have no idea if your values are appropriate until you see performance issues or wasted resources.
Monitor your pod and node resource usage regularly. Use tools like Prometheus, Grafana, or the Kubernetes Metrics Server to track CPU and memory usage over time. Adjust your requests and limits based on actual usage patterns.
4. Over-Allocating Resources
Setting requests and limits higher than necessary wastes resources and increases cluster costs. If you allocate 2 CPU cores to a pod that only uses 0.5 cores, you're paying for resources you're not using.
Use monitoring data to determine appropriate resource values. Start with conservative values, then increase them only if you see performance issues or if the pod consistently uses more resources than expected.
5. Forgetting About Multi-Container Pods
If a pod has multiple containers, each container should have its own resource requests and limits. The pod's total resource usage is the sum of all container requests and limits.
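A sketch of a two-container pod (names and images are illustrative) whose per-container values add up to the pod totals:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: web
      image: nginx:1.25              # illustrative main container
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
    - name: log-shipper
      image: fluent/fluent-bit:2.2   # illustrative sidecar
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
```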
In this example, the containers' requests sum to 512Mi of memory and 500m of CPU, and their limits sum to 1Gi of memory and one full CPU. The scheduler places the pod based on these per-pod sums, so size each container individually and keep an eye on the totals.
Tools for Monitoring and Managing Resources
kubectl Top Commands
The kubectl top commands provide quick visibility into pod and node resource usage:
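The namespace below is a placeholder; the flags are standard kubectl top options:

```shell
kubectl top nodes                   # CPU/memory usage per node
kubectl top pods -n production      # usage per pod in a namespace
kubectl top pods --containers       # break usage down per container
```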
These commands require the metrics server to be installed in your cluster. The metrics server collects resource usage data from kubelets and makes it available to Kubernetes APIs.
Prometheus and Grafana
For detailed monitoring and alerting, use Prometheus and Grafana. Prometheus collects metrics from Kubernetes, and Grafana visualizes them in dashboards.
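A commonly used scrape-config pattern for annotation-based pod discovery looks roughly like this (job name illustrative):

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod            # discover scrape targets from the pod list
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Rewrite the target address to use the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```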
This Prometheus configuration scrapes metrics from pods annotated with prometheus.io/scrape: "true" and prometheus.io/port: "8080".
Kubernetes Resource Quotas
Resource quotas enforce limits at the namespace level, preventing resource exhaustion:
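A sketch of such a quota (object name illustrative), including a cap on pod count:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"       # total CPU requests in the namespace
    requests.memory: 20Gi
    limits.cpu: "20"         # total CPU limits in the namespace
    limits.memory: 40Gi
    pods: "50"               # maximum number of pods
```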
This resource quota ensures that in the production namespace, pods cannot request more than 10 CPU cores or 20Gi of memory total, and cannot use more than 20 CPU cores or 40Gi of memory total. It also limits the total number of pods to 50.
Conclusion
Resource requests and limits are fundamental to Kubernetes cluster stability and efficiency. Requests ensure pods have the resources they need to run reliably, while limits protect the cluster from resource exhaustion. Proper configuration of these values prevents performance issues, reduces wasted resources, and improves overall cluster health.
The key takeaways are:
- Always set both requests and limits for your pods
- Set requests lower than limits to allow for bursts
- Monitor resource usage regularly and adjust values as needed
- Use resource quotas to enforce limits at the namespace level
- Avoid common mistakes like setting requests equal to limits or ignoring requests entirely
Platforms like ServerlessBase simplify resource management by providing automated scaling and resource allocation based on actual usage patterns. This reduces the manual effort required to configure requests and limits while maintaining optimal cluster performance.
Start by setting conservative resource values for your pods, then use monitoring data to refine them over time. With proper configuration, your Kubernetes cluster will run smoothly with optimal resource utilization and minimal performance issues.