Vertical Pod Autoscaler (VPA): When and How to Use It
You've probably spent hours tuning your Kubernetes deployments, manually adjusting CPU and memory requests and limits. You've tried Horizontal Pod Autoscaler (HPA) to scale replicas based on load, but you're still seeing pods getting OOMKilled or underutilized resources. The missing piece might be Vertical Pod Autoscaler (VPA).
VPA automatically adjusts the CPU and memory requests of your pods, and by default their limits in proportion, based on their observed resource consumption. Unlike HPA, which scales the number of replicas, VPA optimizes the resource allocation of each individual pod. The result is requests that track what your pods actually use rather than a guess made at deploy time.
Understanding VPA Fundamentals
How VPA Works
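VPA is made up of three components. The recommender collects CPU and memory usage for your running pods from the Kubernetes metrics API (usually served by the Metrics Server), watches for out-of-memory kills, and compares what it sees against the current requests. The updater decides which running pods need new values, and the admission controller writes the recommended requests into pods as they are created. When a pod's usage is consistently higher or lower than its configured requests, the recommendation moves accordingly.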
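How VPA acts on its recommendations depends on the `updateMode` field of the VerticalPodAutoscaler object. Broadly, there are two approaches:
- Recommendation only (`updateMode: "Off"`): VPA publishes recommendations in the object's status without changing anything. You can use these recommendations to manually update your deployments.
- Automatic updates (`"Initial"`, `"Recreate"`, or `"Auto"`): VPA applies its recommendations to pods as they are created. It does not edit your Deployment, so the Deployment spec keeps its original values while the pods carry the new ones. This is the most powerful approach but requires careful consideration, because applying new values to a pod that is already running means evicting and recreating it.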
VPA vs HPA: Key Differences
| Feature | Horizontal Pod Autoscaler (HPA) | Vertical Pod Autoscaler (VPA) |
|---|---|---|
| Primary Goal | Scale number of replicas based on load | Optimize resource allocation per pod |
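| Scope | Targets a scalable workload (Deployment, StatefulSet, ReplicaSet) and changes its replica count | Targets a workload through `targetRef` and changes the resources of each of its pods |
| Resource Type | CPU, memory, custom, and external metrics | CPU and memory only |
| Impact | Adds or removes pods as load changes | Resizes pods; the replica count stays the same |
| Configuration | Replica bounds (`minReplicas`, `maxReplicas`) and metric targets | Update mode and per-container bounds (`minAllowed`, `maxAllowed`) |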
| Best For | High-traffic applications with variable load | Applications with stable resource profiles |
When VPA Makes Recommendations
VPA does not use fixed trigger thresholds. The recommender keeps a usage history for each container, with older samples gradually losing weight (a 24-hour half-life by default), and derives three values from it: a lower bound, a target, and an upper bound. With the default settings:
- CPU: The target is based on about the 90th percentile of observed CPU usage, plus a 15% safety margin.
- Memory: The target is based on about the 90th percentile of memory usage peaks, plus the same margin. An observed OOM kill pushes the memory recommendation up.
Because the target follows actual usage, VPA also recommends lower requests when a pod consistently uses much less than it asks for. These defaults can be changed through the recommender's command-line flags.
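To make the percentile-plus-margin idea concrete, here is a deliberately simplified Python sketch. It assumes plain sorted samples instead of the recommender's decaying histograms and memory peaks, uses the default 90th-percentile target and 15% safety margin, and applies per-container floors of 25 millicores and 250 MiB that are only loosely modeled on the recommender's minimums. The function names are hypothetical and not part of VPA.

```python
import math

# Simplified sketch, NOT the real recommender: VPA uses decaying histograms
# and memory *peaks*, while this uses a plain percentile over raw samples.
TARGET_PERCENTILE = 0.9      # assumed default target percentile
SAFETY_MARGIN_PERCENT = 15   # assumed default safety margin
MIN_CPU_MILLICORES = 25      # assumed floor (hypothetical per-container value)
MIN_MEMORY_MIB = 250         # assumed floor (hypothetical per-container value)


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers, 0 < p <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def with_margin(value):
    """Add the safety margin and round up to a whole unit (integer math first)."""
    return math.ceil(value * (100 + SAFETY_MARGIN_PERCENT) / 100)


def recommend(cpu_millicores, memory_mib):
    """Return a target for one container from lists of usage samples."""
    cpu = with_margin(percentile(cpu_millicores, TARGET_PERCENTILE))
    memory = with_margin(percentile(memory_mib, TARGET_PERCENTILE))
    return {
        "cpu_millicores": max(MIN_CPU_MILLICORES, cpu),
        "memory_mib": max(MIN_MEMORY_MIB, memory),
    }


if __name__ == "__main__":
    # Hypothetical samples: mostly 100m CPU with one 400m spike; memory 200-300 MiB.
    print(recommend([100] * 9 + [400], [200] * 5 + [300] * 5))
```

In this example the single 400m spike does not move the CPU target (it stays at 115m), because a high percentile ignores rare outliers. That is one reason bursty workloads may still need headroom in their limits.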
VPA Modes of Operation
Off Mode (Recommendation Only)
In Off mode (`updateMode: "Off"`), you create a VerticalPodAutoscaler object and the recommender fills in its status with recommended resource values. Your actual deployments and pods are not modified.
To use these recommendations, you can:
- Manually update your deployment's resource requests and limits
- Use the `kubectl describe vpa` command to view recommendations
- Export recommendations and apply them programmatically
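Recreate Mode
In Recreate mode, VPA sets resources when pods are created and also updates pods that are already running. When a running pod's requests drift far enough from the recommendation:
- The VPA updater evicts the pod through the Kubernetes eviction API, which respects any PodDisruptionBudget
- The workload's controller (for a Deployment, its ReplicaSet) creates a replacement pod
- The VPA admission controller sets the recommended requests on the replacement as it is created
Evictions happen gradually, within the updater's rate limits and your disruption budgets, rather than all at once, so a workload with several replicas stays available. Every update still restarts a pod, so make sure your application tolerates restarts before using this mode in production.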
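In Recreate mode it is the VPA updater that picks which running pods to replace, and its central test is whether a pod's requests fall outside the recommended range between the lower and upper bound. The Python sketch below models only that test. It assumes single-container pods and quantities already converted to plain numbers (millicores and MiB); the real updater also weighs the size of the change and how long the pod has been running, and it is further limited by rate limits and PodDisruptionBudgets. The function and pod names are hypothetical.

```python
def outside_range(request, lower_bound, upper_bound):
    """True when a request falls outside the recommended [lower, upper] range."""
    return request < lower_bound or request > upper_bound


def eviction_candidates(pod_requests, bounds):
    """Return the names of pods whose requests violate any recommended bound.

    pod_requests: {pod_name: {"cpu": millicores, "memory": mib}}
    bounds:       {"cpu": (lower, upper), "memory": (lower, upper)}
    """
    candidates = []
    for pod_name, requests in sorted(pod_requests.items()):
        if any(
            outside_range(requests[resource], lower, upper)
            for resource, (lower, upper) in bounds.items()
        ):
            candidates.append(pod_name)
    return candidates


if __name__ == "__main__":
    pods = {
        "web-a": {"cpu": 100, "memory": 256},   # CPU request below the lower bound
        "web-b": {"cpu": 400, "memory": 256},   # inside both ranges
        "web-c": {"cpu": 200, "memory": 1024},  # memory request above the upper bound
    }
    bounds = {"cpu": (150, 500), "memory": (200, 800)}
    print(eviction_candidates(pods, bounds))
```

Using a range rather than the single target value is what keeps small fluctuations from restarting pods: `web-b` requests less than it might ideally get, but it stays within the recommended range and is left alone.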
Auto Mode
Auto is the default when no `updateMode` is set. It:
- Updates resource requests for new pods
- Recreates existing pods when necessary
VPA describes Auto as applying updates through its preferred update mechanism. In current releases that mechanism is eviction, so Auto behaves the same as Recreate.
Initial Mode
Initial mode only updates resource requests for newly created pods. Existing pods continue to use their current resource configuration until they are recreated.
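Mode names are easy to get wrong (recommendation-only is spelled `"Off"`), so a small validation step before generating a manifest can save a failed apply. The Python helper below is hypothetical; its set of modes covers only the four described here, and newer VPA releases may accept additional values, so extend the set to match the version installed in your cluster.

```python
# Update modes described in this article. Newer VPA releases may add more;
# extend this mapping to match the version installed in your cluster.
UPDATE_MODES = {
    "Off": "recommendations only; pods are never changed",
    "Initial": "requests set only when pods are created",
    "Recreate": "requests set at creation; drifting running pods are evicted",
    "Auto": "the default; currently behaves like Recreate",
}


def update_policy(mode):
    """Return the spec.updatePolicy block for a VerticalPodAutoscaler."""
    if mode not in UPDATE_MODES:
        valid = ", ".join(sorted(UPDATE_MODES))
        raise ValueError(f"unknown updateMode {mode!r}; expected one of: {valid}")
    return {"updateMode": mode}


if __name__ == "__main__":
    print(update_policy("Off"))
    try:
        update_policy("Recommends")  # not a valid VPA mode
    except ValueError as err:
        print(err)
```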
Implementing VPA in Your Cluster
Step 1: Install the VPA Controller
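VPA is not installed with Kubernetes by default. It is developed in the kubernetes/autoscaler repository and runs as three deployments: the recommender, the updater, and the admission controller. The project documents installing it with the `hack/vpa-up.sh` script in that repository's `vertical-pod-autoscaler` directory, which also creates the VerticalPodAutoscaler custom resource definitions. Community-maintained Helm charts exist as well; if you use one, check which VPA version it deploys. Either way, VPA needs a working metrics API in the cluster, typically provided by the Metrics Server.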
Step 2: Create a VPA for Your Deployment
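Create a VerticalPodAutoscaler resource whose `targetRef` points at your deployment. Starting with `updateMode: "Off"` lets you review recommendations before VPA changes anything.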
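A minimal VPA needs only a `targetRef` and an `updatePolicy`. The Python sketch below builds one as a dictionary and prints it as JSON, which `kubectl apply` accepts just like YAML. It assumes a Deployment named `my-app` in the `default` namespace; those names, and the VPA name `my-app-vpa`, are placeholders.

```python
import json


def build_vpa(name, namespace, deployment, update_mode="Off"):
    """Build a VerticalPodAutoscaler (autoscaling.k8s.io/v1) for a Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The workload whose pods VPA should size.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            # "Off" publishes recommendations without touching any pods.
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # Placeholder names; replace with your own.
    print(json.dumps(build_vpa("my-app-vpa", "default", "my-app"), indent=2))
```

Saved as, say, `make_vpa.py`, it can be applied with `python make_vpa.py | kubectl apply -f -`.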
Step 3: Monitor VPA Recommendations
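Check the VPA status with `kubectl describe vpa <name>`. Once the recommender has gathered enough data, the status lists four values for each container: Lower Bound, Target, Uncapped Target (the target before your `minAllowed`/`maxAllowed` bounds are applied), and Upper Bound. Target is the value VPA applies to pods.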
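The recommendations can also be read as JSON with `kubectl get vpa <name> -o json`, which is easier to script against. This Python sketch pulls the per-container Target out of that output using the `status.recommendation.containerRecommendations` path of the v1 API; the script name and the sample values in the test are invented for illustration.

```python
import json
import sys


def container_targets(vpa_json):
    """Map container name -> Target recommendation from a VPA object in JSON."""
    vpa = json.loads(vpa_json)
    recommendations = (
        vpa.get("status", {})
        .get("recommendation", {})
        .get("containerRecommendations", [])
    )
    # Each entry also carries lowerBound, upperBound and uncappedTarget.
    return {rec["containerName"]: rec["target"] for rec in recommendations}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get vpa my-app-vpa -o json > vpa.json
    #        python vpa_targets.py vpa.json
    with open(sys.argv[1]) as f:
        for container, target in container_targets(f.read()).items():
            print(f"{container}: cpu={target.get('cpu')} memory={target.get('memory')}")
```

An empty result simply means the recommender has not published anything for this VPA yet.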
Step 4: Apply Recommendations
If you're using Off mode, copy the Target values into the resource requests in your deployment manifest and redeploy. You can also script this step by generating a patch for the Deployment from the VPA status.
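One way to script Off-mode recommendations is to turn the Target values into a strategic merge patch for the Deployment. The Python sketch below assumes you already have the targets as a dictionary (for example, parsed from the VPA's JSON status) and sets only requests; containers in the patch are matched to the Deployment's containers by name. Container names and values are placeholders.

```python
import json


def requests_patch(targets):
    """Build a strategic merge patch that sets container requests.

    targets: {container_name: {"cpu": "...", "memory": "..."}}
    """
    containers = [
        {"name": name, "resources": {"requests": dict(target)}}
        for name, target in sorted(targets.items())
    ]
    return {"spec": {"template": {"spec": {"containers": containers}}}}


if __name__ == "__main__":
    # Placeholder targets copied from a VPA recommendation.
    print(json.dumps(requests_patch({"app": {"cpu": "250m", "memory": "300Mi"}})))
```

`kubectl patch deployment my-app --patch "$(python make_patch.py)"` applies it; strategic merge is the default patch type for built-in resources. Because the patch changes the pod template, it triggers a normal rolling update. If a container also sets limits, make sure the new requests do not exceed them, or the API server will reject the change.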
Best Practices for VPA Implementation
Set Appropriate Resource Limits
Always define resource limits to prevent runaway resource consumption.
Without limits, a misbehaving pod could consume all available resources and impact other workloads.
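Limits interact with VPA in a specific way. By default (`controlledValues: RequestsAndLimits`), when VPA changes a request it scales the limit so that the original limit-to-request ratio is preserved; `RequestsOnly` leaves limits untouched. The Python sketch below shows that arithmetic with plain integers (millicores or MiB). The helper name is hypothetical, and rounding up is a choice made for this sketch.

```python
import math
from fractions import Fraction


def proportional_limit(original_request, original_limit, new_request):
    """Scale a limit so limit/request stays the same after the request changes.

    Works on plain integers (e.g. millicores or MiB); this sketch rounds up.
    """
    if original_request <= 0:
        raise ValueError("original_request must be positive")
    ratio = Fraction(original_limit, original_request)  # exact, no float drift
    return math.ceil(new_request * ratio)


if __name__ == "__main__":
    # 256Mi request / 512Mi limit (ratio 2); VPA raises the request to 300Mi.
    print(proportional_limit(256, 512, 300))
```

The consequence worth remembering is that a generous ratio at deploy time is multiplied along with every increase, so choose a ratio you are comfortable seeing scaled, and bound the request with `maxAllowed`.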
Use Conservative Min/Max Bounds
Use `minAllowed` and `maxAllowed` in the VPA's `resourcePolicy` to bound what VPA can set for each container and prevent extreme resource allocations.
These bounds protect your cluster from runaway resource consumption while still allowing VPA to optimize within reasonable limits.
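The Python sketch below builds such a policy entry for the VPA's `spec.resourcePolicy.containerPolicies`, together with the clamping rule it implies: a recommendation below `minAllowed` is raised to it, and one above `maxAllowed` is lowered to it, while the unclamped value remains visible as the Uncapped Target. The container name and bound values are placeholders.

```python
def container_policy(container, min_allowed, max_allowed):
    """One entry for spec.resourcePolicy.containerPolicies."""
    return {
        "containerName": container,
        "minAllowed": dict(min_allowed),
        "maxAllowed": dict(max_allowed),
    }


def clamp(value, lower, upper):
    """Limit a recommendation to [lower, upper], as minAllowed/maxAllowed do."""
    return max(lower, min(value, upper))


if __name__ == "__main__":
    policy = {
        "containerPolicies": [
            container_policy(
                "app",  # placeholder container name
                min_allowed={"cpu": "100m", "memory": "128Mi"},
                max_allowed={"cpu": "2", "memory": "2Gi"},
            )
        ]
    }
    print(policy)
    # In millicores: a 5000m recommendation is capped at the 2000m maximum.
    print(clamp(5000, 100, 2000))
```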
Avoid VPA with HPA on the Same Deployment
Using VPA and HPA together can lead to conflicting behavior. HPA scales the number of replicas based on load, while VPA adjusts resource requests. This combination can cause:
- Rapid scaling cycles as VPA changes resource requirements and HPA responds
- Unpredictable pod behavior during transitions
- Increased cluster resource consumption
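The VPA project itself advises against combining VPA with an HPA that scales on CPU or memory for the same workload, because both react to the same signal: HPA measures utilization relative to requests, and VPA changes those requests. If you need both horizontal and vertical scaling, have the HPA scale on custom or external metrics (such as request rate or queue length) while VPA manages CPU and memory, and monitor both carefully.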
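If you manage both objects in code, a quick guard can catch the risky combination before it reaches the cluster. This Python sketch assumes both objects live in the same namespace, that the HPA uses the `autoscaling/v2` schema, and it only checks metrics of type `Resource`; it treats a VPA without an `updatePolicy` as Auto, which is VPA's default. The function name and sample objects are hypothetical.

```python
def conflicting_metrics(hpa, vpa):
    """Return CPU/memory metrics an HPA scales on while an active VPA
    (any mode other than "Off") targets the same workload."""
    hpa_target = hpa["spec"]["scaleTargetRef"]
    vpa_target = vpa["spec"]["targetRef"]
    same_workload = (
        hpa_target["kind"] == vpa_target["kind"]
        and hpa_target["name"] == vpa_target["name"]
    )
    # A VPA without an updatePolicy runs in Auto mode.
    mode = vpa["spec"].get("updatePolicy", {}).get("updateMode", "Auto")
    if not same_workload or mode == "Off":
        return []
    return sorted(
        metric["resource"]["name"]
        for metric in hpa["spec"].get("metrics", [])
        if metric.get("type") == "Resource"
        and metric["resource"]["name"] in ("cpu", "memory")
    )


if __name__ == "__main__":
    hpa = {"spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "my-app"},
        "metrics": [{"type": "Resource", "resource": {
            "name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}}}],
    }}
    vpa = {"spec": {"targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "my-app"}}}
    print(conflicting_metrics(hpa, vpa))  # VPA defaults to Auto, so cpu is reported
```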
Consider Your Application's Resource Profile
VPA works best for applications with relatively stable resource profiles. Applications with highly variable resource usage patterns may benefit more from horizontal scaling than vertical optimization.
For example, a queue-processing service whose load arrives in bursts might benefit from HPA to add replicas during a burst and remove them when the queue drains. A long-running web service with consistent load patterns might benefit more from VPA to optimize resource allocation.
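Protect Availability During VPA Evictions
In Recreate or Auto mode, VPA applies recommendations by evicting pods rather than by triggering a Deployment rollout, so the Deployment's rolling-update settings do not govern these restarts. The eviction API does honor PodDisruptionBudgets: run more than one replica and define a PodDisruptionBudget for the workload so that VPA never takes down too many pods at once. A rolling-update strategy still matters when you apply recommendations yourself by editing the Deployment.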
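VPA evictions go through the Kubernetes eviction API, which honors PodDisruptionBudgets, so a budget is what actually limits how many pods VPA can restart at once. A PodDisruptionBudget might look like the following, built in Python and emitted as JSON for `kubectl apply -f -`. It assumes the pods carry an `app: my-app` label and allows one pod to be unavailable at a time; the names are placeholders.

```python
import json


def build_pdb(name, namespace, app_label, max_unavailable=1):
    """Build a policy/v1 PodDisruptionBudget selecting pods by their app label."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Evictions are refused if they would leave more than this many
            # selected pods unavailable.
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pdb("my-app-pdb", "default", "my-app"), indent=2))
```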
Monitor VPA Impact
Regularly monitor the impact of VPA on your cluster, using `kubectl describe vpa` to track recommendations and pod events to track VPA evictions.
Look for patterns like:
- Consistent resource recommendations over time
- Frequent pod recreations due to VPA changes
- Unexpected resource consumption patterns
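Frequent recreations are easy to quantify from events. The VPA updater records an event with reason `EvictedByVPA` when it evicts a pod; the Python sketch below counts those events per pod from `kubectl get events -o json` output. The reason string comes from the updater's implementation rather than a stable API, so verify it against events in your own cluster. The script name is hypothetical.

```python
import json
import sys
from collections import Counter


def vpa_evictions(events_json):
    """Count EvictedByVPA events per pod in `kubectl get events -o json` output."""
    counts = Counter()
    for event in json.loads(events_json).get("items", []):
        involved = event.get("involvedObject", {})
        if event.get("reason") == "EvictedByVPA" and involved.get("kind") == "Pod":
            # Repeated events may be aggregated into one item with a count.
            counts[involved.get("name")] += event.get("count") or 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get events -n my-namespace -o json > events.json
    #        python vpa_evictions.py events.json
    with open(sys.argv[1]) as f:
        for pod, n in vpa_evictions(f.read()).most_common():
            print(f"{pod}: {n}")
```

Events are kept only for a limited time (one hour by default), so run a check like this regularly rather than expecting a long history.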
Common Pitfalls and Solutions
Pitfall 1: OOMKilled Pods After VPA Updates
Problem: Pods are getting OOMKilled shortly after VPA updates their resource limits.
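Solution: Check whether `maxAllowed` is capping the memory recommendation (the VPA status shows a Target lower than its Uncapped Target) and raise the bound if so. Also check the memory limit: if it sits too close to the request, raise it or widen the limit-to-request ratio.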
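To confirm which containers were actually killed, list those whose last termination reason was `OOMKilled`, which Kubernetes records in each container's `lastState.terminated.reason`. This Python sketch reads `kubectl get pods -o json` output; compare its results with the memory Target and Uncapped Target in the VPA status. The script name is hypothetical.

```python
import json
import sys


def oom_killed(pods_json):
    """Return (pod, container) pairs whose last termination was OOMKilled."""
    hits = []
    for pod in json.loads(pods_json).get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            terminated = status.get("lastState", {}).get("terminated", {})
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod["metadata"]["name"], status["name"]))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pods -n my-namespace -o json > pods.json
    #        python oom_check.py pods.json
    with open(sys.argv[1]) as f:
        for pod_name, container in oom_killed(f.read()):
            print(f"{pod_name}/{container} was OOMKilled")
```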
Pitfall 2: VPA Recommendations Are Not Applied
Problem: VPA shows recommendations but they are not being applied to your pods.
Solution: Check the VPA status and conditions with `kubectl describe vpa <name>`.
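Common issues include:
- The VPA components (recommender, updater, admission controller) are not running or are logging errors
- The target deployment does not exist
- Resource policies are not configured correctly
- The update mode is `"Off"`, which never applies recommendations
- The workload runs a single replica: by default, the updater does not evict pods from workloads with fewer than two replicas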
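Part of that checklist can be automated. The Python sketch below inspects a VPA object (as returned by `kubectl get vpa <name> -o json`) for an `"Off"` update mode, for missing status conditions, and for conditions that commonly explain missing updates. The condition types it checks (`RecommendationProvided`, `NoPodsMatched`, `LowConfidence`, `ConfigUnsupported`) are ones we understand the VPA v1 API to report; the function, its messages, and the script name are hypothetical.

```python
import json
import sys

WATCHED_CONDITIONS = ("NoPodsMatched", "LowConfidence", "ConfigUnsupported")


def vpa_problems(vpa):
    """List likely reasons a VPA's recommendations are not reaching pods."""
    problems = []
    # VPA treats a missing updatePolicy as "Auto".
    mode = vpa.get("spec", {}).get("updatePolicy", {}).get("updateMode", "Auto")
    if mode == "Off":
        problems.append("updateMode is Off")
    conditions = vpa.get("status", {}).get("conditions", [])
    if not conditions:
        problems.append("no status conditions (is the recommender running?)")
    for condition in conditions:
        kind, status = condition.get("type"), condition.get("status")
        if kind == "RecommendationProvided" and status != "True":
            problems.append("no recommendation provided yet")
        elif kind in WATCHED_CONDITIONS and status == "True":
            problems.append(kind)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get vpa my-app-vpa -o json > vpa.json
    #        python vpa_check.py vpa.json
    with open(sys.argv[1]) as f:
        for problem in vpa_problems(json.load(f)) or ["no obvious problems found"]:
            print(problem)
```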
Pitfall 3: VPA Causing Frequent Pod Recreations
Problem: Pods are being recreated constantly due to VPA recommendations.
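Solution: Switch to Initial mode so that new values are applied only when pods are recreated for other reasons, tighten the `minAllowed`/`maxAllowed` bounds, or use a PodDisruptionBudget to limit how many pods can be evicted at once.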
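If you switch to Initial or Off mode and roll recommendations out yourself, you can also avoid churn by ignoring small changes. The Python sketch below applies a new value only when it differs from the current request by more than a tolerance. The function is hypothetical and the 10% default tolerance is an illustrative assumption.

```python
def worth_applying(current, target, tolerance=0.10):
    """Return True when target differs from current by more than `tolerance`.

    current/target are plain numbers in the same unit (e.g. millicores).
    """
    if current <= 0:
        return target > 0  # nothing requested yet: any positive target is a change
    return abs(target - current) / current > tolerance


if __name__ == "__main__":
    print(worth_applying(500, 520))  # 4% change: skip
    print(worth_applying(500, 650))  # 30% change: apply
```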
Pitfall 4: VPA Not Optimizing Resources
Problem: VPA is not reducing resource requests for underutilized pods.
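Solution: Check that VPA has enough history to make confident recommendations (`kubectl describe vpa` shows its conditions and bounds).
VPA usually publishes a first recommendation within minutes, but while its history is short it widens the upper bound considerably. Because the updater treats requests inside the recommended range as acceptable, an over-provisioned pod may not be resized right away; the bounds narrow as VPA collects more history over the following days.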
Advanced VPA Configuration
Custom Resource Policies
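You can define different resource policies for different containers in the same pod. Each entry in `containerPolicies` names a container, and an entry named `"*"` applies to every container that has no entry of its own.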
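For example, a pod with an application container and a logging sidecar might let VPA size the application within bounds, exclude the sidecar with `mode: "Off"`, and give any other container a default cap through `"*"`. The Python sketch below builds that `resourcePolicy`; the container names (`app`, `log-shipper`) and values are placeholders.

```python
import json


def resource_policy():
    """spec.resourcePolicy with per-container rules (placeholder names/values)."""
    return {
        "containerPolicies": [
            {
                # Main application: VPA manages both CPU and memory within bounds.
                "containerName": "app",
                "controlledResources": ["cpu", "memory"],
                "minAllowed": {"cpu": "100m", "memory": "128Mi"},
                "maxAllowed": {"cpu": "2", "memory": "2Gi"},
            },
            {
                # Logging sidecar: VPA leaves this container alone.
                "containerName": "log-shipper",
                "mode": "Off",
            },
            {
                # Any container not listed above gets this default cap.
                "containerName": "*",
                "maxAllowed": {"cpu": "500m", "memory": "512Mi"},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(resource_policy(), indent=2))
```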
Targeting Specific Namespaces
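VPA only acts on the workload named in each VPA object's `targetRef`, and VPA objects are namespaced, so workloads without a VPA are never touched. To limit VPA to specific namespaces, create VPA objects only in those namespaces.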
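To cover every Deployment in a few chosen namespaces, you can generate one VPA per Deployment there. This Python sketch takes a hypothetical mapping of namespace to Deployment names (which you might fill from `kubectl get deployments -n <namespace> -o json`) and emits a `v1` `List` that `kubectl apply -f -` can create in one go, with every VPA starting in Off mode. The `<deployment>-vpa` naming scheme is an assumption.

```python
import json


def vpas_for_namespaces(deployments_by_namespace, namespaces, update_mode="Off"):
    """Return a v1 List with one VPA per Deployment in the chosen namespaces."""
    items = []
    for namespace in sorted(namespaces):
        for deployment in sorted(deployments_by_namespace.get(namespace, [])):
            items.append({
                "apiVersion": "autoscaling.k8s.io/v1",
                "kind": "VerticalPodAutoscaler",
                "metadata": {"name": f"{deployment}-vpa", "namespace": namespace},
                "spec": {
                    "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                                  "name": deployment},
                    "updatePolicy": {"updateMode": update_mode},
                },
            })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Hypothetical inventory; only "payments" and "search" get VPAs.
    inventory = {"payments": ["api", "worker"], "search": ["indexer"],
                 "kube-system": ["coredns"]}
    print(json.dumps(vpas_for_namespaces(inventory, ["payments", "search"]), indent=2))
```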
Using VPA with Custom Metrics
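VPA manages only CPU and memory; the VerticalPodAutoscaler spec has no field for custom metrics. What you can change is where the recommender gets its history: it can be configured to load historical usage from Prometheus instead of building it up only from live metrics. If a custom metric such as request rate should drive scaling, pair VPA with an HPA that scales on that metric.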
Conclusion
Vertical Pod Autoscaler is a powerful tool for optimizing resource allocation in Kubernetes. By automatically adjusting resource requests and limits based on actual usage, VPA can help you:
- Reduce resource waste by eliminating over-provisioned pods
- Reduce OOMKilled errors by raising memory requests for pods that need more
- Improve cluster efficiency by making better use of available resources
- Simplify resource management by reducing manual tuning
However, VPA is not a silver bullet. It works best when combined with proper resource limits, conservative min/max bounds, and careful monitoring. Avoid using VPA with HPA on the same deployment unless you understand the potential conflicts.
Start by implementing VPA in Off mode to understand its recommendations without making automatic changes. Once you're comfortable with the recommendations, gradually transition to Recreate or Auto mode for automatic resource optimization.
Remember that VPA is most effective for applications with stable resource profiles. Highly variable workloads may benefit more from horizontal scaling strategies. The key is to understand your application's behavior and choose the right combination of autoscaling tools for your specific use case.
Platforms like ServerlessBase can simplify the deployment and management of Kubernetes applications, including VPA configuration, making it easier to implement and monitor resource optimization strategies across your infrastructure.
Next Steps
- Install the VPA controller in your Kubernetes cluster
- Create a VPA for a test deployment in Off mode
- Monitor recommendations using `kubectl describe vpa`
- Apply recommendations manually to understand the impact
- Gradually transition to automatic update modes in a controlled environment
- Monitor cluster behavior closely during the transition
- Adjust resource policies based on your observations
By following these steps, you can safely implement VPA and start optimizing your Kubernetes resource allocation today.