Vertical Pod Autoscaler (VPA): When and How to Use It

You've probably spent hours tuning your Kubernetes deployments, manually adjusting CPU and memory requests and limits. You've tried Horizontal Pod Autoscaler (HPA) to scale replicas based on load, but you're still seeing pods getting OOMKilled or underutilized resources. The missing piece might be Vertical Pod Autoscaler (VPA).

VPA automatically adjusts the resource requests of your pods' containers, and by default their limits in proportion, based on observed resource consumption. Unlike HPA, which scales the number of replicas, VPA right-sizes each individual pod. The goal is requests that track what the workload actually uses rather than a guess made at deploy time.

Understanding VPA Fundamentals

How VPA Works

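VPA consists of three components. The Recommender reads container usage from the Kubernetes metrics API (so metrics-server must be running), keeps a usage history, and computes recommended CPU and memory requests for each container. The Updater compares running pods against those recommendations and evicts pods whose requests are too far off. The Admission Controller is a mutating webhook that writes the recommended requests into pods as they are created.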

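The updatePolicy.updateMode field on each VPA object decides how far this pipeline goes. The modes fall into two groups:

Recommends mode (updateMode: "Off"): VPA only publishes recommendations without making changes. You can use these recommendations to manually update your deployments.
Update modes (Initial, Recreate, Auto): VPA applies the recommendations itself by rewriting the requests of pods at creation time and, in Recreate and Auto, by evicting running pods so they are recreated with new values. It never edits the Deployment manifest. These modes are the most powerful but require careful consideration.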

VPA vs HPA: Key Differences

Feature	Horizontal Pod Autoscaler (HPA)	Vertical Pod Autoscaler (VPA)
Primary Goal	Scale number of replicas based on load	Optimize resource allocation per pod
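Scope	Changes the replica count of a Deployment, ReplicaSet, or StatefulSet	Changes per-container requests (and limits) of the pods behind a workload
Resource Type	CPU, memory, custom, and external metrics	CPU and memory only
Impact	Adds or removes pods	Resizes existing pods, usually by restarting them
Configuration	HorizontalPodAutoscaler object (autoscaling/v2, built in)	VerticalPodAutoscaler object (autoscaling.k8s.io/v1, a CRD installed separately)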
Best For	High-traffic applications with variable load	Applications with stable resource profiles

When VPA Makes Recommendations

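The Recommender does not wait for a fixed threshold to be crossed. It continuously recomputes a recommendation for every container from a decaying histogram of usage samples, and publishes four values: a target, a lower bound, an upper bound, and an uncapped target (the target before minAllowed and maxAllowed are applied). In broad terms:

CPU: The target is a high percentile of observed CPU usage (the 90th by default in the upstream Recommender) plus a safety margin, so short spikes do not dominate the result.
Memory: The target is based on peak memory usage plus a safety margin, and it is bumped up after a container is OOMKilled.

The same history drives reductions: if usage stays low, the target and bounds drift down. In the update modes, a running pod is typically evicted when its current requests fall outside the range between the lower and upper bound, which keeps small fluctuations from causing restarts.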

VPA Modes of Operation

Recommends Mode

In recommends mode, you create a VerticalPodAutoscaler object with updateMode set to "Off". The Recommender fills in the object's status.recommendation field with the recommended resource values, and nothing modifies your pods or deployments. Keep the quotes around the value: unquoted, many YAML parsers read Off as the boolean false.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi

To use these recommendations, you can:

Manually update your deployment's resource requests and limits
Use the kubectl describe vpa command to view recommendations
Export recommendations and apply them programmatically

Recreate Mode

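In Recreate mode, VPA sets requests when pods are created and also acts on pods that are already running. It is the most disruptive mode, so introduce it gradually in production. When a recommendation changes enough:

The Updater evicts pods whose requests fall outside the recommended range, subject to PodDisruptionBudgets
The workload controller (for example, the Deployment's ReplicaSet) creates replacement pods
The Admission Controller writes the recommended requests into each replacement pod as it is admitted

The Deployment manifest itself is never modified, so kubectl get deployment keeps showing the original requests. Inspect the pods to see the values actually in effect.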
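
Since applying a recommendation in this mode means restarting pods, it can help to tell VPA how many replicas must be alive before it restarts any of them. The fragment below is a minimal sketch for the same my-app Deployment used throughout this article. It relies on the optional updatePolicy.minReplicas field, and the value 3 is an illustrative assumption rather than a recommendation.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Recreate
    # Only evict pods while at least this many replicas are alive.
    # 3 is an illustrative value; it overrides the Updater's global default.
    minReplicas: 3
```

With fewer live replicas than minReplicas, the recommendation is still published and still applied to any pod that gets created for other reasons; only the evictions are held back.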

Auto Mode

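Auto mode lets VPA choose how to apply a recommendation. It:

Sets resource requests on new pods, as Initial does
Updates running pods using whatever mechanism the VPA version prefers

In VPA releases that can only apply changes by eviction, Auto behaves the same as Recreate. It was defined separately so that it could adopt less disruptive mechanisms, such as in-place pod resizing, without a configuration change. Check the release notes for the VPA version you run.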

Initial Mode

Initial mode only updates resource requests for newly created pods. Existing pods continue to use their current resource configuration until they are recreated.

Implementing VPA in Your Cluster

Step 1: Install the VPA Controller

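VPA is not part of a default Kubernetes installation. It lives in the kubernetes/autoscaler repository and depends on metrics-server being installed in the cluster. The upstream project installs it with a script:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

If you prefer Helm, community charts are available. For example, using the Fairwinds chart (chart and repository names may change, so check its documentation):

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm repo update
helm install vpa fairwinds-stable/vpa --namespace vpa --create-namespace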

Step 2: Create a VPA for Your Deployment

Create a VerticalPodAutoscaler resource targeting your deployment:

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Recreate
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
EOF

Step 3: Monitor VPA Recommendations

Check the VPA status to see recommendations:

kubectl describe vpa my-app-vpa

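You should see output along these lines (values are illustrative):

Status:
  Conditions:
    Last Transition Time:  2026-03-10T12:34:56Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  my-app
      Lower Bound:
        Cpu:     100m
        Memory:  128Mi
      Target:
        Cpu:     250m
        Memory:  256Mi
      Uncapped Target:
        Cpu:     250m
        Memory:  256Mi
      Upper Bound:
        Cpu:     500m
        Memory:  512Mi

Target is the value VPA would set as the request. Lower Bound and Upper Bound mark the range outside which the Updater considers a pod for eviction. VPA recommends requests only; it does not recommend limits.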

Step 4: Apply Recommendations

If you're using recommends mode, apply the recommendations manually. VPA recommends requests only, so choose the limits yourself (here, twice the request):

kubectl set resources deployment my-app \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi

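Or read the target values from the VPA status and feed them into the same command:

CPU=$(kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target.cpu}')
MEM=$(kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations[0].target.memory}')
kubectl set resources deployment my-app --requests=cpu="$CPU",memory="$MEM"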

Best Practices for VPA Implementation

Set Appropriate Resource Limits

Always define resource limits to prevent runaway resource consumption:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Without limits, a misbehaving pod could consume all available resources and impact other workloads.
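
Keep in mind that by default VPA manages limits as well as requests: it preserves the request-to-limit ratio from the pod template. With the values above (CPU 100m request, 500m limit, a 1:5 ratio), a recommended CPU request of 250m would produce a limit of 1250m. If you would rather keep the limits exactly as written, the sketch below shows one way to do it, assuming a container named my-app as elsewhere in this article.

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: my-app
    # Adjust requests only and leave the limits from the pod template untouched.
    # The default is RequestsAndLimits, which scales limits proportionally.
    controlledValues: RequestsOnly
```

The trade-off is that a fixed limit can end up close to, or below, a rising request, so pair this setting with a maxAllowed bound that stays under the limit.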

Use Conservative Min/Max Bounds

Define minimum and maximum resource bounds to prevent extreme resource allocations:

resourcePolicy:
  containerPolicies:
  - containerName: my-app
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 2
      memory: 4Gi

These bounds protect your cluster from runaway resource consumption while still allowing VPA to optimize within reasonable limits.

Avoid VPA with HPA on the Same Deployment

Using VPA and HPA together on the same CPU or memory metric leads to conflicting behavior. HPA computes utilization as a percentage of the pod's request, so every time VPA changes the request, the utilization HPA sees changes too, even if the load is the same. This combination can cause:

Rapid scaling cycles as VPA changes resource requirements and HPA responds
Unpredictable pod behavior during transitions
Increased cluster resource consumption

If you need both horizontal and vertical scaling, keep the signals separate: drive HPA from a custom or external metric, or restrict VPA to the resource HPA does not use, and monitor both carefully.
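
One way to run both autoscalers on a single Deployment is to give each its own resource. The manifests below are a sketch under stated assumptions: the VPA is limited to memory through controlledResources, and the HPA scales on CPU utilization, which the VPA then never changes. The object names, replica bounds, and the 70% target are illustrative and not taken from any particular setup.

```yaml
# VPA: manage memory only, leaving CPU requests alone.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  resourcePolicy:
    containerPolicies:
    - containerName: my-app
      controlledResources: ["memory"]
---
# HPA: scale replicas on CPU utilization, which the VPA above does not touch.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2        # illustrative
  maxReplicas: 10       # illustrative
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # illustrative
```

Because the CPU request stays fixed, the utilization percentage HPA computes keeps a stable meaning while VPA tunes memory independently.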

Consider Your Application's Resource Profile

VPA works best for applications with relatively stable resource profiles. Applications with highly variable resource usage patterns may benefit more from horizontal scaling than vertical optimization.

For example, a queue worker Deployment with bursty load might benefit from HPA to add replicas during bursts and remove them when idle. A long-running web service with consistent load patterns might benefit more from VPA to optimize resource allocation.

Use VPA with Rolling Updates

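VPA applies recommendations in Recreate or Auto mode by evicting pods, not by triggering a Deployment rollout, so maxSurge and maxUnavailable do not govern those restarts. Still configure rolling updates for your own changes, such as applying a recommendation manually:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

For VPA-triggered evictions, availability depends on running more than one replica (by default the Updater does not evict pods from workloads with fewer than two) and on any PodDisruptionBudget that covers the pods.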
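
Pod restarts triggered by VPA go through the eviction API, and evictions are limited by PodDisruptionBudgets. The manifest below is a minimal sketch of such a budget. It assumes the pods of my-app carry the label app: my-app, which this article does not otherwise specify, and the object name is hypothetical.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  # Allow at most one pod of the workload to be evicted at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app           # assumed pod label; match your Deployment's template labels
```

With this in place, VPA can still replace every pod over time, but only one at a time.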

Monitor VPA Impact

Regularly monitor the impact of VPA on your cluster:

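# Check VPA status
kubectl get vpa

# Check the requests actually set on running pods (VPA does not edit the Deployment)
kubectl get pods -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

# Monitor pod resource usage
kubectl top pods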

Look for patterns like:

Consistent resource recommendations over time
Frequent pod recreations due to VPA changes
Unexpected resource consumption patterns

Common Pitfalls and Solutions

Pitfall 1: OOMKilled Pods After VPA Updates

Problem: Pods are getting OOMKilled shortly after VPA updates their resource limits.

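Solution: Check whether the recommendation is being capped. If Uncapped Target is higher than Target for memory in the VPA status, maxAllowed is holding the request (and the proportionally scaled limit) below what the container needs. Raise the VPA's maxAllowed bound, or raise the memory limit in the pod template:

resourcePolicy:
  containerPolicies:
  - containerName: my-app
    maxAllowed:
      cpu: 2
      memory: 8Gi  # Raised from 4Gi so the memory recommendation is not capped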

Pitfall 2: VPA Recommendations Are Not Applied

Problem: VPA shows recommendations but they are not being applied to your pods.

Solution: Check the VPA status and conditions:

kubectl describe vpa my-app-vpa

Common issues include:

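The update mode is Off or Initial, so running pods are never evicted
The Updater or Admission Controller is not running, or the admission webhook is not registered
The target workload does not exist in the VPA's namespace, or containerName does not match a container in the pod
The workload has fewer than two replicas, or a PodDisruptionBudget blocks eviction
The pods' current requests are already within the recommended range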

Pitfall 3: VPA Causing Frequent Pod Recreations

Problem: Pods are being recreated constantly due to VPA recommendations.

Solution: Adjust the VPA's update mode or resource policy thresholds:

updatePolicy:
  updateMode: Initial  # Only update new pods
resourcePolicy:
  containerPolicies:
  - containerName: my-app
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 2
      memory: 4Gi

Pitfall 4: VPA Not Optimizing Resources

Problem: VPA is not reducing resource requests for underutilized pods.

Solution: Check that VPA has sufficient data to make recommendations:

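# Check VPA conditions
kubectl get vpa my-app-vpa -o yaml | grep -A 10 "conditions"

VPA can publish a first recommendation within minutes, but with little history the lower and upper bounds are typically wide, so the Updater rarely evicts pods to scale them down. The bounds tighten as usage data accumulates over hours and days. Also check that minAllowed is not holding the request above what the pod uses, and that the update mode allows eviction.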

Advanced VPA Configuration

Custom Resource Policies

You can define different resource policies for different containers in the same pod:

resourcePolicy:
  containerPolicies:
  - containerName: web
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 1
      memory: 2Gi
  - containerName: worker
    minAllowed:
      cpu: 200m
      memory: 256Mi
    maxAllowed:
      cpu: 2
      memory: 4Gi
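
Container policies can also switch autoscaling off for a single container, which is useful for sidecars whose resources are managed elsewhere. The fragment below is a sketch: log-shipper is a hypothetical sidecar name, and the '*' entry acts as the default policy for every container without its own entry.

```yaml
resourcePolicy:
  containerPolicies:
  # Opt one container out of VPA entirely ("log-shipper" is a hypothetical name).
  - containerName: log-shipper
    mode: "Off"
  # Default policy for all containers that have no entry of their own.
  - containerName: '*'
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: 2
      memory: 4Gi
```

As with updateMode, keep the quotes around "Off" so that YAML parsers do not read it as a boolean.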

Targeting Specific Namespaces

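A VPA object is namespaced: it only acts on the targetRef workload in its own namespace, so creating one never affects other workloads. There is no namespace selector field. To cover the same application in several namespaces, create one VPA per namespace:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production  # Create a second copy with namespace: staging
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Recreate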

Custom Metrics: Pair VPA with HPA

VPA has no metrics field and cannot act on custom metrics. Its recommendations cover CPU and memory only. Scaling on a signal such as requests per second is HPA's job, and because that signal is independent of the pod's requests, an HPA like the one below can run alongside a VPA on the same Deployment (the replica bounds are illustrative, and the metric must be served by a custom metrics adapter):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2   # illustrative
  maxReplicas: 10  # illustrative
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Conclusion

Vertical Pod Autoscaler is a powerful tool for optimizing resource allocation in Kubernetes. By automatically adjusting resource requests and limits based on actual usage, VPA can help you:

Reduce resource waste by eliminating over-provisioned pods
Prevent OOMKilled errors by ensuring pods have adequate resources
Improve cluster efficiency by making better use of available resources
Simplify resource management by reducing manual tuning

However, VPA is not a silver bullet. It works best when combined with proper resource limits, conservative min/max bounds, and careful monitoring. Avoid pointing VPA and HPA at the same CPU or memory metric of a deployment, because each will react to the other's changes.

Start by implementing VPA in recommends mode to understand its recommendations without making automatic changes. Once you're comfortable with the recommendations, gradually transition to Recreate or Auto mode for automatic resource optimization.

Remember that VPA is most effective for applications with stable resource profiles. Highly variable workloads may benefit more from horizontal scaling strategies. The key is to understand your application's behavior and choose the right combination of autoscaling tools for your specific use case.

Next Steps

Install the VPA controller in your Kubernetes cluster
Create a VPA for a test deployment in recommends mode
Monitor recommendations using kubectl describe vpa
Apply recommendations manually to understand the impact
Gradually transition to automatic update modes in a controlled environment
Monitor cluster behavior closely during the transition
Adjust resource policies based on your observations

By following these steps, you can safely implement VPA and start optimizing your Kubernetes resource allocation today.