    A comprehensive guide to understanding and implementing Vertical Pod Autoscaler in Kubernetes for optimal resource allocation

    Vertical Pod Autoscaler (VPA): When and How to Use It

    You've probably spent hours tuning your Kubernetes deployments, manually adjusting CPU and memory requests and limits. You've tried Horizontal Pod Autoscaler (HPA) to scale replicas based on load, but you're still seeing pods getting OOMKilled or underutilized resources. The missing piece might be Vertical Pod Autoscaler (VPA).

    VPA automatically adjusts the resource requests of your pods' containers based on their observed consumption and, by default, scales limits proportionally so the original limit-to-request ratio is preserved. Unlike HPA, which changes the number of replicas, VPA right-sizes each individual pod. The goal is for requests to track what the workload actually uses instead of a hand-tuned guess.

    Understanding VPA Fundamentals

    How VPA Works

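    VPA consists of three components. The Recommender reads container usage from the Kubernetes metrics API (typically served by metrics-server) and computes recommended CPU and memory requests. The Updater compares running pods against those recommendations and evicts pods whose requests are too far off. The Admission Controller is a mutating webhook that rewrites the requests of pods as they are created.

    What happens to a recommendation depends on the updateMode you choose, which falls into one of two groups:

    1. Recommendation only (updateMode: "Off"): VPA publishes recommendations in the VPA object's status without changing any pods. You can use them to update your deployments manually.

    2. Automatic updates (Initial, Recreate, Auto): VPA applies recommendations to pods at creation time and, in Recreate and Auto, evicts running pods so they restart with new values. VPA patches the pods, not the Deployment manifest, so the pod template in your Deployment keeps its original values. These modes are the most powerful but require careful consideration.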

    VPA vs HPA: Key Differences

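    | Feature       | Horizontal Pod Autoscaler (HPA)                         | Vertical Pod Autoscaler (VPA)                      |
    |---------------|---------------------------------------------------------|----------------------------------------------------|
    | Primary Goal  | Scale number of replicas based on load                  | Optimize resource allocation per pod               |
    | Scope         | Replica count of a Deployment or other scalable workload | Container requests (and limits) within each pod    |
    | Metrics Used  | CPU, memory, custom, and external metrics               | CPU and memory only                                |
    | Impact        | Adds or removes pods                                    | Resizes existing pods, usually by recreating them  |
    | Configuration | Horizontal scaling policies                             | Vertical scaling policies                          |
    | Best For      | High-traffic applications with variable load            | Applications with stable resource profiles         |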

    When VPA Makes Recommendations

    VPA does not work from fixed utilization thresholds. The Recommender keeps a decaying histogram of each container's usage, so recent samples weigh more than old ones, and derives its numbers from that distribution:

    • CPU: The target is a high percentile of observed CPU usage (the 90th by default in the upstream Recommender) plus a safety margin.
    • Memory: The target is based on observed memory peaks plus a safety margin, and is raised further after an OOM kill.

    Each recommendation contains a target along with a lower bound and an upper bound. In Recreate and Auto modes, the Updater typically evicts a pod when its current requests fall outside that range, so both under-provisioned and over-provisioned pods are eventually corrected.
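
    A recommendation as stored in the VPA object's status carries four values per container. The fragment below is a sketch with made-up numbers, assuming a container named my-app and a resource policy with maxAllowed cpu: 2. It shows how the policy caps the target while uncappedTarget keeps the raw estimate:

    ```yaml
    status:
      recommendation:
        containerRecommendations:
        - containerName: my-app
          lowerBound:        # below this, the pod is considered under-provisioned
            cpu: 1200m
            memory: 600Mi
          target:            # value applied to new pods (capped at maxAllowed cpu: 2)
            cpu: "2"
            memory: 900Mi
          uncappedTarget:    # what VPA would recommend without minAllowed/maxAllowed
            cpu: 3200m
            memory: 900Mi
          upperBound:        # above this, the pod is considered over-provisioned
            cpu: "2"
            memory: 2Gi
    ```

    A large gap between target and uncappedTarget is a hint that the bounds in the resource policy, not the workload, are deciding the result.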

    VPA Modes of Operation

    Off Mode (Recommendation Only)

    With updateMode set to "Off", VPA computes recommendations and writes them to the status of the VerticalPodAutoscaler object you create, without touching your pods or deployments. Quote the value in YAML, because an unquoted Off can be parsed as a boolean.

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      updatePolicy:
        updateMode: "Off"
      resourcePolicy:
        containerPolicies:
        - containerName: my-app
          minAllowed:
            cpu: 100m
            memory: 128Mi
          maxAllowed:
            cpu: 2
            memory: 4Gi

    To use these recommendations, you can:

    1. View them with the kubectl describe vpa command
    2. Update your deployment's resource requests and limits manually
    3. Export them and apply them programmatically

    Recreate Mode

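    Recreate mode applies recommendations both when pods are created and to pods that are already running. It is the most disruptive mode, so make sure the workload tolerates restarts before enabling it in production. When a running pod's requests drift too far from the recommendation:

    1. The Updater evicts the pod, respecting any PodDisruptionBudget
    2. The workload controller (for example the ReplicaSet) creates a replacement pod
    3. The Admission Controller sets the recommended requests on the replacement as it is admitted

    Pods are evicted gradually rather than all at once, and the Deployment manifest itself is never modified.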

    Auto Mode

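    Auto mode lets VPA choose the update mechanism. It:

    1. Sets resource requests on new pods at creation
    2. Updates existing pods using the preferred mechanism

    In the releases where Auto was introduced, the preferred mechanism is eviction, so Auto behaves the same as Recreate. Check the documentation for your VPA version, since newer releases add in-place update support and may change how Auto is treated.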

    Initial Mode

    Initial mode assigns resource requests only when a pod is created. VPA never evicts pods in this mode, so existing pods keep their current resource configuration until something else recreates them, such as a rollout or a node drain.

    Implementing VPA in Your Cluster

    Step 1: Install the VPA Controller

    VPA is not part of a default Kubernetes installation. It is maintained in the kubernetes/autoscaler repository and needs a metrics source such as metrics-server in the cluster. The upstream project ships an install script:

    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler
    ./hack/vpa-up.sh

    Or install it with Helm, for example using the community-maintained Fairwinds chart:

    helm repo add fairwinds-stable https://charts.fairwinds.com/stable
    helm repo update
    helm install vpa fairwinds-stable/vpa --namespace vpa --create-namespace

    Step 2: Create a VPA for Your Deployment

    Create a VerticalPodAutoscaler resource targeting your deployment:

    kubectl apply -f - <<EOF
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      updatePolicy:
        updateMode: Recreate
      resourcePolicy:
        containerPolicies:
        - containerName: my-app
          minAllowed:
            cpu: 100m
            memory: 128Mi
          maxAllowed:
            cpu: 2
            memory: 4Gi
    EOF

    Step 3: Monitor VPA Recommendations

    Check the VPA status to see recommendations:

    kubectl describe vpa my-app-vpa

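    You should see output shaped like this (values are illustrative):

    Status:
      Conditions:
        Last Transition Time:  2026-03-10T12:34:56Z
        Status:                True
        Type:                  RecommendationProvided
      Recommendation:
        Container Recommendations:
          Container Name:  my-app
          Lower Bound:
            Cpu:     150m
            Memory:  256Mi
          Target:
            Cpu:     250m
            Memory:  256Mi
          Uncapped Target:
            Cpu:     250m
            Memory:  256Mi
          Upper Bound:
            Cpu:     500m
            Memory:  512Mi

    Target is the value VPA applies to requests. Lower Bound and Upper Bound delimit the range VPA considers acceptable, and Uncapped Target is the recommendation before minAllowed and maxAllowed are applied.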

    Step 4: Apply Recommendations

    If you're using recommendation-only mode (updateMode: "Off"), apply the recommended target values manually. VPA recommends requests only, so choose the limits yourself, for example by keeping your existing limit-to-request ratio:

    kubectl set resources deployment my-app \
      --requests=cpu=250m,memory=256Mi \
      --limits=cpu=500m,memory=512Mi

    To script this, read the target directly from the VPA status:

    kubectl get vpa my-app-vpa \
      -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'

    Best Practices for VPA Implementation

    Set Appropriate Resource Limits

    Always define resource limits to prevent runaway resource consumption:

    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi

    Without limits, a misbehaving pod could consume all available resources and impact other workloads.
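
    Limits interact with VPA in a way that is easy to overlook: by default VPA rewrites limits along with requests, keeping the ratio between them. With the values above (a 5:1 CPU ratio), a recommended CPU request of 250m would produce a 1250m limit. If you would rather keep the limits you wrote, the container policy can restrict VPA to requests. A minimal sketch, assuming the container is named my-app as in the earlier examples:

    ```yaml
    resourcePolicy:
      containerPolicies:
      - containerName: my-app
        # RequestsAndLimits (the default) scales limits with requests;
        # RequestsOnly leaves the limits from the pod template untouched.
        controlledValues: RequestsOnly
    ```

    With RequestsOnly, keep maxAllowed at or below the fixed limits, since a request cannot exceed its limit.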

    Use Conservative Min/Max Bounds

    Define minimum and maximum resource bounds to prevent extreme resource allocations:

    resourcePolicy:
      containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

    These bounds protect your cluster from runaway resource consumption while still allowing VPA to optimize within reasonable limits.

    Avoid VPA with HPA on the Same Deployment

    Using VPA and HPA on the same deployment leads to conflicting behavior when both react to CPU or memory. HPA computes utilization as a percentage of the pod's requests, so every time VPA changes the requests, the utilization HPA sees changes too. This combination can cause:

    • Rapid scaling cycles as VPA changes resource requests and HPA responds
    • Unpredictable pod behavior during transitions
    • Increased cluster resource consumption

    If you need both horizontal and vertical scaling, drive HPA from custom or external metrics, such as requests per second, and let VPA manage CPU and memory. Monitor the pair carefully.
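
    Another way to split responsibilities is to let HPA scale on CPU while VPA manages memory only. The fragment below is a sketch of that arrangement, assuming an HPA that targets CPU utilization already exists for the same deployment; the container name my-app follows the earlier examples:

    ```yaml
    resourcePolicy:
      containerPolicies:
      - containerName: my-app
        # VPA recommends and updates memory only. CPU requests stay as written
        # in the pod template, so HPA's CPU utilization math is not disturbed.
        controlledResources: ["memory"]
    ```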

    Consider Your Application's Resource Profile

    VPA works best for applications with relatively stable resource profiles. Applications with highly variable resource usage patterns may benefit more from horizontal scaling than vertical optimization.

    For example, a worker deployment that processes batches in short bursts is usually better served by HPA, which adds replicas during the burst and removes them when idle. A long-running web service with consistent load patterns is a better fit for VPA, which then has enough history to size it accurately.

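    Use VPA with Rolling Updates and Disruption Budgets

    When you apply recommendations manually by editing the Deployment, the rollout follows the deployment's update strategy, so make sure rolling updates are configured:

    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%

    Evictions triggered by VPA in Recreate or Auto mode do not go through this strategy. They use the eviction API, so they are limited by PodDisruptionBudgets instead. Define a PodDisruptionBudget for the workload to keep enough replicas available while pods are recreated with new resource configurations.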
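
    Because VPA recreates pods by evicting them, a PodDisruptionBudget is what bounds how many replicas can be down at once. A minimal sketch, assuming the deployment's pods carry the label app: my-app (the label and the minAvailable value are assumptions, not taken from the manifests above):

    ```yaml
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      # Keep at least two pods running while others are evicted.
      minAvailable: 2
      selector:
        matchLabels:
          app: my-app  # assumed pod label
    ```

    A budget only helps if the deployment runs more replicas than minAvailable; otherwise no eviction is ever allowed and recommendations will not reach running pods.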

    Monitor VPA Impact

    Regularly monitor the impact of VPA on your cluster:

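    # Check VPA status
    kubectl get vpa

    # Check the requests VPA actually set on running pods
    # (VPA patches pods, not the Deployment; the app=my-app label is an assumption)
    kubectl get pods -l app=my-app \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources}{"\n"}{end}'

    # Monitor pod resource usage
    kubectl top pods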

    Look for patterns like:

    • Consistent resource recommendations over time
    • Frequent pod recreations due to VPA changes
    • Unexpected resource consumption patterns

    Common Pitfalls and Solutions

    Pitfall 1: OOMKilled Pods After VPA Updates

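    Problem: Pods are getting OOMKilled shortly after VPA updates their resources, typically because VPA lowered the memory request and the limit was scaled down with it.

    Solution: Raise the memory floor with minAllowed so VPA cannot size the container below what it needs, and make sure maxAllowed leaves room to grow:

    resourcePolicy:
      containerPolicies:
      - containerName: my-app
        minAllowed:
          memory: 512Mi  # Floor above the container's known peak (illustrative value)
        maxAllowed:
          cpu: 2
          memory: 4Gi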

    Pitfall 2: VPA Recommendations Are Not Applied

    Problem: VPA shows recommendations but they are not being applied to your pods.

    Solution: Check the VPA status and conditions:

    kubectl describe vpa my-app-vpa

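    Common issues include:

    • The update mode is Off or Initial, so running pods are never evicted
    • The VPA Updater or Admission Controller is not running or has errors
    • The target deployment does not exist in the VPA's namespace
    • The deployment has a single replica, which the Updater does not evict by default
    • A PodDisruptionBudget is blocking evictions
    • The pods' current requests are already within the recommended range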
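
    One cause that is easy to miss: by default the VPA Updater only evicts pods of workloads that have at least two replicas, so a single-replica deployment keeps its old requests even in Recreate mode. If a brief outage during recreation is acceptable, the threshold can be lowered per VPA object. A sketch, assuming a VPA version that supports updatePolicy.minReplicas:

    ```yaml
    updatePolicy:
      updateMode: Recreate
      # Allow the Updater to evict even when only one replica exists.
      minReplicas: 1
    ```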

    Pitfall 3: VPA Causing Frequent Pod Recreations

    Problem: Pods are being recreated constantly due to VPA recommendations.

    Solution: Adjust the VPA's update mode or resource policy thresholds:

    updatePolicy:
      updateMode: Initial  # Only update new pods
    resourcePolicy:
      containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

    Pitfall 4: VPA Not Optimizing Resources

    Problem: VPA is not reducing resource requests for underutilized pods.

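    Solution: Check that VPA has sufficient data to make recommendations:

    # Check VPA conditions
    kubectl get vpa my-app-vpa -o yaml | grep -A 10 "conditions"

    A first recommendation usually appears within a few minutes, but it is based on very little data. Recommendations become reliable only after VPA has observed the workload across its normal load cycle, which can take hours or days. Also check that minAllowed is not set higher than the usage you expect VPA to recommend, and remember that in Off and Initial modes running pods keep their existing requests.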

    Advanced VPA Configuration

    Custom Resource Policies

    You can define different resource policies for different containers in the same pod:

    resourcePolicy:
      containerPolicies:
      - containerName: web
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1
          memory: 2Gi
      - containerName: worker
        minAllowed:
          cpu: 200m
          memory: 256Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
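
    Two further options are worth knowing here: a containerName of '*' sets a default policy for every container without its own entry, and mode: "Off" excludes a container from autoscaling altogether. A sketch, assuming a hypothetical sidecar container named log-shipper that should keep its hand-set resources (all values are illustrative):

    ```yaml
    resourcePolicy:
      containerPolicies:
      # Default bounds for any container not listed explicitly.
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1
          memory: 1Gi
      # Hypothetical sidecar that VPA should leave alone.
      - containerName: log-shipper
        mode: "Off"
    ```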

    Targeting Specific Namespaces

    A VerticalPodAutoscaler is a namespaced resource and can only target workloads in its own namespace. To limit VPA to certain namespaces, create VPA objects only there, one per workload:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
      namespace: production  # Create a second object in staging if needed
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      updatePolicy:
        updateMode: Recreate

    Using VPA with Custom Metrics

    VPA works with CPU and memory only, and the VerticalPodAutoscaler spec has no metrics field. Custom metrics belong on a HorizontalPodAutoscaler, which can run next to VPA as long as it does not scale on CPU or memory. Assuming a metrics adapter exposes http_requests_per_second, the HPA looks like this (the replica bounds are illustrative):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Pods
        pods:
          metric:
            name: http_requests_per_second
          target:
            type: AverageValue
            averageValue: "100"

    Conclusion

    Vertical Pod Autoscaler is a powerful tool for optimizing resource allocation in Kubernetes. By automatically adjusting resource requests and limits based on actual usage, VPA can help you:

    • Reduce resource waste by eliminating over-provisioned pods
    • Prevent OOMKilled errors by ensuring pods have adequate resources
    • Improve cluster efficiency by making better use of available resources
    • Simplify resource management by reducing manual tuning

    However, VPA is not a silver bullet. It works best when combined with proper resource limits, conservative min/max bounds, and careful monitoring. Avoid using VPA with HPA on the same deployment unless you understand the potential conflicts.

    Start by implementing VPA in recommendation-only mode (updateMode: "Off") to understand its recommendations without making automatic changes. Once you're comfortable with the recommendations, gradually transition to Initial, Recreate, or Auto mode for automatic resource optimization.

    Remember that VPA is most effective for applications with stable resource profiles. Highly variable workloads may benefit more from horizontal scaling strategies. The key is to understand your application's behavior and choose the right combination of autoscaling tools for your specific use case.

    Platforms like ServerlessBase can simplify the deployment and management of Kubernetes applications, including VPA configuration, making it easier to implement and monitor resource optimization strategies across your infrastructure.

    Next Steps

    1. Install the VPA controller in your Kubernetes cluster
    2. Create a VPA for a test deployment in recommendation-only mode (updateMode: "Off")
    3. Monitor recommendations using kubectl describe vpa
    4. Apply recommendations manually to understand the impact
    5. Gradually transition to automatic update modes in a controlled environment
    6. Monitor cluster behavior closely during the transition
    7. Adjust resource policies based on your observations

    By following these steps, you can safely implement VPA and start optimizing your Kubernetes resource allocation today.
