    A comprehensive guide to understanding and implementing Pod Disruption Budgets for maintaining high availability in Kubernetes clusters

    Pod Disruption Budgets: Safe Cluster Maintenance

    You've deployed your application to Kubernetes, configured replicas, and set up horizontal pod autoscaling. Everything looks good until you need to perform maintenance on your cluster nodes. Suddenly, your pods are being evicted, and your application experiences downtime or degraded performance. This is where Pod Disruption Budgets (PDBs) come in.

    Pod Disruption Budgets are a Kubernetes feature that limits how many pods of an application can be down at the same time because of voluntary disruptions such as node drains, node upgrades, or cluster scale-downs. They prevent you from accidentally taking down too many pods at once, which could lead to service unavailability or degraded performance.

    Understanding Voluntary vs Involuntary Disruptions

    Before diving into PDBs, it's important to understand the difference between voluntary and involuntary disruptions.

    Voluntary disruptions are actions you or your cluster operators take intentionally. Examples include:

    • Node maintenance and upgrades
    • Scaling down nodes to save costs
    • Cluster expansion or contraction
    • Planned node replacement

    Involuntary disruptions are unexpected events outside your control:

    • Node hardware failures
    • Network partitions
    • Cloud provider outages
    • Resource exhaustion

    PDBs only apply to voluntary disruptions, and only to those that go through the Eviction API, as kubectl drain does. For involuntary ones, you need other mechanisms like high availability configurations and multiple availability zones.

    How Pod Disruption Budgets Work

    A Pod Disruption Budget defines the minimum number or percentage of pods that must remain available at any given time. Kubernetes uses this constraint to coordinate voluntary disruptions across the cluster.

    When you initiate a node drain (e.g., kubectl drain), each pod on the node is removed through the Eviction API, which checks every PDB whose selector matches that pod. If evicting the pod would violate a PDB, the request is refused and kubectl drain keeps retrying until enough healthy replicas are running elsewhere for the eviction to fit within the budget.
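
    The eviction check can be sketched in a few lines of Python. This is a simplified model, not the actual controller code: it assumes a budget expressed as minAvailable, and the Budget class and try_evict function are hypothetical names made up for illustration. In a real cluster, a refused eviction is answered with HTTP 429 and kubectl drain retries.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Simplified model of a PDB that uses minAvailable (hypothetical helper)."""
    min_available: int
    healthy: int  # pods currently Ready and matched by the selector

    def allowed_disruptions(self) -> int:
        # Evictions are allowed only while healthy pods exceed the minimum.
        return max(0, self.healthy - self.min_available)


def try_evict(budget: Budget) -> bool:
    """Return True if the eviction is admitted, False if it is refused."""
    if budget.allowed_disruptions() == 0:
        return False  # the real API responds with HTTP 429 here
    budget.healthy -= 1
    return True


# Three replicas, minAvailable: 2, and the node being drained hosts two of them.
budget = Budget(min_available=2, healthy=3)
first = try_evict(budget)    # admitted: 3 healthy -> 2 healthy
second = try_evict(budget)   # refused: evicting would leave only 1 healthy
budget.healthy += 1          # the replacement pod becomes Ready on another node
retry = try_evict(budget)    # admitted on retry
print(first, second, retry)  # True False True
```

    The second eviction only succeeds after the replacement pod is Ready, which is why a drain can take noticeably longer than the pod startup time suggests.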

    PDB Types

    A Pod Disruption Budget is expressed with one of two fields. A single PDB can set only one of them:

    Min Available: Ensures a minimum number (or percentage) of pods stays available.

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: my-app

    With 3 replicas, for example, this means at least 2 must stay available, so only one pod can be evicted at a time.

    Max Unavailable: Ensures no more than a certain number of pods can be unavailable.

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      maxUnavailable: 1
      selector:
        matchLabels:
          app: my-app

    This ensures that at most 1 pod can be unavailable at any time.

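    For a fixed replica count, the two fields are interchangeable. For example, with 3 replicas:

    • minAvailable: 2 is the same as maxUnavailable: 1
    • minAvailable: 1 is the same as maxUnavailable: 2

    They diverge when the replica count changes, for example under horizontal pod autoscaling: maxUnavailable: 1 still allows one eviction at a time at any size, while minAvailable: 2 allows more simultaneous evictions as replicas are added.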
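
    The effect of a changing replica count can be checked with a little arithmetic. The sketch below assumes every replica is healthy and that both fields are plain integers; the two function names are hypothetical and exist only for this illustration.

```python
def allowed_with_min_available(replicas: int, min_available: int) -> int:
    """Evictions allowed when every replica is healthy and minAvailable is set."""
    return max(0, replicas - min_available)


def allowed_with_max_unavailable(replicas: int, max_unavailable: int) -> int:
    """Evictions allowed when every replica is healthy and maxUnavailable is set."""
    return min(replicas, max_unavailable)


for replicas in (3, 5):
    print(
        replicas,
        allowed_with_min_available(replicas, min_available=2),
        allowed_with_max_unavailable(replicas, max_unavailable=1),
    )
# 3 1 1
# 5 3 1
```

    At 3 replicas both budgets allow one eviction. At 5 replicas, minAvailable: 2 would let three pods go at once, while maxUnavailable: 1 still allows only one.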

    Comparing PDB Strategies

    Different applications have different availability requirements. Here's how to choose the right PDB strategy:

    Application Type        PDB Strategy          Reasoning
    Stateful applications   minAvailable: 1       Guarantees at least one replica stays running
    Stateless web apps      maxUnavailable: 1     Lets node drains evict one pod at a time
    Critical services       minAvailable: 100%    Blocks all voluntary evictions; drains wait until the PDB is relaxed
    High-traffic APIs       maxUnavailable: 0     Also blocks all voluntary evictions; does not affect rollouts
    Batch jobs              No PDB needed         Jobs don't need continuous availability
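
    To see what each strategy in the table permits, the budget can be resolved into a number of allowed evictions. The following is a rough sketch, assuming all selected pods are healthy and that percentages are rounded up, as the Kubernetes documentation describes; resolve and allowed_disruptions are hypothetical helper names.

```python
from typing import Optional, Union

IntOrPercent = Union[int, str]


def resolve(value: IntOrPercent, expected_pods: int) -> int:
    """Turn an absolute count or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (percent * expected_pods + 99) // 100  # integer ceiling
    return int(value)


def allowed_disruptions(
    expected_pods: int,
    min_available: Optional[IntOrPercent] = None,
    max_unavailable: Optional[IntOrPercent] = None,
) -> int:
    """Evictions allowed when all expected pods are healthy."""
    if min_available is not None:
        desired_healthy = resolve(min_available, expected_pods)
    else:
        desired_healthy = expected_pods - resolve(max_unavailable, expected_pods)
    return max(0, expected_pods - desired_healthy)


print(allowed_disruptions(4, min_available=1))       # 3
print(allowed_disruptions(4, max_unavailable=1))     # 1
print(allowed_disruptions(4, min_available="100%"))  # 0
print(allowed_disruptions(4, max_unavailable=0))     # 0
print(allowed_disruptions(7, min_available="50%"))   # 3
```

    The two zero results are the important ones: a budget that allows no disruptions means a node drain cannot evict any of the selected pods until the budget is changed.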

    Implementing Pod Disruption Budgets

    Let's walk through a practical example of implementing PDBs for a web application.

    Step 1: Define Your Application Labels

    First, ensure your pods have consistent labels that you can use to select them:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web-app
          tier: frontend
      template:
        metadata:
          labels:
            app: web-app
            tier: frontend
        spec:
          containers:
          - name: web
            image: nginx:1.21
            ports:
            - containerPort: 80

    Step 2: Create the Pod Disruption Budget

    Now create a PDB that ensures at least 2 pods remain available:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-app-pdb
    spec:
      minAvailable: 2
      selector:
        matchLabels:
          app: web-app
          tier: frontend

    Apply this configuration with:

    kubectl apply -f web-app-pdb.yaml

    Step 3: Verify the PDB

    Check that the PDB is created and enforced:

    kubectl get pdb web-app-pdb

    Output:

    NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    web-app-pdb    2               N/A               1                     5m

    The ALLOWED DISRUPTIONS value shows how many pods can be evicted right now without violating the PDB. With 3 healthy replicas and minAvailable: 2, you can safely evict 1 pod at a time. MAX UNAVAILABLE shows N/A because this PDB sets minAvailable instead.

    Step 4: Test the PDB During Node Drain

    Try to drain a node:

    kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

    If more than one web-app pod runs on node-1, the drain evicts one pod, then retries the next eviction until a replacement pod is Ready on another node. This keeps at least 2 pods available throughout the process.

    PDBs with Rolling Updates

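    PDBs and rolling updates are complementary but separate mechanisms. A rolling update is paced by the Deployment's own strategy settings (maxSurge and maxUnavailable), not by the PDB: the Deployment controller does not use the Eviction API. Pods that are unavailable because of a rollout still count against the budget, though, so a node drain started in the middle of a rollout may have to wait.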

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1
      selector:
        matchLabels:
          app: web-app
          tier: frontend
      template:
        metadata:
          labels:
            app: web-app
            tier: frontend
        spec:
          containers:
          - name: web
            image: nginx:1.22

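    With maxSurge: 1 and maxUnavailable: 1 in the deployment strategy, the Deployment controller will:

    1. Start up to 1 new pod above the desired replica count
    2. Terminate old pods while keeping at least 2 pods available
    3. Wait for new pods to become ready before terminating more old pods
    4. Repeat until all pods are updated

    This provides a controlled, gradual rollout with minimal disruption. The PDB with maxUnavailable: 1 separately limits node drains to evicting one pod at a time.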
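
    The limits that the Deployment strategy places on a rollout can be worked out directly. This is a minimal sketch that assumes maxSurge and maxUnavailable are given as integers rather than percentages; rollout_bounds is a hypothetical name.

```python
from typing import Tuple


def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total


print(rollout_bounds(replicas=3, max_surge=1, max_unavailable=1))  # (2, 4)
```

    For the Deployment above, at least 2 pods stay available and at most 4 pods exist at any moment while the update is in progress.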

    Common PDB Patterns

    Pattern 1: High Availability for Critical Services

    For services that cannot tolerate any downtime:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: critical-service-pdb
    spec:
      minAvailable: 100%
      selector:
        matchLabels:
          app: critical-service

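    This blocks every voluntary eviction of the selected pods. Be aware that kubectl drain will then wait indefinitely on any node that hosts one of them, until you relax or delete the PDB, so reserve this pattern for workloads whose maintenance you coordinate by hand.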

    Pattern 2: Gradual Rollout for Stateless Apps

    For stateless applications that can tolerate losing one pod at a time:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-app-pdb
    spec:
      maxUnavailable: 1
      selector:
        matchLabels:
          app: web-app

    This allows one pod to be unavailable at a time, which lets node drains proceed gradually.

    Pattern 3: Database with Read Replicas

    For databases with read replicas, you might want to ensure the primary database is always available:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: database-pdb
    spec:
      minAvailable: 1
      selector:
        matchLabels:
          app: database
          role: primary

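    This ensures at least one primary database instance stays running during node maintenance. Note that if only one pod carries the role: primary label, the PDB allows zero disruptions, so draining its node will wait until you fail over to another instance or relax the PDB.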

    PDB Limitations and Best Practices

    Limitations

    1. Only voluntary disruptions: PDBs don't protect against involuntary failures like node crashes or network partitions.

    2. No guarantee during upgrades: PDBs don't prevent you from upgrading the Kubernetes control plane itself.

    3. No cross-cluster coordination: PDBs work within a single cluster. For multi-cluster deployments, you need additional coordination.

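    4. Not enforced during scaling, rollouts, or direct deletion: PDBs don't apply when you scale replicas down, roll out a new version, or delete pods directly, because none of these go through the Eviction API.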

    Best Practices

    1. Always define PDBs for production deployments: Even if you think your application can tolerate downtime, it's better to be safe.

    2. Use meaningful labels: Ensure your PDB selector matches your pod labels exactly. A mismatch will cause the PDB to have no effect.

    3. Test your PDBs: Before relying on PDBs in production, test them in a staging environment by simulating node drains.

    4. Monitor PDB violations: Use tools like Prometheus and Grafana to monitor PDB status, for example by alerting when a PDB reports zero allowed disruptions for an extended period.

    5. Combine with other HA measures: PDBs are just one part of a high availability strategy. Use them alongside multiple availability zones, health checks, and monitoring.

    Troubleshooting PDB Issues

    Issue 1: PDB Not Enforced

    If your PDB doesn't seem to be working, check:

    kubectl describe pdb <pdb-name>

    Look for:

    • Allowed disruptions value
    • Current, Desired, and Total pod counts (a Total of 0 means the selector matches no pods)
    • Any events related to the PDB
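
    The same status fields are available in machine-readable form, which is handy for scripted checks. The sketch below interprets a PDB object as returned by kubectl get pdb <pdb-name> -o json. The summarize_pdb function is a hypothetical helper and the sample values are invented; only the status field names (expectedPods, currentHealthy, desiredHealthy, disruptionsAllowed) come from the policy/v1 API.

```python
import json


def summarize_pdb(pdb: dict) -> str:
    """Describe whether a PDB currently permits evictions."""
    name = pdb["metadata"]["name"]
    status = pdb.get("status", {})
    expected = status.get("expectedPods", 0)
    healthy = status.get("currentHealthy", 0)
    desired = status.get("desiredHealthy", 0)
    allowed = status.get("disruptionsAllowed", 0)
    if expected == 0:
        return f"{name}: selector matches no pods, so the PDB has no effect"
    if allowed == 0:
        return f"{name}: evictions blocked ({healthy} healthy, {desired} required)"
    return f"{name}: {allowed} eviction(s) allowed ({healthy} healthy, {desired} required)"


# Sample document in the shape of kubectl's JSON output (values invented).
sample = json.loads("""
{
  "metadata": {"name": "web-app-pdb"},
  "status": {"expectedPods": 3, "currentHealthy": 2, "desiredHealthy": 2, "disruptionsAllowed": 0}
}
""")
print(summarize_pdb(sample))
# web-app-pdb: evictions blocked (2 healthy, 2 required)
```

    A result of "selector matches no pods" usually points to a label mismatch between the PDB and the pod template.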

    Issue 2: Cannot Drain Node

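    If kubectl drain keeps reporting that it cannot evict a pod because doing so would violate the pod's disruption budget, list the PDBs in every namespace:

    kubectl get pdb --all-namespaces

    Any PDB showing 0 under ALLOWED DISRUPTIONS is blocking evictions. Either wait for its pods to become healthy, add replicas, or relax the budget.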

    Issue 3: PDB Violations

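    If fewer pods are available than the budget requires:

    kubectl get pdb <pdb-name> -o yaml

    Compare status.currentHealthy with status.desiredHealthy. If currentHealthy is lower, look for selected pods that are not Ready, and check your monitoring system for PDB violation alerts.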

    Conclusion

    Pod Disruption Budgets are a critical tool for maintaining high availability in Kubernetes clusters. They provide a simple yet powerful way to ensure your applications stay available during planned maintenance and upgrades.

    The key takeaways are:

    • PDBs only apply to voluntary disruptions, not hardware failures
    • Use minAvailable for critical services that cannot tolerate downtime
    • Use maxUnavailable for stateless applications that can handle gradual rollouts
    • Always test your PDBs in a staging environment before production
    • Combine PDBs with other high availability measures for a robust strategy

    Platforms like ServerlessBase can help you manage your Kubernetes deployments and ensure your PDBs are properly configured across your applications. By automating the deployment process and providing built-in monitoring, ServerlessBase makes it easier to maintain high availability without manual intervention.

    For more information on Kubernetes deployment strategies, check out the Kubernetes documentation.
