Pod Disruption Budgets: Safe Cluster Maintenance

You've deployed your application to Kubernetes, configured replicas, and set up horizontal pod autoscaling. Everything looks good until you need to perform maintenance on your cluster nodes. Suddenly, your pods are being evicted, and your application experiences downtime or degraded performance. This is where Pod Disruption Budgets (PDBs) come in.

Pod Disruption Budgets are a Kubernetes feature that limits how many pods of an application can be unavailable at once during voluntary disruptions such as node drains, node upgrades, or cluster scale-downs. They prevent you from accidentally taking down too many pods at once, which could lead to service unavailability or degraded performance.

Understanding Voluntary vs Involuntary Disruptions

Before diving into PDBs, it's important to understand the difference between voluntary and involuntary disruptions.

Voluntary disruptions are actions you or your cluster operators take intentionally. Examples include:

Node maintenance and upgrades
Scaling down nodes to save costs
Cluster expansion or contraction
Planned node replacement

Involuntary disruptions are unexpected events outside your control:

Node hardware failures
Network partitions
Cloud provider outages
Resource exhaustion

PDBs only apply to voluntary disruptions. For involuntary ones, you need other mechanisms like high availability configurations and multiple availability zones.

How Pod Disruption Budgets Work

A Pod Disruption Budget defines either the minimum number (or percentage) of pods that must remain available or the maximum number that may be unavailable at any given time. Kubernetes uses this constraint to coordinate voluntary disruptions across the cluster.

When you initiate a node drain (e.g., kubectl drain), each pod on the node is removed through the Eviction API, which checks every PDB whose selector matches that pod. If an eviction would violate a PDB, the API server rejects it, and kubectl drain keeps retrying until enough healthy pods are running elsewhere for the eviction to fit within the budget.
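The check performed for each eviction request can be sketched in a few lines. This is a simplified model, not the API server's implementation: handle_eviction and its parameters are hypothetical names, and the sketch assumes an integer minAvailable. The return values mirror the status codes documented for the Eviction API (200 when the eviction is allowed, 429 when the budget currently forbids it).

```python
def handle_eviction(healthy_pods: int, min_available: int) -> int:
    """Return an HTTP-style status for one eviction request.

    Hypothetical model: 200 means the pod may be evicted, 429 means the
    caller (for example kubectl drain) should back off and retry later.
    """
    if healthy_pods - 1 >= min_available:
        return 200
    return 429


# Three healthy pods with minAvailable: 2, so one eviction fits in the budget.
print(handle_eviction(healthy_pods=3, min_available=2))
# Only two healthy pods left, so another eviction would break the budget.
print(handle_eviction(healthy_pods=2, min_available=2))
```

The second call models the moment between an eviction and its replacement becoming ready, which is when a drain has to wait.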

PDB Types

A Pod Disruption Budget sets exactly one of two fields; minAvailable and maxUnavailable cannot be combined in the same PDB:

Min Available: Ensures a minimum number of pods are always running.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

In this example, assuming the application runs 3 replicas, at least 2 of them must remain available during voluntary disruptions.

Max Unavailable: Ensures no more than a certain number of pods can be unavailable.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app

This ensures that at most 1 pod can be unavailable at any time.

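The two fields are interchangeable only while the replica count stays fixed; if the workload scales, minAvailable keeps the same floor while maxUnavailable keeps the same number of permitted disruptions. With 3 replicas: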

minAvailable: 2 is the same as maxUnavailable: 1
minAvailable: 1 is the same as maxUnavailable: 2
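The arithmetic behind this equivalence, and the point where it stops holding, can be worked through in a short sketch. The function names are hypothetical and the model is simplified: it resolves either field to a required number of healthy pods and subtracts that from the pods currently healthy. Percentages are rounded up, which is the rounding the Kubernetes documentation describes for PDBs.

```python
def to_pod_count(value, total_pods):
    """Resolve an int or a percentage string like "50%" to a pod count.

    Percentages round up, following the documented PDB behaviour.
    """
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (total_pods * percent + 99) // 100  # integer ceiling
    return int(value)


def allowed_disruptions(total_pods, healthy_pods, *, min_available=None, max_unavailable=None):
    """Hypothetical helper: how many more pods may be evicted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = to_pod_count(min_available, total_pods)
    else:
        desired_healthy = max(0, total_pods - to_pod_count(max_unavailable, total_pods))
    return max(0, healthy_pods - desired_healthy)


for replicas in (3, 6):
    by_min = allowed_disruptions(replicas, replicas, min_available=2)
    by_max = allowed_disruptions(replicas, replicas, max_unavailable=1)
    print(f"{replicas} replicas: minAvailable=2 allows {by_min}, maxUnavailable=1 allows {by_max}")
```

At 3 replicas both settings allow a single disruption. At 6 replicas minAvailable: 2 allows four pods to be evicted at once while maxUnavailable: 1 still allows only one, which is why maxUnavailable is often the safer choice for workloads that autoscale.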

Comparing PDB Strategies

Different applications have different availability requirements. Here's how to choose the right PDB strategy:

Application Type	PDB Strategy	Reasoning
Stateful applications	`minAvailable: 1`	Guarantees at least one replica stays running
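Stateless web apps	`maxUnavailable: 1`	Drains remove one pod at a time, at any replica count
Critical services	`minAvailable: 100%`	No voluntary eviction allowed; node drains stall until the PDB is relaxed
High-traffic APIs	`maxUnavailable: 10%`	Keeps about 90% of capacity serving during drains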
Batch jobs	No PDB needed	Jobs don't need continuous availability

Implementing Pod Disruption Budgets

Let's walk through a practical example of implementing PDBs for a web application.

Step 1: Define Your Application Labels

First, ensure your pods have consistent labels that you can use to select them:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
      tier: frontend
  template:
    metadata:
      labels:
        app: web-app
        tier: frontend
    spec:
      containers:
      - name: web
        image: nginx:1.21
        ports:
        - containerPort: 80

Step 2: Create the Pod Disruption Budget

Now create a PDB that ensures at least 2 pods remain available:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
      tier: frontend

Apply this configuration with:

kubectl apply -f web-app-pdb.yaml

Step 3: Verify the PDB

Check that the PDB is created and enforced:

kubectl get pdb web-app-pdb

Output:

NAME           MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
web-app-pdb    2               N/A               1                     5m

The ALLOWED DISRUPTIONS value shows how many pods can be evicted without violating the PDB. With 3 replicas and minAvailable: 2, you can safely evict 1 pod at a time.

Step 4: Test the PDB During Node Drain

Try to drain a node:

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

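kubectl drain cordons the node and then evicts its pods. When an eviction would drop the application below minAvailable, the request is refused and the drain retries until a replacement pod is ready on another node. The drain may take longer, but the application keeps at least 2 pods available throughout the process.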
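To make the sequence concrete, here is a small simulation of a drain under an integer minAvailable. It is a sketch under stated assumptions, not how kubectl is implemented: drain_node is a hypothetical function, and it assumes every evicted pod is rescheduled on another node and becomes healthy before the next retry.

```python
def drain_node(pods_on_node, healthy_elsewhere, min_available):
    """Simulate evicting the pods of one application from a single node.

    Returns the list of steps taken: "evicted", "waited" (eviction refused,
    replacements became healthy) or "blocked" (no eviction can ever succeed).
    """
    steps = []
    pending_replacements = 0
    while pods_on_node > 0:
        healthy_total = pods_on_node + healthy_elsewhere
        if healthy_total - 1 >= min_available:
            # The budget has room: evict one pod and schedule a replacement.
            pods_on_node -= 1
            pending_replacements += 1
            steps.append("evicted")
        elif pending_replacements > 0:
            # Eviction refused: wait for replacements to become healthy.
            healthy_elsewhere += pending_replacements
            pending_replacements = 0
            steps.append("waited")
        else:
            # Nothing is on its way that could free up budget.
            steps.append("blocked")
            break
    return steps


# Two of three replicas sit on the node being drained, minAvailable: 2.
print(drain_node(pods_on_node=2, healthy_elsewhere=1, min_available=2))
# A single replica with minAvailable: 1 can never be evicted.
print(drain_node(pods_on_node=1, healthy_elsewhere=0, min_available=1))
```

The second case is worth noticing: a PDB that demands every existing pod leaves the drain with nothing to wait for, so it stalls until someone intervenes.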

PDBs with Rolling Updates

PDBs and rolling updates are separate mechanisms. The Deployment controller replaces pods directly rather than through the Eviction API, so a rollout is paced by the Deployment's own strategy settings, not by the PDB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.22

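With maxSurge: 1 and maxUnavailable: 1 in the deployment strategy, the Deployment controller will:

Start up to 1 extra pod running the new image
Terminate old pods while keeping at least 2 of the 3 replicas available
Wait for new pods to become ready before continuing
Repeat until all pods are updated

This provides a controlled, gradual rollout with minimal disruption. The PDB does not throttle these steps, but pods that are not ready during the rollout still count against it, so a node drain started mid-rollout may have to wait until the rollout finishes.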
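The limits a rollout stays within follow directly from the strategy fields. The sketch below computes them; rollout_bounds is a hypothetical helper, and the percentage handling follows the Deployment documentation, where maxUnavailable rounds down and maxSurge rounds up.

```python
def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return (minimum available pods, maximum total pods) during a rollout."""

    def resolve(value, round_up):
        # Accept an int or a percentage string such as "25%".
        if isinstance(value, str) and value.endswith("%"):
            scaled = replicas * int(value[:-1])
            return (scaled + 99) // 100 if round_up else scaled // 100
        return int(value)

    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas - unavailable, replicas + surge


# The Deployment above: 3 replicas, maxSurge: 1, maxUnavailable: 1.
print(rollout_bounds(3, max_surge=1, max_unavailable=1))
# A larger, hypothetical Deployment using percentages.
print(rollout_bounds(10, max_surge="25%", max_unavailable="25%"))
```

For the Deployment above this gives at least 2 available pods and at most 4 pods in total at any point in the rollout.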

Common PDB Patterns

Pattern 1: High Availability for Critical Services

For services that cannot tolerate any downtime:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb
spec:
  minAvailable: 100%
  selector:
    matchLabels:
      app: critical-service

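This blocks every voluntary eviction of the selected pods. The cost is that kubectl drain cannot finish on any node hosting them until you relax the PDB or remove the pods yourself, so reserve this setting for workloads where maintenance is coordinated manually.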
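Because a budget like this one can stall node maintenance, it can be useful to check for it before a maintenance window. The following is a minimal sketch of such a check; blocks_all_evictions is a hypothetical helper, the workload list is made-up illustration data, and percentages are rounded up as documented for PDBs.

```python
def blocks_all_evictions(replicas, *, min_available=None, max_unavailable=None):
    """Return True if the budget leaves no room for any voluntary eviction."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")

    def to_count(value):
        if isinstance(value, str) and value.endswith("%"):
            return (replicas * int(value[:-1]) + 99) // 100  # rounds up
        return int(value)

    if min_available is not None:
        return to_count(min_available) >= replicas
    return to_count(max_unavailable) == 0


# Hypothetical workloads: (name, replicas, PDB field).
checks = [
    ("critical-service", 3, {"min_available": "100%"}),
    ("single-replica-app", 1, {"min_available": 1}),
    ("web-app", 3, {"max_unavailable": 1}),
]
for name, replicas, budget in checks:
    print(name, blocks_all_evictions(replicas, **budget))
```

The first two workloads are flagged, the third is not. A flagged PDB is not necessarily a mistake, but it does mean a drain will need manual help.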

Pattern 2: Gradual Rollout for Stateless Apps

For stateless applications that can run briefly at reduced capacity:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app

This allows one pod to be unavailable at a time, which suits node drains and upgrades that proceed one pod at a time.

Pattern 3: Database with Read Replicas

For databases with read replicas, you might want to prevent automated maintenance from evicting the primary:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: database-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: database
      role: primary

With a single primary pod, this PDB allows zero disruptions, so a node drain will not evict the primary; the drain waits until you fail over to a replica or move the primary yourself.

PDB Limitations and Best Practices

Limitations

Only voluntary disruptions: PDBs don't protect against involuntary failures like node crashes or network partitions.
No guarantee during upgrades: PDBs don't prevent you from upgrading the Kubernetes control plane itself.
No cross-cluster coordination: PDBs work within a single cluster. For multi-cluster deployments, you need additional coordination.
Not enforced during scaling or direct deletion: PDBs don't apply when you scale replicas down or delete pods directly. Only requests made through the Eviction API are checked.

Best Practices

Always define PDBs for production deployments: Even if you think your application can tolerate downtime, it's better to be safe.
Use meaningful labels: Ensure your PDB selector matches your pod labels exactly. A mismatch will cause the PDB to have no effect.
Test your PDBs: Before relying on PDBs in production, test them in a staging environment by simulating node drains.
Monitor PDB violations: Use tools like Prometheus and Grafana to monitor PDB status and detect violations.
Combine with other HA measures: PDBs are just one part of a high availability strategy. Use them alongside multiple availability zones, health checks, and monitoring.
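The label-matching rule behind the "Use meaningful labels" advice is simple enough to sketch: a matchLabels selector covers a pod only if every key/value pair in the selector appears on the pod. selector_matches is a hypothetical helper, and the pod label sets (including the pod-template-hash value) are made up for illustration.

```python
def selector_matches(match_labels, pod_labels):
    """True when every key/value pair in match_labels appears on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


pdb_selector = {"app": "web-app", "tier": "frontend"}

# Extra labels on the pod are fine; the selector only needs to be a subset.
good_pod = {"app": "web-app", "tier": "frontend", "pod-template-hash": "abc123"}
# A single differing value ("webapp") means the PDB does not cover this pod.
typo_pod = {"app": "webapp", "tier": "frontend"}

print(selector_matches(pdb_selector, good_pod))
print(selector_matches(pdb_selector, typo_pod))
```

The second pod is silently left unprotected, which is why it is worth confirming that the PDB reports the pod count you expect after applying it.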

Troubleshooting PDB Issues

Issue 1: PDB Not Enforced

If your PDB doesn't seem to be working, check:

kubectl describe pdb <pdb-name>

Look for:

Allowed disruptions value
Current and Desired pod counts
Any events related to the PDB

Issue 2: Cannot Drain Node

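If a node drain hangs or keeps retrying because of PDB constraints, list the budgets in every namespace:

kubectl get pdb --all-namespaces

Any PDB showing 0 under ALLOWED DISRUPTIONS is blocking evictions of the pods it selects. Node conditions do not report PDB status, so kubectl describe node will not show the cause.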

Issue 3: PDB Violations

If you see PDB violations in your logs:

kubectl get pdb -o wide

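Look for PDBs showing 0 under ALLOWED DISRUPTIONS. For the underlying counts, kubectl get pdb <pdb-name> -o yaml lists currentHealthy, desiredHealthy, and expectedPods in the status section. Pods with Evicted status usually come from node-pressure eviction by the kubelet, which is an involuntary disruption and is not governed by PDBs.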
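When there are many PDBs, the same check can be scripted against the JSON form of the listing. The sketch below is one way to do it: exhausted_budgets is a hypothetical helper, and SAMPLE is made-up data shaped like the output of kubectl get pdb --all-namespaces -o json, using the status field names from the policy/v1 API.

```python
import json

# Hypothetical sample shaped like `kubectl get pdb --all-namespaces -o json` output.
SAMPLE = """
{
  "items": [
    {
      "metadata": {"name": "web-app-pdb", "namespace": "default"},
      "status": {"currentHealthy": 3, "desiredHealthy": 2,
                 "disruptionsAllowed": 1, "expectedPods": 3}
    },
    {
      "metadata": {"name": "database-pdb", "namespace": "data"},
      "status": {"currentHealthy": 1, "desiredHealthy": 1,
                 "disruptionsAllowed": 0, "expectedPods": 1}
    }
  ]
}
"""


def exhausted_budgets(pdb_list_json):
    """Return (namespace, name, currentHealthy, desiredHealthy) for PDBs with no budget left."""
    exhausted = []
    for item in json.loads(pdb_list_json).get("items", []):
        status = item.get("status", {})
        if status.get("disruptionsAllowed", 0) == 0:
            meta = item["metadata"]
            exhausted.append((
                meta.get("namespace", "default"),
                meta["name"],
                status.get("currentHealthy"),
                status.get("desiredHealthy"),
            ))
    return exhausted


print(exhausted_budgets(SAMPLE))
```

Against a real cluster, the same function could be fed the command's actual output instead of the sample string.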

Conclusion

Pod Disruption Budgets are a critical tool for maintaining high availability in Kubernetes clusters. They provide a simple yet powerful way to ensure your applications stay available during planned maintenance and upgrades.

The key takeaways are:

PDBs only apply to voluntary disruptions, not hardware failures
Use minAvailable for critical services that cannot tolerate downtime
Use maxUnavailable for stateless applications that can lose one pod at a time during drains
Always test your PDBs in a staging environment before production
Combine PDBs with other high availability measures for a robust strategy

Platforms like ServerlessBase can help you manage your Kubernetes deployments and ensure your PDBs are properly configured across your applications. By automating the deployment process and providing built-in monitoring, ServerlessBase makes it easier to maintain high availability without manual intervention.

For more information on Kubernetes deployment strategies, check out the Kubernetes documentation.