Pod Disruption Budgets: Safe Cluster Maintenance
You've deployed your application to Kubernetes, configured replicas, and set up horizontal pod autoscaling. Everything looks good until you need to perform maintenance on your cluster nodes. Suddenly, your pods are being evicted, and your application experiences downtime or degraded performance. This is where Pod Disruption Budgets (PDBs) come in.
Pod Disruption Budgets are a Kubernetes feature that ensures a minimum number of healthy pods are available during voluntary disruptions like node drains, upgrades, or scale-downs. They prevent you from accidentally taking down too many pods at once, which could lead to service unavailability or degraded performance.
Understanding Voluntary vs Involuntary Disruptions
Before diving into PDBs, it's important to understand the difference between voluntary and involuntary disruptions.
Voluntary disruptions are actions you or your cluster operators take intentionally. Examples include:
- Node maintenance and upgrades
- Scaling down nodes to save costs
- Cluster expansion or contraction
- Planned node replacement
Involuntary disruptions are unexpected events outside your control:
- Node hardware failures
- Network partitions
- Cloud provider outages
- Resource exhaustion
PDBs only apply to voluntary disruptions. For involuntary ones, you need other mechanisms like high availability configurations and multiple availability zones.
How Pod Disruption Budgets Work
A Pod Disruption Budget defines the minimum number or percentage of pods that must remain available at any given time. Kubernetes uses this constraint to coordinate voluntary disruptions across the cluster.
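When you initiate a node drain (e.g., kubectl drain), each pod on the node is removed through the Eviction API, which checks every PDB that selects that pod. If an eviction would violate a PDB, the API rejects it, and kubectl drain keeps retrying until enough replacement pods are healthy elsewhere (or until the drain's timeout expires, if one is set). Actions that bypass the Eviction API, such as deleting a pod directly, are not checked against PDBs.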
PDB Types
A PDB expresses its budget through one of two fields, and a single PDB can set only one of them:
Min Available: Ensures a minimum number of pods are always running.
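A minimal sketch of such a budget, assuming a hypothetical application whose pods carry the label app: my-app and which runs 3 replicas. The PDB name and label are placeholders, not values from a real deployment:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  minAvailable: 2           # at least 2 matching pods must stay available
  selector:
    matchLabels:
      app: my-app           # assumed pod label; must match the pod template
```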
In this example, at least 2 out of 3 replicas must always be available.
Max Unavailable: Ensures no more than a certain number of pods can be unavailable.
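The same idea expressed from the other direction, again as a sketch with a placeholder name and an assumed app: my-app label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  maxUnavailable: 1         # at most 1 matching pod may be unavailable
  selector:
    matchLabels:
      app: my-app           # assumed pod label; must match the pod template
```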
This ensures that at most 1 pod can be unavailable at any time.
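Both forms are equivalent for a fixed replica count. For example, with 3 replicas:
- minAvailable: 2 is the same as maxUnavailable: 1
- minAvailable: 1 is the same as maxUnavailable: 2
They diverge once the replica count changes: maxUnavailable: 1 still permits one disruption after scaling to 10 replicas, whereas minAvailable: 2 would then permit eight.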
Comparing PDB Strategies
Different applications have different availability requirements. Here's how to choose the right PDB strategy:
| Application Type | PDB Strategy | Reasoning |
|---|---|---|
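| Stateful applications | minAvailable: 1 | Guarantees at least one replica stays running; quorum-based systems need a majority instead |
| Stateless web apps | maxUnavailable: 1 | Lets a node drain evict one pod at a time |
| Critical services | minAvailable: 100% | No voluntary eviction allowed; node drains block until the PDB is relaxed |
| High-traffic APIs | maxUnavailable: 0 | Same effect as 100%: every eviction is refused, so plan how maintenance will proceed |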
| Batch jobs | No PDB needed | Jobs don't need continuous availability |
Implementing Pod Disruption Budgets
Let's walk through a practical example of implementing PDBs for a web application.
Step 1: Define Your Application Labels
First, ensure your pods have consistent labels that you can use to select them:
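A sketch of what such a Deployment might look like. The name web-app, the label app: web-app, and the nginx image are illustrative assumptions; substitute your own workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app              # must match the template labels below
  template:
    metadata:
      labels:
        app: web-app            # the label a PDB selector will target
    spec:
      containers:
        - name: web
          image: nginx:1.27     # placeholder image
          ports:
            - containerPort: 80
          readinessProbe:       # a PDB counts only Ready pods as healthy
            httpGet:
              path: /
              port: 80
```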
Step 2: Create the Pod Disruption Budget
Now create a PDB that ensures at least 2 pods remain available:
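A minimal sketch, assuming the pods are labelled app: web-app and run as 3 replicas. The PDB name is a placeholder:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb         # hypothetical name
spec:
  minAvailable: 2           # with 3 replicas, 1 eviction is allowed at a time
  selector:
    matchLabels:
      app: web-app          # assumed label; must match the Deployment's pods
```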
Apply this configuration with kubectl apply -f, passing the path of the file that holds the manifest.
Step 3: Verify the PDB
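Check that the PDB is created and enforced by running kubectl get pdb. The output lists each budget with its MIN AVAILABLE, MAX UNAVAILABLE, and ALLOWED DISRUPTIONS values.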
The ALLOWED DISRUPTIONS value shows how many pods can be evicted without violating the PDB. With 3 replicas and minAvailable: 2, you can safely evict 1 pod at a time.
Step 4: Test the PDB During Node Drain
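Try to drain a node with kubectl drain followed by the node name; add --ignore-daemonsets if the node runs DaemonSet-managed pods.
If the node holds more than one of your application's pods, the drain evicts only as many as the PDB allows, then waits for replacements to become ready on other nodes before evicting the rest. This keeps your application available throughout the process.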
PDBs with Rolling Updates
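PDBs and rolling updates are related but separate mechanisms. A Deployment rollout is not limited by PDBs; its pace is set by the maxUnavailable and maxSurge fields of the Deployment's own update strategy. Pods that are unavailable because of a rollout do still count against the budget, so a node drain that overlaps a rollout may have to wait.
With maxUnavailable: 1 in the deployment strategy, the Deployment controller will roughly:
- Take 1 old pod out of service
- Wait for its replacement to become ready
- Move on to the next pod
- Repeat until all pods are updated
This provides a controlled, gradual rollout with minimal disruption, while a PDB with maxUnavailable: 1 applies the same one-at-a-time limit to evictions.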
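The Deployment-side setting can be sketched as the fragment below. The replica count and maxSurge value are assumptions chosen for illustration, and the selector and pod template are omitted:

```yaml
# Fragment of a Deployment spec; selector and pod template omitted.
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1     # at most 1 pod below the desired count during rollout
      maxSurge: 1           # at most 1 extra pod above the desired count
```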
Common PDB Patterns
Pattern 1: High Availability for Critical Services
For services that cannot tolerate any downtime:
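A sketch of this pattern, assuming a hypothetical app: critical-service label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb    # hypothetical name
spec:
  minAvailable: "100%"          # every matching pod must stay available
  selector:
    matchLabels:
      app: critical-service     # assumed pod label
```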
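This keeps every pod running during voluntary disruptions, but it also means evictions are always refused: a node hosting one of these pods cannot be drained until you relax the PDB or remove the pod yourself.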
Pattern 2: Gradual Rollout for Stateless Apps
For stateless applications that can run briefly at reduced capacity:
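A sketch of this pattern, assuming a hypothetical app: frontend label:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb        # hypothetical name
spec:
  maxUnavailable: 1         # evictions proceed one pod at a time
  selector:
    matchLabels:
      app: frontend         # assumed pod label
```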
This allows one pod to be unavailable at a time, which suits node drains and cluster upgrades.
Pattern 3: Database with Read Replicas
For databases with read replicas, you might want to ensure the primary database is always available:
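One way to sketch this, assuming the primary pods are distinguished by a role: primary label. Both labels here are hypothetical, and how your database operator labels its pods may differ:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-primary-pdb      # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: postgres         # assumed application label
      role: primary         # assumed label that marks the primary
```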
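This ensures at least one primary database instance stays running during node maintenance. If only one pod matches the selector, the budget allows no evictions at all, so plan a failover before draining that pod's node.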
PDB Limitations and Best Practices
Limitations
- Only voluntary disruptions: PDBs don't protect against involuntary failures like node crashes or network partitions.
- No guarantee during upgrades: PDBs don't prevent you from upgrading the Kubernetes control plane itself.
- No cross-cluster coordination: PDBs work within a single cluster. For multi-cluster deployments, you need additional coordination.
- Not enforced during scaling: PDBs don't apply when you scale replicas up or down manually.
Best Practices
- Always define PDBs for production deployments: Even if you think your application can tolerate downtime, it's better to be safe.
- Use meaningful labels: Ensure your PDB selector matches your pod labels exactly. A mismatch will cause the PDB to have no effect.
- Test your PDBs: Before relying on PDBs in production, test them in a staging environment by simulating node drains.
- Monitor PDB violations: Use tools like Prometheus and Grafana to monitor PDB status and detect violations.
- Combine with other HA measures: PDBs are just one part of a high availability strategy. Use them alongside multiple availability zones, health checks, and monitoring.
Troubleshooting PDB Issues
Issue 1: PDB Not Enforced
If your PDB doesn't seem to be working, inspect it with kubectl describe pdb followed by its name.
Look for:
- DisruptionsAllowed value
- Current and Desired pod counts
- Any events related to the PDB
Issue 2: Cannot Drain Node
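If you can't drain a node due to PDB constraints, find the budget that is blocking the eviction, for example by listing budgets with kubectl get pdb --all-namespaces and looking for an ALLOWED DISRUPTIONS value of 0.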
Check the Conditions section for PodDisruptionBudgets status.
Issue 3: PDB Violations
If you see PDB violations in your logs:
Look for pods with Evicted status or check your monitoring system for PDB violation alerts.
Conclusion
Pod Disruption Budgets are a critical tool for maintaining high availability in Kubernetes clusters. They provide a simple yet powerful way to ensure your applications stay available during planned maintenance and upgrades.
The key takeaways are:
- PDBs only apply to voluntary disruptions, not hardware failures
- Use minAvailable for critical services that cannot tolerate downtime
- Use maxUnavailable for stateless applications that can handle gradual rollouts
- Always test your PDBs in a staging environment before production
- Combine PDBs with other high availability measures for a robust strategy
Platforms like ServerlessBase can help you manage your Kubernetes deployments and ensure your PDBs are properly configured across your applications. By automating the deployment process and providing built-in monitoring, ServerlessBase makes it easier to maintain high availability without manual intervention.
For more information on Kubernetes deployment strategies, check out the Kubernetes documentation.