ServerlessBase Blog

    A practical guide to implementing safe deployment strategies in Kubernetes with rolling updates and rollback mechanisms

    Understanding Kubernetes Rolling Updates and Rollbacks

    You've just deployed your application to Kubernetes. The deployment controller spins up new pods, terminates old ones, and traffic flows to the updated version. But what happens when something goes wrong? How do you safely roll back to a previous version? This is where Kubernetes rolling updates and rollbacks become essential.

    What Are Rolling Updates?

    Rolling updates are a deployment strategy where Kubernetes gradually replaces old pods with new ones. Instead of a single big bang where all old pods are terminated and all new pods start simultaneously, the process happens incrementally.

    Think of it like swapping the cars of a moving train one at a time: old pods exit one by one and new pods enter one by one, so your application remains available throughout the transition.

    How Rolling Updates Work

    When you update a Deployment, Kubernetes performs these steps:

    1. Scale up the new ReplicaSet by creating new pods
    2. Wait for the new pods to become ready (pass health checks)
    3. Scale down the old ReplicaSet by terminating old pods
    4. Repeat until all old pods are replaced

    The number of pods replaced at once is controlled by the maxSurge and maxUnavailable parameters.

    Rolling Update Configuration

    maxSurge Parameter

    maxSurge defines how many additional pods can be created beyond the desired replica count during the update. This ensures you have extra capacity during the transition.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1        # Allow 1 extra pod during update
          maxUnavailable: 0  # No pods can be unavailable

    In this example, during an update, Kubernetes can have up to 4 pods (3 + 1) running, ensuring zero downtime.

    maxUnavailable Parameter

    maxUnavailable defines how many pods, relative to the desired replica count, can be unavailable at any point during the update. In other words, it is the number of old pods that can be terminated before their replacements are ready.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1  # Allow 1 pod to be unavailable

    Here, Kubernetes can terminate one old pod while creating one new pod, keeping at least 2 of the 3 desired pods available throughout the update.

    Common Configuration Patterns

    Scenario        maxSurge  maxUnavailable  Behavior
    Zero downtime   1         0               New pods start before old ones terminate
    Fast update     25%       25%             Several pods are replaced simultaneously
    Conservative    0         1               One old pod terminates before each new pod is created, never exceeding the replica count
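
    When maxSurge and maxUnavailable are given as percentages, Kubernetes converts them to absolute pod counts, rounding maxSurge up and maxUnavailable down. A quick sketch of that arithmetic for the 25%/25% fast-update row with 3 replicas:

```shell
# Percentage values are converted to pod counts:
# maxSurge rounds UP, maxUnavailable rounds DOWN
replicas=3
surge_pct=25
unavail_pct=25

max_surge=$(( (replicas * surge_pct + 99) / 100 ))   # ceil(3 * 0.25) = 1
max_unavailable=$(( replicas * unavail_pct / 100 ))  # floor(3 * 0.25) = 0
echo "maxSurge=$max_surge maxUnavailable=$max_unavailable"
```

    So for a 3-replica Deployment, 25%/25% behaves like maxSurge: 1, maxUnavailable: 0, which also happens to be the effective default.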

    Rolling Update Example

    Let's walk through a concrete example. Suppose you have a Deployment with 3 replicas:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: nginx:1.21
            ports:
            - containerPort: 80

    You update the image to nginx:1.22. Here's what happens:

    1. Initial state: 3 pods running nginx:1.21
    2. Step 1: Create 1 new pod with nginx:1.22 (maxSurge: 1)
    3. Step 2: Wait for new pod to be ready
    4. Step 3: Terminate 1 old pod with nginx:1.21
    5. Step 4: Repeat until all 3 pods run nginx:1.22

    The update completes in three cycles. Because each old pod is terminated only after its replacement is ready, at least 3 pods stay available at all times, with a brief peak of 4 pods (replicas + maxSurge) during each cycle.
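
    The pod counts at each step of this walkthrough can be sketched as a small loop (pure arithmetic, not a real cluster interaction):

```shell
# Simulate the update above: create one new pod, wait for it to become
# ready, then terminate one old pod, until all 3 replicas are replaced
# (maxSurge=1, maxUnavailable=0).
old=3   # pods still on nginx:1.21
new=0   # pods already on nginx:1.22
while [ "$old" -gt 0 ]; do
  new=$((new + 1))       # surge: one extra pod comes up
  peak=$((old + new))    # momentary total while both versions overlap
  old=$((old - 1))       # an old pod terminates once the new one is ready
  echo "step: old=$old new=$new peak=$peak"
done
```

    The total never exceeds 4 pods (replicas + maxSurge), and the number of ready pods never drops below 3.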

    Health Checks During Rolling Updates

    Health checks are critical for rolling updates. Kubernetes only considers a pod ready when all its containers are ready and pass their health checks.

    Readiness Probes

    Readiness probes determine when a pod is ready to receive traffic. They're essential for rolling updates because they prevent traffic from being sent to pods that aren't ready.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
      template:
        spec:
          containers:
          - name: app
            image: nginx:1.22
            ports:
            - containerPort: 80
            readinessProbe:
              httpGet:
                path: /health
                port: 80
              initialDelaySeconds: 10
              periodSeconds: 5

    Liveness Probes

    Liveness probes detect whether a container is still functioning. If a liveness probe fails failureThreshold times in a row, Kubernetes restarts the container. This is different from readiness probes, which only affect traffic routing.

    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3

    If the liveness probe fails 3 consecutive times (checked every 10 seconds after the initial 30-second delay), Kubernetes restarts the container.
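
    As a rough sketch of the timing (assuming the first check fires at initialDelaySeconds and subsequent checks every periodSeconds), the earliest possible restart under these settings is:

```shell
# Earliest restart time for the liveness settings above
initial_delay=30
period=10
failure_threshold=3

# Failures at t=30, 40, 50 -> restart after the third consecutive failure
earliest_restart=$(( initial_delay + (failure_threshold - 1) * period ))
echo "${earliest_restart}s after container start"
```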

    Rollback Mechanisms

    Despite careful planning, things go wrong. Kubernetes provides built-in rollback capabilities to revert to previous Deployment versions.

    Detecting Failures with progressDeadlineSeconds

    Kubernetes does not automatically roll back a failed rolling update. Instead, when a rollout stops making progress, the Deployment is marked as failed (its Progressing condition becomes False with reason ProgressDeadlineExceeded) and the rollout stalls. Because maxUnavailable caps how many old pods can be terminated, the previous version keeps serving traffic while you investigate and roll back manually. A rollout is considered stalled when:

    • New pods fail to become ready (for example, image pull errors, crash loops, or failing readiness probes)
    • The Deployment exceeds its progressDeadlineSeconds limit
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      progressDeadlineSeconds: 600  # Allow 10 minutes for update

    Manual Rollback Commands

    You can manually roll back using kubectl:

    # Rollback to the previous version
    kubectl rollout undo deployment/my-app
     
    # Rollback to a specific revision
    kubectl rollout undo deployment/my-app --to-revision=2

    Checking Rollback History

    # View deployment history
    kubectl rollout history deployment/my-app
     
    # View details of a specific revision
    kubectl rollout history deployment/my-app --revision=2

    Rollback Example

    Suppose you've deployed version 1.22, but it has a critical bug. Here's how to roll back:

    # Check current deployment status
    kubectl rollout status deployment/my-app
     
    # View deployment history
    kubectl rollout history deployment/my-app
     
    # Rollback to previous version
    kubectl rollout undo deployment/my-app
     
    # Verify rollback
    kubectl rollout status deployment/my-app

    Kubernetes will automatically create a new ReplicaSet with the previous image version and perform a rolling update to restore the previous state.

    Deployment Strategies Comparison

    Different deployment strategies serve different use cases. Here's how rolling updates compare to other strategies:

    Strategy        Description                     Use Case                                    Downtime
    Rolling Update  Gradual pod replacement         General purpose, zero downtime              None
    Recreate        Stop all pods, start new ones   Apps that cannot run two versions at once   Yes, during the switch
    Blue-Green      Two identical environments      High-risk changes, instant switchover       Minimal
    Canary          Gradual traffic shift           Gradual rollout to a subset of users        Minimal

    Note that only RollingUpdate and Recreate are built-in Deployment strategy types; blue-green and canary deployments are implemented on top of Services, ingress rules, or tools such as Argo Rollouts.

    Rolling Update vs Recreate

    # Rolling update (recommended)
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
     
    # Recreate (not recommended for production)
    strategy:
      type: Recreate

    Rolling updates maintain service availability, while recreate stops all pods during the update.

    Best Practices for Rolling Updates

    1. Use Conservative maxSurge and maxUnavailable Values

    Start with conservative values to ensure stability:

    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1      # Start with 1 extra pod
        maxUnavailable: 1  # Allow 1 pod to be unavailable

    Gradually increase these values as you gain confidence in your deployment process.

    2. Configure Health Checks Properly

    Always define readiness and liveness probes:

    readinessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
     
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3

    3. Set Progress Deadline

    Allow sufficient time for updates to complete:

    progressDeadlineSeconds: 600  # 10 minutes

    4. Test Updates in Non-Production Environments

    Always test rolling updates in staging before production:

    # Update staging deployment
    kubectl set image deployment/my-app app=nginx:1.22 --namespace=staging
     
    # Monitor rollout
    kubectl rollout status deployment/my-app --namespace=staging
     
    # Rollback if needed
    kubectl rollout undo deployment/my-app --namespace=staging

    5. Monitor Rollout Progress

    Watch the rollout status during updates:

    kubectl rollout status deployment/my-app
     
    # Watch in real-time
    kubectl rollout status deployment/my-app -w

    6. Use Deployment Annotations for Tracking

    Add annotations to record why a change was made. The kubernetes.io/change-cause annotation appears in the CHANGE-CAUSE column of kubectl rollout history. (The deployment.kubernetes.io/revision annotation is managed by Kubernetes itself and should not be set manually.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
      annotations:
        kubernetes.io/change-cause: "Updated to nginx:1.22 for performance improvements"

    Troubleshooting Rolling Updates

    Update Stuck in Progress

    If an update appears stuck:

    # Check deployment status
    kubectl describe deployment my-app
     
    # Check events
    kubectl get events --sort-by='.lastTimestamp'
     
    # Manually rollback if needed
    kubectl rollout undo deployment/my-app

    New Pods Not Starting

    Check pod events:

    kubectl describe pod <pod-name>
     
    # Check container logs
    kubectl logs <pod-name>

    Health Check Failures

    Verify health check endpoints:

    # Test health endpoint manually
    kubectl exec -it <pod-name> -- curl http://localhost:80/health
     
    # Check readiness probe status
    kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

    Advanced Rolling Update Techniques

    Pausing and Resuming Rollouts

    You can pause a rollout to inspect the current state, or to batch several changes so they trigger only one rolling update when you resume:

    # Pause rollout
    kubectl rollout pause deployment/my-app
     
    # Make changes if needed
    kubectl set image deployment/my-app app=nginx:1.23
     
    # Resume rollout
    kubectl rollout resume deployment/my-app

    Rolling Updates with Multiple Container Images

    For multi-container pods, specify which container to update:

    kubectl set image deployment/my-app \
      app=nginx:1.22 \
      sidecar=busybox:1.35

    Rolling Updates with ConfigMaps

    Updating a ConfigMap does not restart pods by itself. After applying the new ConfigMap, trigger a rollout restart so the recreated pods pick up the new configuration:

    # Update configmap
    kubectl create configmap app-config --from-file=config.yaml --dry-run=client -o yaml | kubectl apply -f -
     
    # Recreate pods so they pick up the new config
    kubectl rollout restart deployment/my-app
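
    A related pattern worth sketching (the config.yaml filename and ConfigMap name here are illustrative): embed a short content hash in the ConfigMap name, so any config change alters the pod template and triggers an ordinary rolling update on its own:

```shell
# Write an example config file (stand-in for your real config.yaml)
printf 'log_level: info\n' > config.yaml

# Short content hash; a changed file yields a different ConfigMap name
hash=$(sha256sum config.yaml | cut -c1-8)
name="app-config-$hash"
echo "$name"

# Then (sketch, not run here): create the hashed ConfigMap and point the
# Deployment at it, so applying the Deployment change starts a rollout:
#   kubectl create configmap "$name" --from-file=config.yaml
#   kubectl set env deployment/my-app --from=configmap/"$name"
```

    Old hashed ConfigMaps stay behind and need periodic cleanup, but in exchange a rollback via kubectl rollout undo also restores the matching config.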

    Monitoring Rolling Updates

    Track Deployment Status

    # Get deployment status
    kubectl get deployment my-app -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'
     
    # Check replica counts
    kubectl get deployment my-app -o jsonpath='{.status.replicas} {.status.updatedReplicas} {.status.availableReplicas}'

    Set Up Alerts

    Create alerts for deployment failures:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: deployment-failure-alerts
    spec:
      groups:
      - name: deployments
        rules:
        - alert: DeploymentFailed
          expr: kube_deployment_status_replicas_unavailable > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Deployment {{ $labels.deployment }} has unavailable replicas"

    Conclusion

    Rolling updates and rollbacks are fundamental to safe Kubernetes deployments. By understanding how rolling updates work, configuring health checks properly, and knowing how to rollback when things go wrong, you can confidently deploy changes to production with minimal risk.

    Remember these key points:

    • Rolling updates replace pods gradually, maintaining service availability
    • Configure maxSurge and maxUnavailable to control the update pace
    • Always define readiness and liveness probes
    • Roll back with kubectl rollout undo when a rollout fails; Kubernetes marks a stalled Deployment as failed but does not roll it back automatically
    • Monitor rollout progress and set up alerts for failures

    Platforms like ServerlessBase simplify deployment management by handling reverse proxy configuration and SSL certificate provisioning automatically, so you can focus on implementing robust rolling update strategies for your applications.

    The next step is to implement these patterns in your own Kubernetes deployments. Start with conservative configuration values, test thoroughly in staging, and gradually increase your update pace as you gain confidence in your deployment process.
