ServerlessBase Blog

    A practical guide to implementing safe deployment strategies in Kubernetes with rolling updates and rollback mechanisms

    Understanding Kubernetes Rolling Updates and Rollbacks

    You've just deployed your application to Kubernetes. The deployment controller spins up new pods, terminates old ones, and traffic flows to the updated version. But what happens when something goes wrong? How do you safely roll back to a previous version? This is where Kubernetes rolling updates and rollbacks become essential.

    What Are Rolling Updates?

    Rolling updates are a deployment strategy where Kubernetes gradually replaces old pods with new ones. Instead of a single big bang where all old pods are terminated and all new pods start simultaneously, the process happens incrementally.

    Think of it like swapping the cars of a moving train one at a time: old pods exit one by one and new pods enter one by one, so your application remains available throughout the transition.

    How Rolling Updates Work

    When you update a Deployment, Kubernetes performs these steps:

    1. Scale up the new ReplicaSet by creating new pods
    2. Wait for the new pods to become ready (pass health checks)
    3. Scale down the old ReplicaSet by terminating old pods
    4. Repeat until all old pods are replaced

    The number of pods replaced at once is controlled by the maxSurge and maxUnavailable parameters.

    Rolling Update Configuration

    maxSurge Parameter

    maxSurge defines how many additional pods can be created beyond the desired replica count during the update. This ensures you have extra capacity during the transition.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1        # Allow 1 extra pod during update
          maxUnavailable: 0  # No pods can be unavailable

    In this example, during an update, Kubernetes can have up to 4 pods (3 + 1) running, ensuring zero downtime.

    maxUnavailable Parameter

    maxUnavailable defines how many pods, relative to the desired replica count, can be unavailable at any point during the update. In other words, it is the number of old pods that can be terminated before their replacements are ready.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 1  # Allow 1 pod to be unavailable

    Here, Kubernetes can terminate one old pod while creating one new pod, keeping at least 2 of the 3 desired pods available throughout the update.

    Common Configuration Patterns

    Scenario        maxSurge  maxUnavailable  Behavior
    Zero downtime   1         0               New pods start before old ones terminate
    Fast update     25%       25%             Several pods are replaced simultaneously
    Conservative    0         1               One old pod terminates before each new pod is created, never exceeding the replica count
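
    When maxSurge and maxUnavailable are given as percentages, Kubernetes converts them to absolute pod counts, rounding maxSurge up and maxUnavailable down. A quick sketch of that arithmetic for the 25%/25% fast-update row with 3 replicas:

```shell
# Percentage values are converted to pod counts:
# maxSurge rounds UP, maxUnavailable rounds DOWN
replicas=3
surge_pct=25
unavail_pct=25

max_surge=$(( (replicas * surge_pct + 99) / 100 ))   # ceil(3 * 0.25) = 1
max_unavailable=$(( replicas * unavail_pct / 100 ))  # floor(3 * 0.25) = 0
echo "maxSurge=$max_surge maxUnavailable=$max_unavailable"
```

    So for a 3-replica Deployment, 25%/25% behaves like maxSurge: 1, maxUnavailable: 0, which also happens to be the effective default.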

    Rolling Update Example

    Let's walk through a concrete example. Suppose you have a Deployment with 3 replicas:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: nginx:1.21
            ports:
            - containerPort: 80

    You update the image to nginx:1.22. Here's what happens:

    1. Initial state: 3 pods running nginx:1.21
    2. Step 1: Create 1 new pod with nginx:1.22 (maxSurge: 1)
    3. Step 2: Wait for new pod to be ready
    4. Step 3: Terminate 1 old pod with nginx:1.21
    5. Step 4: Repeat until all 3 pods run nginx:1.22

    The update completes in three cycles. Because each old pod is terminated only after its replacement is ready, at least 3 pods stay available at all times, with a brief peak of 4 pods (replicas + maxSurge) during each cycle.
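
    The pod counts at each step of this walkthrough can be sketched as a small loop (pure arithmetic, not a real cluster interaction):

```shell
# Simulate the update above: create one new pod, wait for it to become
# ready, then terminate one old pod, until all 3 replicas are replaced
# (maxSurge=1, maxUnavailable=0).
old=3   # pods still on nginx:1.21
new=0   # pods already on nginx:1.22
while [ "$old" -gt 0 ]; do
  new=$((new + 1))       # surge: one extra pod comes up
  peak=$((old + new))    # momentary total while both versions overlap
  old=$((old - 1))       # an old pod terminates once the new one is ready
  echo "step: old=$old new=$new peak=$peak"
done
```

    The total never exceeds 4 pods (replicas + maxSurge), and the number of ready pods never drops below 3.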

    Health Checks During Rolling Updates

    Health checks are critical for rolling updates. Kubernetes only considers a pod ready when all its containers are ready and pass their health checks.

    Readiness Probes

    Readiness probes determine when a pod is ready to receive traffic. They're essential for rolling updates because they prevent traffic from being sent to pods that aren't ready.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
      template:
        spec:
          containers:
          - name: app
            image: nginx:1.22
            ports:
            - containerPort: 80
            readinessProbe:
              httpGet:
                path: /health
                port: 80
              initialDelaySeconds: 10
              periodSeconds: 5

    Liveness Probes

    Liveness probes detect whether a container is still functioning. If a liveness probe fails failureThreshold times in a row, Kubernetes restarts the container. This is different from readiness probes, which only affect traffic routing.

    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3

    If the liveness probe fails 3 consecutive times (checked every 10 seconds after the initial 30-second delay), Kubernetes restarts the container.
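
    As a rough sketch of the timing (assuming the first check fires at initialDelaySeconds and subsequent checks every periodSeconds), the earliest possible restart under these settings is:

```shell
# Earliest restart time for the liveness settings above
initial_delay=30
period=10
failure_threshold=3

# Failures at t=30, 40, 50 -> restart after the third consecutive failure
earliest_restart=$(( initial_delay + (failure_threshold - 1) * period ))
echo "${earliest_restart}s after container start"
```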

    Rollback Mechanisms

    Despite careful planning, things go wrong. Kubernetes provides built-in rollback capabilities to revert to previous Deployment versions.

    Detecting Failures with progressDeadlineSeconds

    Kubernetes does not automatically roll back a failed rolling update. Instead, when a rollout stops making progress, the Deployment is marked as failed (its Progressing condition becomes False with reason ProgressDeadlineExceeded) and the rollout stalls. Because maxUnavailable caps how many old pods can be terminated, the previous version keeps serving traffic while you investigate and roll back manually. A rollout is considered stalled when:

    • New pods fail to become ready (for example, image pull errors, crash loops, or failing readiness probes)
    • The Deployment exceeds its progressDeadlineSeconds limit
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      progressDeadlineSeconds: 600  # Allow 10 minutes for update

    Manual Rollback Commands

    You can manually roll back using kubectl:

    # Rollback to the previous version
    kubectl rollout undo deployment/my-app
     
    # Rollback to a specific revision
    kubectl rollout undo deployment/my-app --to-revision=2

    Checking Rollback History

    # View deployment history
    kubectl rollout history deployment/my-app
     
    # View details of a specific revision
    kubectl rollout history deployment/my-app --revision=2

    Rollback Example

    Suppose you've deployed version 1.22, but it has a critical bug. Here's how to roll back:

    # Check current deployment status
    kubectl rollout status deployment/my-app
     
    # View deployment history
    kubectl rollout history deployment/my-app
     
    # Rollback to previous version
    kubectl rollout undo deployment/my-app
     
    # Verify rollback
    kubectl rollout status deployment/my-app

    Kubernetes will automatically create a new ReplicaSet with the previous image version and perform a rolling update to restore the previous state.

    Deployment Strategies Comparison

    Different deployment strategies serve different use cases. Here's how rolling updates compare to other strategies:

    Strategy        Description                     Use Case                                    Downtime
    Rolling Update  Gradual pod replacement         General purpose, zero downtime              None
    Recreate        Stop all pods, start new ones   Apps that cannot run two versions at once   Yes, during the switch
    Blue-Green      Two identical environments      High-risk changes, instant switchover       Minimal
    Canary          Gradual traffic shift           Gradual rollout to a subset of users        Minimal

    Note that only RollingUpdate and Recreate are built-in Deployment strategy types; blue-green and canary deployments are implemented on top of Services, ingress rules, or tools such as Argo Rollouts.

    Rolling Update vs Recreate

    # Rolling update (recommended)
    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
     
    # Recreate (not recommended for production)
    strategy:
      type: Recreate

    Rolling updates maintain service availability, while recreate stops all pods during the update.

    Best Practices for Rolling Updates

    1. Use Conservative maxSurge and maxUnavailable Values

    Start with conservative values to ensure stability:

    strategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1      # Start with 1 extra pod
        maxUnavailable: 1  # Allow 1 pod to be unavailable

    Gradually increase these values as you gain confidence in your deployment process.

    2. Configure Health Checks Properly

    Always define readiness and liveness probes:

    readinessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 3
     
    livenessProbe:
      httpGet:
        path: /health
        port: 80
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3

    3. Set Progress Deadline

    Allow sufficient time for updates to complete:

    progressDeadlineSeconds: 600  # 10 minutes

    4. Test Updates in Non-Production Environments

    Always test rolling updates in staging before production:

    # Update staging deployment
    kubectl set image deployment/my-app app=nginx:1.22 --namespace=staging
     
    # Monitor rollout
    kubectl rollout status deployment/my-app --namespace=staging
     
    # Rollback if needed
    kubectl rollout undo deployment/my-app --namespace=staging

    5. Monitor Rollout Progress

    Watch the rollout status during updates:

    kubectl rollout status deployment/my-app
     
    # Watch in real-time
    kubectl rollout status deployment/my-app -w

    6. Use Deployment Annotations for Tracking

    Add annotations to record why a change was made. The kubernetes.io/change-cause annotation appears in the CHANGE-CAUSE column of kubectl rollout history. (The deployment.kubernetes.io/revision annotation is managed by Kubernetes itself and should not be set manually.)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
      annotations:
        kubernetes.io/change-cause: "Updated to nginx:1.22 for performance improvements"

    Troubleshooting Rolling Updates

    Update Stuck in Progress

    If an update appears stuck:

    # Check deployment status
    kubectl describe deployment my-app
     
    # Check events
    kubectl get events --sort-by='.lastTimestamp'
     
    # Manually rollback if needed
    kubectl rollout undo deployment/my-app

    New Pods Not Starting

    Check pod events:

    kubectl describe pod <pod-name>
     
    # Check container logs
    kubectl logs <pod-name>

    Health Check Failures

    Verify health check endpoints:

    # Test health endpoint manually
    kubectl exec -it <pod-name> -- curl http://localhost:80/health
     
    # Check readiness probe status
    kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'

    Advanced Rolling Update Techniques

    Pausing and Resuming Rollouts

    You can pause a rollout to inspect the current state, or to batch several changes so they trigger only one rolling update when you resume:

    # Pause rollout
    kubectl rollout pause deployment/my-app
     
    # Make changes if needed
    kubectl set image deployment/my-app app=nginx:1.23
     
    # Resume rollout
    kubectl rollout resume deployment/my-app

    Rolling Updates with Multiple Container Images

    For multi-container pods, specify which container to update:

    kubectl set image deployment/my-app \
      app=nginx:1.22 \
      sidecar=busybox:1.35

    Rolling Updates with ConfigMaps

    Updating a ConfigMap does not restart pods by itself. After applying the new ConfigMap, trigger a rollout restart so the recreated pods pick up the new configuration:

    # Update configmap
    kubectl create configmap app-config --from-file=config.yaml --dry-run=client -o yaml | kubectl apply -f -
     
    # Recreate pods so they pick up the new config
    kubectl rollout restart deployment/my-app
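
    A related pattern worth sketching (the config.yaml filename and ConfigMap name here are illustrative): embed a short content hash in the ConfigMap name, so any config change alters the pod template and triggers an ordinary rolling update on its own:

```shell
# Write an example config file (stand-in for your real config.yaml)
printf 'log_level: info\n' > config.yaml

# Short content hash; a changed file yields a different ConfigMap name
hash=$(sha256sum config.yaml | cut -c1-8)
name="app-config-$hash"
echo "$name"

# Then (sketch, not run here): create the hashed ConfigMap and point the
# Deployment at it, so applying the Deployment change starts a rollout:
#   kubectl create configmap "$name" --from-file=config.yaml
#   kubectl set env deployment/my-app --from=configmap/"$name"
```

    Old hashed ConfigMaps stay behind and need periodic cleanup, but in exchange a rollback via kubectl rollout undo also restores the matching config.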

    Monitoring Rolling Updates

    Track Deployment Status

    # Get deployment status
    kubectl get deployment my-app -o jsonpath='{.status.conditions[?(@.type=="Progressing")].message}'
     
    # Check replica counts
    kubectl get deployment my-app -o jsonpath='{.status.replicas} {.status.updatedReplicas} {.status.availableReplicas}'

    Set Up Alerts

    Create alerts for deployment failures:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: deployment-failure-alerts
    spec:
      groups:
      - name: deployments
        rules:
        - alert: DeploymentFailed
          expr: kube_deployment_status_replicas_unavailable > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Deployment {{ $labels.deployment }} has unavailable replicas"

    Conclusion

    Rolling updates and rollbacks are fundamental to safe Kubernetes deployments. By understanding how rolling updates work, configuring health checks properly, and knowing how to rollback when things go wrong, you can confidently deploy changes to production with minimal risk.

    Remember these key points:

    • Rolling updates replace pods gradually, maintaining service availability
    • Configure maxSurge and maxUnavailable to control the update pace
    • Always define readiness and liveness probes
    • Roll back with kubectl rollout undo when a rollout fails; Kubernetes marks a stalled Deployment as failed but does not roll it back automatically
    • Monitor rollout progress and set up alerts for failures

    Platforms like ServerlessBase simplify deployment management by handling reverse proxy configuration and SSL certificate provisioning automatically, so you can focus on implementing robust rolling update strategies for your applications.

    The next step is to implement these patterns in your own Kubernetes deployments. Start with conservative configuration values, test thoroughly in staging, and gradually increase your update pace as you gain confidence in your deployment process.
