ServerlessBase Blog

    A practical guide to implementing canary releases in Kubernetes for safer deployments

    Canary Deployments with Kubernetes

    You've just pushed a new version of your application to production. The code looks good, the tests pass, and you're confident it will work. But what if something goes wrong? A single bug in the new version could impact all your users immediately. That's where canary deployments come in.

    Canary deployments let you release new code to a small subset of users first, monitor its behavior, and only roll it out to everyone if it performs well. This approach dramatically reduces the risk of production incidents while still giving you the speed of continuous deployment.

    What is a Canary Deployment?

    A canary deployment is a release strategy where you deploy a new version of your application to a small, controlled group of users before making it available to everyone. Think of it as a test run in production.

    The name comes from the canary in a coal mine—historically, miners would bring canaries into coal mines to detect dangerous gas levels. If the canary died, the miners knew to evacuate. Similarly, if your canary deployment shows problems, you can roll back before the new version affects all users.

    How It Works

    1. Deploy the new version to a small number of instances (often just one pod)
    2. Route a percentage of traffic to the canary version
    3. Monitor metrics like error rates, latency, and user feedback
    4. If everything looks good, gradually increase traffic to the canary
    5. If issues appear, roll back immediately and investigate

    This approach gives you the safety of a staging environment with the real-world data of production.
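    As a minimal sketch, the five steps above can be automated with a small script, assuming traffic is shifted with the NGINX Ingress canary-weight annotation covered later in this post. Every name here (the `myapp-ingress` Ingress, the `canary_healthy` check) is a placeholder to adapt to your setup:

```shell
#!/bin/sh
# Progressive canary rollout sketch (all names are placeholders).

canary_healthy() {
  # Placeholder for step 3: query Prometheus / your APM here and
  # return non-zero when error rate or latency regresses.
  return 0
}

set_weight() {
  # Steps 2 and 4: route the given percentage of traffic to the canary.
  kubectl patch ingress myapp-ingress \
    -p "{\"metadata\":{\"annotations\":{\"nginx.ingress.kubernetes.io/canary-weight\":\"$1\"}}}"
}

rollout() {
  for weight in 5 10 25 50 100; do
    set_weight "$weight"
    sleep 1800                    # observe each stage for ~30 minutes
    if ! canary_healthy; then
      set_weight 0                # step 5: roll back immediately
      return 1
    fi
  done
}
# In CI, call: rollout
```

    The functions are deliberately kept separate so the health check can be swapped for a real metrics query without touching the traffic-shifting logic.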

    Comparison: Deployment Strategies

    Strategy      Traffic Distribution           Rollback Speed   Risk     Best For
    Blue-Green    All-at-once cutover to new     Instant          Medium   Simple applications, no shared state
    Canary        Gradual (1% → 100%)            Fast             Low      Complex applications, gradual rollout
    Rolling       Instances replaced gradually   Medium           Medium   Simple applications, no complex routing
    A/B Testing   Segment-based                  Medium           Low      Feature testing, UX experiments

    Kubernetes Canary Deployment Patterns

    Kubernetes provides several ways to implement canary deployments. Let's explore the most common approaches.

    1. Using Traffic Splitting with Ingress

    The most straightforward way to implement a canary deployment is by splitting traffic between your stable and canary versions using an Ingress controller.

    # stable deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-stable
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
          version: stable
      template:
        metadata:
          labels:
            app: myapp
            version: stable
        spec:
          containers:
          - name: myapp
            image: myapp:1.0.0
            ports:
            - containerPort: 80
    ---
    # canary deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-canary
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: myapp
          version: canary
      template:
        metadata:
          labels:
            app: myapp
            version: canary
        spec:
          containers:
          - name: myapp
            image: myapp:1.1.0
            ports:
            - containerPort: 80
    ---
    # primary ingress: routes to the stable Service. NGINX canary annotations
    # only work alongside a primary Ingress for the same host. The myapp-stable
    # and myapp-canary Services (one per Deployment) are shown in pattern 2 below.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-stable
    spec:
      ingressClassName: nginx
      rules:
      - host: myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-stable
                port:
                  number: 80
    ---
    # canary ingress: same host; the annotations tell NGINX to split traffic
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-ingress
      annotations:
        nginx.ingress.kubernetes.io/canary: "true"
        nginx.ingress.kubernetes.io/canary-weight: "10"
    spec:
      ingressClassName: nginx
      rules:
      - host: myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80

    The nginx.ingress.kubernetes.io/canary-weight annotation on the canary Ingress controls how much traffic the NGINX ingress controller sends to the canary version: a weight of 10 sends 10% of requests to the canary and the remaining 90% to stable.

    To gradually increase traffic, update the annotation:

    # Increase to 25% (note: annotations live under metadata, not spec)
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"25"}}}'

    # Increase to 50%
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"50"}}}'

    # Increase to 100% (full rollout)
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"100"}}}'
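    Leaving the weight at 100 pins all traffic to the single canary replica, so there is one more step: promote the proven image to the stable Deployment and retire the canary. A sketch, using the resource names from this example (your pipeline may differ):

```shell
#!/bin/sh
# Promotion sketch: assumes the myapp-stable / myapp-canary Deployments
# and the canary-annotated myapp-ingress from the example above.
NEW_IMAGE="myapp:1.1.0"

promote() {
  # Roll the proven image out to the stable Deployment
  kubectl set image deployment/myapp-stable "myapp=$NEW_IMAGE"
  kubectl rollout status deployment/myapp-stable
  # Stop splitting traffic, then retire the canary pods
  kubectl patch ingress myapp-ingress \
    -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'
  kubectl scale deployment myapp-canary --replicas=0
}
# Run manually or from CI: promote
```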

    2. Using Service Selectors

    Another approach is to use Kubernetes services with different selectors to route traffic to different versions.

    # stable service
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp-stable
      labels:
        app: myapp
        version: stable
    spec:
      selector:
        app: myapp
        version: stable
      ports:
      - port: 80
        targetPort: 80
    ---
    # canary service
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp-canary
      labels:
        app: myapp
        version: canary
    spec:
      selector:
        app: myapp
        version: canary
      ports:
      - port: 80
        targetPort: 80
    ---
    # canary ingress on its own hostname (no traffic-split annotations needed,
    # since the dedicated host routes straight to the canary Service)
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-canary-ingress
    spec:
      rules:
      - host: canary.myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80

    This approach uses a separate hostname for the canary deployment, which can be useful for testing with real users before exposing it to everyone.
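    Assuming DNS for canary.myapp.example.com points at your ingress controller (or you spoof the Host header locally), the canary can be exercised directly. A minimal probe, where INGRESS_IP is a placeholder for the controller's external address:

```shell
# Probe the canary through its dedicated hostname.
# INGRESS_IP must be set to your ingress controller's external address.
probe_canary_host() {
  curl -fsS -H "Host: canary.myapp.example.com" \
    "http://${INGRESS_IP:?set INGRESS_IP first}/"
}
```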

    3. Using Traffic Management with Istio

    For more advanced traffic management, service meshes like Istio provide powerful canary deployment capabilities.

    # VirtualService for canary traffic split
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp
    spec:
      hosts:
      - myapp.example.com
      http:
      - match:
        - headers:
            canary:
              exact: "true"
        route:
        - destination:
            host: myapp
            subset: canary
          weight: 100
      - route:
        - destination:
            host: myapp
            subset: stable
          weight: 90
        - destination:
            host: myapp
            subset: canary
          weight: 10
    ---
    # DestinationRule defining the stable and canary subsets by pod label
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: myapp
    spec:
      host: myapp
      subsets:
      - name: stable
        labels:
          version: stable
      - name: canary
        labels:
          version: canary

    Note the route ordering: requests carrying a canary: true header always land on the canary subset, while everything else falls through to the weighted 90/10 split. Beyond this, Istio offers finer-grained control than an Ingress controller, including routing by cookie or user agent, traffic mirroring, and gradual weight shifting without redeploying anything.
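    To spot-check the header rule, a request can be pinned to the canary subset (host as in the VirtualService; run this from a pod inside the mesh so the sidecar applies the routing):

```shell
# Requests carrying the routing header land on the canary subset;
# all other requests follow the 90/10 weighted split.
probe_canary_header() {
  curl -fsS -H "canary: true" "http://myapp.example.com/"
}
```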

    Monitoring Your Canary Deployment

    Monitoring is critical for successful canary deployments. You need to track both application-level and infrastructure-level metrics.

    Key Metrics to Monitor

    Metric                   Why It Matters                      Alert Threshold
    Error Rate               Detects bugs in the new version     > 5% increase from baseline
    Latency                  Identifies performance regressions  > 20% increase from baseline
    Throughput               Ensures the canary can handle load  < 80% of stable version
    CPU/Memory Usage         Checks resource efficiency          > 20% increase from baseline
    Custom Business Metrics  Validates business logic changes    Any unexpected change
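    As a sketch, the error-rate row of the table can be expressed as a PromQL comparison of the canary against the stable baseline (the metric name and labels follow the alert example later in this post; adjust to your instrumentation):

```promql
# Difference between canary and stable 5xx ratios over 5 minutes;
# alert when this exceeds your 5% threshold
sum(rate(http_requests_total{version="canary",status=~"5.."}[5m]))
  / sum(rate(http_requests_total{version="canary"}[5m]))
-
sum(rate(http_requests_total{version="stable",status=~"5.."}[5m]))
  / sum(rate(http_requests_total{version="stable"}[5m]))
```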

    Setting Up Monitoring

    # Prometheus ServiceMonitor for canary
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp-canary
      labels:
        release: prometheus
    spec:
      selector:
        matchLabels:
          app: myapp
          version: canary
      endpoints:
      - port: http        # must match a named port on the canary Service
        interval: 15s
        path: /metrics

    Configure your alerting rules to compare canary metrics against stable baselines. This ensures you're detecting real issues, not just normal variance.

    Rollback Strategy

    Despite your best efforts, things can go wrong. Having a clear rollback strategy is essential.

    Automated Rollback

    You can implement automated rollbacks based on metrics:

    # PrometheusRule with an alert that can drive an automatic rollback
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: canary-alerts
      labels:
        release: prometheus
    spec:
      groups:
      - name: canary-alerts
        rules:
        - alert: CanaryHighErrorRate
          expr: |
            rate(http_requests_total{version="canary",status=~"5.."}[5m])
            /
            rate(http_requests_total{version="canary"}[5m]) > 0.05
          for: 5m
          annotations:
            summary: "Canary has high error rate"
            description: "Canary error rate is {{ $value | humanizePercentage }}"

    When this alert fires, automation can scale the canary deployment down and set the canary weight back to 0 so that all traffic returns to stable.
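    One way to wire up that automation, as a sketch: route the alert to a webhook that performs the rollback. The rollback-operator service and URL here are hypothetical; it would run the same kubectl steps shown in the manual rollback below.

```yaml
# alertmanager.yml fragment: forward canary alerts to a rollback webhook
route:
  routes:
  - match:
      alertname: CanaryHighErrorRate
    receiver: canary-rollback
receivers:
- name: canary-rollback
  webhook_configs:
  - url: http://rollback-operator.default.svc/rollback   # hypothetical service
```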

    Manual Rollback

    For more control, you can manually roll back:

    # Scale down canary to zero
    kubectl scale deployment myapp-canary --replicas=0

    # Send all traffic back to stable by zeroing the canary weight
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'

    This approach gives you time to investigate the issue before taking action.

    Best Practices

    1. Start Small

    Begin with a very small traffic percentage (1-5%) and gradually increase. This gives you time to detect issues early.

    2. Monitor for Sufficient Time

    Don't rush the rollout. Monitor each stage for at least 15-30 minutes before increasing traffic. The actual time depends on your application's characteristics.

    3. Use Feature Flags

    Combine canary deployments with feature flags for even more granular control. This lets you enable features for specific users or segments.

    4. Document Your Rollback Plan

    Create a clear rollback procedure and share it with your team. Include steps for both manual and automated rollbacks.

    5. Learn from Rollbacks

    Every rollback is an opportunity to learn. Investigate the root cause and implement fixes to prevent similar issues in the future.

    Common Pitfalls

    1. Ignoring Database Changes

    If your canary deployment changes database schemas or queries, ensure your monitoring catches performance issues early. Database changes can silently degrade performance.

    2. Overlooking Third-Party Dependencies

    Canary deployments can expose issues with external services or dependencies. Monitor integration points carefully.

    3. Neglecting Rollback Testing

    Test your rollback procedure before you need it. The worst time to discover that your rollback doesn't work is in the middle of an incident.

    4. Rushing the Rollout

    Speed is important, but rushing increases risk. Take the time to do it right.

    Conclusion

    Canary deployments are a powerful technique for reducing deployment risk while maintaining fast release cycles. By gradually rolling out new code to a small subset of users, you can catch issues early and prevent widespread impact.

    The key to successful canary deployments is a combination of proper implementation, thorough monitoring, and a clear rollback strategy. Start small, monitor closely, and only roll out to everyone when you're confident the new version is ready.

    Platforms like ServerlessBase simplify canary deployments by providing built-in traffic splitting and monitoring, so you can focus on releasing great software without worrying about the infrastructure details.

    Next Steps

    1. Implement a basic canary deployment using the Ingress-based approach
    2. Set up monitoring for your canary deployment
    3. Create a rollback procedure and test it
    4. Gradually increase traffic and monitor closely
    5. Document your process and share with your team

    With these steps, you'll be well on your way to safer, faster deployments with minimal risk to your users.
