ServerlessBase Blog

    A practical guide to implementing canary releases in Kubernetes for safer deployments

    Canary Deployments with Kubernetes

    You've just pushed a new version of your application to production. The code looks good, the tests pass, and you're confident it will work. But what if something goes wrong? A single bug in the new version could impact all your users immediately. That's where canary deployments come in.

    Canary deployments let you release new code to a small subset of users first, monitor its behavior, and only roll it out to everyone if it performs well. This approach dramatically reduces the risk of production incidents while still giving you the speed of continuous deployment.

    What is a Canary Deployment?

    A canary deployment is a release strategy where you deploy a new version of your application to a small, controlled group of users before making it available to everyone. Think of it as a test run in production.

    The name comes from the canary in a coal mine—historically, miners would bring canaries into coal mines to detect dangerous gas levels. If the canary died, the miners knew to evacuate. Similarly, if your canary deployment shows problems, you can roll back before the new version affects all users.

    How It Works

    1. Deploy the new version to a small number of instances (often just one pod)
    2. Route a percentage of traffic to the canary version
    3. Monitor metrics like error rates, latency, and user feedback
    4. If everything looks good, gradually increase traffic to the canary
    5. If issues appear, roll back immediately and investigate

    This approach gives you the safety of a staging environment with the real-world data of production.
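    As a minimal sketch, the five steps above can be automated with a small script, assuming traffic is shifted with the NGINX Ingress canary-weight annotation covered later in this post. Every name here (the `myapp-ingress` Ingress, the `canary_healthy` check) is a placeholder to adapt to your setup:

```shell
#!/bin/sh
# Progressive canary rollout sketch (all names are placeholders).

canary_healthy() {
  # Placeholder for step 3: query Prometheus / your APM here and
  # return non-zero when error rate or latency regresses.
  return 0
}

set_weight() {
  # Steps 2 and 4: route the given percentage of traffic to the canary.
  kubectl patch ingress myapp-ingress \
    -p "{\"metadata\":{\"annotations\":{\"nginx.ingress.kubernetes.io/canary-weight\":\"$1\"}}}"
}

rollout() {
  for weight in 5 10 25 50 100; do
    set_weight "$weight"
    sleep 1800                    # observe each stage for ~30 minutes
    if ! canary_healthy; then
      set_weight 0                # step 5: roll back immediately
      return 1
    fi
  done
}
# In CI, call: rollout
```

    The functions are deliberately kept separate so the health check can be swapped for a real metrics query without touching the traffic-shifting logic.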

    Comparison: Deployment Strategies

    Strategy      Traffic Distribution           Rollback Speed   Risk     Best For
    Blue-Green    All-at-once cutover to new     Instant          Medium   Simple applications, no shared state
    Canary        Gradual (1% → 100%)            Fast             Low      Complex applications, gradual rollout
    Rolling       Instances replaced gradually   Medium           Medium   Simple applications, no complex routing
    A/B Testing   Segment-based                  Medium           Low      Feature testing, UX experiments

    Kubernetes Canary Deployment Patterns

    Kubernetes provides several ways to implement canary deployments. Let's explore the most common approaches.

    1. Using Traffic Splitting with Ingress

    The most straightforward way to implement a canary deployment is by splitting traffic between your stable and canary versions using an Ingress controller.

    # stable deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-stable
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
          version: stable
      template:
        metadata:
          labels:
            app: myapp
            version: stable
        spec:
          containers:
          - name: myapp
            image: myapp:1.0.0
            ports:
            - containerPort: 80
    ---
    # canary deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-canary
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: myapp
          version: canary
      template:
        metadata:
          labels:
            app: myapp
            version: canary
        spec:
          containers:
          - name: myapp
            image: myapp:1.1.0
            ports:
            - containerPort: 80
    ---
    # primary ingress: routes to the stable Service. NGINX canary annotations
    # only work alongside a primary Ingress for the same host. The myapp-stable
    # and myapp-canary Services (one per Deployment) are shown in pattern 2 below.
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-stable
    spec:
      ingressClassName: nginx
      rules:
      - host: myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-stable
                port:
                  number: 80
    ---
    # canary ingress: same host; the annotations tell NGINX to split traffic
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-ingress
      annotations:
        nginx.ingress.kubernetes.io/canary: "true"
        nginx.ingress.kubernetes.io/canary-weight: "10"
    spec:
      ingressClassName: nginx
      rules:
      - host: myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80

    The nginx.ingress.kubernetes.io/canary-weight annotation on the canary Ingress controls how much traffic the NGINX ingress controller sends to the canary version: a weight of 10 sends 10% of requests to the canary and the remaining 90% to stable.

    To gradually increase traffic, update the annotation:

    # Increase to 25% (note: annotations live under metadata, not spec)
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"25"}}}'

    # Increase to 50%
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"50"}}}'

    # Increase to 100% (full rollout)
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"100"}}}'
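    Leaving the weight at 100 pins all traffic to the single canary replica, so there is one more step: promote the proven image to the stable Deployment and retire the canary. A sketch, using the resource names from this example (your pipeline may differ):

```shell
#!/bin/sh
# Promotion sketch: assumes the myapp-stable / myapp-canary Deployments
# and the canary-annotated myapp-ingress from the example above.
NEW_IMAGE="myapp:1.1.0"

promote() {
  # Roll the proven image out to the stable Deployment
  kubectl set image deployment/myapp-stable "myapp=$NEW_IMAGE"
  kubectl rollout status deployment/myapp-stable
  # Stop splitting traffic, then retire the canary pods
  kubectl patch ingress myapp-ingress \
    -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'
  kubectl scale deployment myapp-canary --replicas=0
}
# Run manually or from CI: promote
```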

    2. Using Service Selectors

    Another approach is to use Kubernetes services with different selectors to route traffic to different versions.

    # stable service
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp-stable
      labels:
        app: myapp
        version: stable
    spec:
      selector:
        app: myapp
        version: stable
      ports:
      - port: 80
        targetPort: 80
    ---
    # canary service
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp-canary
      labels:
        app: myapp
        version: canary
    spec:
      selector:
        app: myapp
        version: canary
      ports:
      - port: 80
        targetPort: 80
    ---
    # canary ingress on its own hostname (no traffic-split annotations needed,
    # since the dedicated host routes straight to the canary Service)
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: myapp-canary-ingress
    spec:
      rules:
      - host: canary.myapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-canary
                port:
                  number: 80

    This approach uses a separate hostname for the canary deployment, which can be useful for testing with real users before exposing it to everyone.
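    Assuming DNS for canary.myapp.example.com points at your ingress controller (or you spoof the Host header locally), the canary can be exercised directly. A minimal probe, where INGRESS_IP is a placeholder for the controller's external address:

```shell
# Probe the canary through its dedicated hostname.
# INGRESS_IP must be set to your ingress controller's external address.
probe_canary_host() {
  curl -fsS -H "Host: canary.myapp.example.com" \
    "http://${INGRESS_IP:?set INGRESS_IP first}/"
}
```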

    3. Using Traffic Management with Istio

    For more advanced traffic management, service meshes like Istio provide powerful canary deployment capabilities.

    # VirtualService for canary traffic split
    apiVersion: networking.istio.io/v1beta1
    kind: VirtualService
    metadata:
      name: myapp
    spec:
      hosts:
      - myapp.example.com
      http:
      - match:
        - headers:
            canary:
              exact: "true"
        route:
        - destination:
            host: myapp
            subset: canary
          weight: 100
      - route:
        - destination:
            host: myapp
            subset: stable
          weight: 90
        - destination:
            host: myapp
            subset: canary
          weight: 10
    ---
    # DestinationRule defining the stable and canary subsets by pod label
    apiVersion: networking.istio.io/v1beta1
    kind: DestinationRule
    metadata:
      name: myapp
    spec:
      host: myapp
      subsets:
      - name: stable
        labels:
          version: stable
      - name: canary
        labels:
          version: canary

    Note the route ordering: requests carrying a canary: true header always land on the canary subset, while everything else falls through to the weighted 90/10 split. Beyond this, Istio offers finer-grained control than an Ingress controller, including routing by cookie or user agent, traffic mirroring, and gradual weight shifting without redeploying anything.
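    To spot-check the header rule, a request can be pinned to the canary subset (host as in the VirtualService; run this from a pod inside the mesh so the sidecar applies the routing):

```shell
# Requests carrying the routing header land on the canary subset;
# all other requests follow the 90/10 weighted split.
probe_canary_header() {
  curl -fsS -H "canary: true" "http://myapp.example.com/"
}
```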

    Monitoring Your Canary Deployment

    Monitoring is critical for successful canary deployments. You need to track both application-level and infrastructure-level metrics.

    Key Metrics to Monitor

    Metric                   Why It Matters                      Alert Threshold
    Error Rate               Detects bugs in the new version     > 5% increase from baseline
    Latency                  Identifies performance regressions  > 20% increase from baseline
    Throughput               Ensures the canary can handle load  < 80% of stable version
    CPU/Memory Usage         Checks resource efficiency          > 20% increase from baseline
    Custom Business Metrics  Validates business logic changes    Any unexpected change
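    As a sketch, the error-rate row of the table can be expressed as a PromQL comparison of the canary against the stable baseline (the metric name and labels follow the alert example later in this post; adjust to your instrumentation):

```promql
# Difference between canary and stable 5xx ratios over 5 minutes;
# alert when this exceeds your 5% threshold
sum(rate(http_requests_total{version="canary",status=~"5.."}[5m]))
  / sum(rate(http_requests_total{version="canary"}[5m]))
-
sum(rate(http_requests_total{version="stable",status=~"5.."}[5m]))
  / sum(rate(http_requests_total{version="stable"}[5m]))
```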

    Setting Up Monitoring

    # Prometheus ServiceMonitor for canary
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: myapp-canary
      labels:
        release: prometheus
    spec:
      selector:
        matchLabels:
          app: myapp
          version: canary
      endpoints:
      - port: http        # must match a named port on the canary Service
        interval: 15s
        path: /metrics

    Configure your alerting rules to compare canary metrics against stable baselines. This ensures you're detecting real issues, not just normal variance.

    Rollback Strategy

    Despite your best efforts, things can go wrong. Having a clear rollback strategy is essential.

    Automated Rollback

    You can implement automated rollbacks based on metrics:

    # PrometheusRule with an alert that can drive an automatic rollback
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: canary-alerts
      labels:
        release: prometheus
    spec:
      groups:
      - name: canary-alerts
        rules:
        - alert: CanaryHighErrorRate
          expr: |
            rate(http_requests_total{version="canary",status=~"5.."}[5m])
            /
            rate(http_requests_total{version="canary"}[5m]) > 0.05
          for: 5m
          annotations:
            summary: "Canary has high error rate"
            description: "Canary error rate is {{ $value | humanizePercentage }}"

    When this alert fires, automation can scale the canary deployment down and set the canary weight back to 0 so that all traffic returns to stable.
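    One way to wire up that automation, as a sketch: route the alert to a webhook that performs the rollback. The rollback-operator service and URL here are hypothetical; it would run the same kubectl steps shown in the manual rollback below.

```yaml
# alertmanager.yml fragment: forward canary alerts to a rollback webhook
route:
  routes:
  - match:
      alertname: CanaryHighErrorRate
    receiver: canary-rollback
receivers:
- name: canary-rollback
  webhook_configs:
  - url: http://rollback-operator.default.svc/rollback   # hypothetical service
```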

    Manual Rollback

    For more control, you can manually roll back:

    # Scale down canary to zero
    kubectl scale deployment myapp-canary --replicas=0

    # Send all traffic back to stable by zeroing the canary weight
    kubectl patch ingress myapp-ingress -p '{"metadata":{"annotations":{"nginx.ingress.kubernetes.io/canary-weight":"0"}}}'

    This approach gives you time to investigate the issue before taking action.

    Best Practices

    1. Start Small

    Begin with a very small traffic percentage (1-5%) and gradually increase. This gives you time to detect issues early.

    2. Monitor for Sufficient Time

    Don't rush the rollout. Monitor each stage for at least 15-30 minutes before increasing traffic. The actual time depends on your application's characteristics.

    3. Use Feature Flags

    Combine canary deployments with feature flags for even more granular control. This lets you enable features for specific users or segments.

    4. Document Your Rollback Plan

    Create a clear rollback procedure and share it with your team. Include steps for both manual and automated rollbacks.

    5. Learn from Rollbacks

    Every rollback is an opportunity to learn. Investigate the root cause and implement fixes to prevent similar issues in the future.

    Common Pitfalls

    1. Ignoring Database Changes

    If your canary deployment changes database schemas or queries, ensure your monitoring catches performance issues early. Database changes can silently degrade performance.

    2. Overlooking Third-Party Dependencies

    Canary deployments can expose issues with external services or dependencies. Monitor integration points carefully.

    3. Neglecting Rollback Testing

    Test your rollback procedure before you need it. The worst time to discover that your rollback doesn't work is in the middle of an incident.

    4. Rushing the Rollout

    Speed is important, but rushing increases risk. Take the time to do it right.

    Conclusion

    Canary deployments are a powerful technique for reducing deployment risk while maintaining fast release cycles. By gradually rolling out new code to a small subset of users, you can catch issues early and prevent widespread impact.

    The key to successful canary deployments is a combination of proper implementation, thorough monitoring, and a clear rollback strategy. Start small, monitor closely, and only roll out to everyone when you're confident the new version is ready.

    Platforms like ServerlessBase simplify canary deployments by providing built-in traffic splitting and monitoring, so you can focus on releasing great software without worrying about the infrastructure details.

    Next Steps

    1. Implement a basic canary deployment using the Ingress-based approach
    2. Set up monitoring for your canary deployment
    3. Create a rollback procedure and test it
    4. Gradually increase traffic and monitor closely
    5. Document your process and share with your team

    With these steps, you'll be well on your way to safer, faster deployments with minimal risk to your users.
