ServerlessBase Blog

    Learn how to run batch jobs and scheduled tasks in Kubernetes with Jobs and CronJobs

    Introduction to Kubernetes Jobs and CronJobs

    You've probably deployed a web application that runs continuously, but Kubernetes also has powerful features for running one-time tasks and scheduled jobs. When you need to process a batch of data, run a nightly backup, or execute a recurring task, Kubernetes Jobs and CronJobs are exactly what you need.

    What Are Jobs?

    A Kubernetes Job is a controller that creates one or more Pods and ensures that a specified number of them successfully complete. Once the specified number of successful completions is reached, the Job is finished.

    Think of a Job as a task that runs to completion. Unlike a Deployment that keeps your application running indefinitely, a Job runs once (or a specified number of times) and then stops.

    Job Lifecycle

    When you create a Job, Kubernetes does the following:

    1. Creates Pods: The Job controller creates the specified number of Pods based on your configuration
    2. Tracks Progress: It monitors each Pod to see if it completes successfully
    3. Handles Failures: If a Pod fails, the Job controller creates a replacement, up to the retry limit set by backoffLimit
    4. Completes: Once the required number of successful completions is reached, the Job is marked complete; its Pods are not deleted, so you can still inspect their logs until the Job itself is removed

    Basic Job Example

    Here's a simple Job that runs a container and completes after one successful run:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: hello-job
    spec:
      completions: 1
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["sh", "-c", "echo Hello from Kubernetes Job && sleep 10"]
          restartPolicy: OnFailure

    Key fields explained:

    • completions: 1 - The Job should complete exactly once
    • backoffLimit: 4 - If a Pod fails, retry it up to 4 times before giving up
    • restartPolicy: OnFailure - If the container fails, the kubelet restarts it in the same Pod; with Never, the Job controller creates a replacement Pod instead (Always is not allowed for Jobs)
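
    Two more spec fields are worth knowing for real workloads: activeDeadlineSeconds caps a Job's total runtime (and takes precedence over backoffLimit), and ttlSecondsAfterFinished cleans up finished Jobs automatically. A minimal sketch; the values here are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job
spec:
  completions: 1
  backoffLimit: 4
  activeDeadlineSeconds: 300      # fail the whole Job if it runs longer than 5 minutes
  ttlSecondsAfterFinished: 3600   # delete the Job (and its Pods) 1 hour after it finishes
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["sh", "-c", "echo Hello && sleep 10"]
      restartPolicy: OnFailure
```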

    Running Multiple Pods

    You can configure a Job to run multiple Pods in parallel:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: parallel-job
    spec:
      parallelism: 3
      completions: 5
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo Processing task on $(hostname) && sleep 5"]
          restartPolicy: OnFailure

    Key fields explained:

    • parallelism: 3 - Run up to 3 Pods simultaneously
    • completions: 5 - The Job is done once 5 Pods in total have completed successfully

    This creates a Job that runs up to 3 Pods at a time until 5 total completions are achieved.
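
    There is also a work-queue variant: leave completions unset and set only parallelism. The Pods then coordinate through an external queue (not shown here), and the Job completes once any Pod exits successfully and the rest have terminated. A sketch, assuming a hypothetical worker image that drains your queue:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker-job
spec:
  parallelism: 3   # completions left unset: work-queue semantics
  template:
    spec:
      containers:
      - name: worker
        image: myorg/queue-worker:latest   # hypothetical image that pulls tasks from a queue
      restartPolicy: OnFailure
```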

    CronJobs: Scheduled Jobs

    A CronJob creates Jobs on a repeating schedule, similar to how the cron utility works on Linux systems. This is perfect for scheduled tasks like backups, report generation, or data processing.

    CronJob Syntax

    CronJob uses the standard cron syntax:

    ┌───────────── minute (0 - 59)
    │ ┌───────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌───────────── month (1 - 12)
    │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    * * * * *
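
    A few concrete schedules and their meanings:

```
0 * * * *     # at the start of every hour
*/15 * * * *  # every 15 minutes
30 1 * * 0    # 1:30 AM every Sunday
0 9 * * 1-5   # 9 AM on weekdays (Monday-Friday)
0 0 1 * *     # midnight on the first day of each month
```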

    Basic CronJob Example

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: backup-cronjob
    spec:
      schedule: "0 2 * * *"  # Run at 2 AM every day
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: backup
                image: postgres:15
                command: ["pg_dump", "mydb", "-f", "/backup/db-backup.sql"]
                volumeMounts:
                - name: backup-storage
                  mountPath: /backup
              restartPolicy: OnFailure
              volumes:
              - name: backup-storage
                persistentVolumeClaim:
                  claimName: backup-pvc

    This CronJob runs a PostgreSQL backup every day at 2 AM (connection details and credentials are omitted for brevity; the database backup example later in this post shows a fuller setup).
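
    Rather than waiting for 2 AM to find out whether the backup works, you can trigger a one-off run from the CronJob's template:

```
# Create a Job immediately from the CronJob's jobTemplate
kubectl create job --from=cronjob/backup-cronjob backup-manual-test

# Then inspect it like any other Job
kubectl logs job/backup-manual-test
```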

    CronJob Configuration Options

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: advanced-cronjob
    spec:
      schedule: "*/5 * * * *"  # Every 5 minutes
      concurrencyPolicy: Allow  # Allow concurrent jobs
      startingDeadlineSeconds: 200  # Still start a missed job if it's no more than 200s late
      successfulJobsHistoryLimit: 3  # Keep last 3 successful jobs
      failedJobsHistoryLimit: 1  # Keep last 1 failed job
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: worker
                image: busybox
                command: ["sh", "-c", "echo Running task && sleep 30"]
              restartPolicy: OnFailure

    Key fields explained:

    • concurrencyPolicy: Allow - Allow new jobs to start while previous jobs are still running (default)
    • concurrencyPolicy: Forbid - Don't start new jobs if previous jobs are still running
    • concurrencyPolicy: Replace - Cancel the currently running job and start a new one
    • startingDeadlineSeconds: 200 - If a CronJob misses its scheduled time by more than this, don't start it
    • successfulJobsHistoryLimit: 3 - Keep only the last 3 successful job runs in history
    • failedJobsHistoryLimit: 1 - Keep only the last 1 failed job run in history
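
    By default, schedules are interpreted in the time zone of the kube-controller-manager. On Kubernetes v1.27 and newer you can pin the schedule to an explicit time zone with the spec.timeZone field:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tz-aware-cronjob
spec:
  schedule: "0 2 * * *"
  timeZone: "America/New_York"   # IANA time zone name; stable since Kubernetes v1.27
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo Running at 2 AM Eastern"]
          restartPolicy: OnFailure
```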

    Practical Use Cases

    1. Database Backups

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: database-backup
    spec:
      schedule: "0 3 * * *"  # 3 AM daily
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: backup
                image: postgres:15
                command:
                - /bin/sh
                - -c
                - |
                  BACKUP_FILE="/backup/backup-$(date +%Y%m%d).sql"
                  pg_dump -U postgres mydatabase > "$BACKUP_FILE"
                  gzip "$BACKUP_FILE"
                env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: password
                volumeMounts:
                - name: backup-storage
                  mountPath: /backup
              restartPolicy: OnFailure
              volumes:
              - name: backup-storage
                persistentVolumeClaim:
                  claimName: backup-pvc

    2. Data Processing Pipeline

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: data-processor
    spec:
      schedule: "0 */6 * * *"  # Every 6 hours
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: processor
                image: myapp/data-processor:latest
                command: ["python", "process.py", "--input", "/data/input", "--output", "/data/output"]
                volumeMounts:
                - name: input-data
                  mountPath: /data/input
                - name: output-data
                  mountPath: /data/output
              restartPolicy: OnFailure
              volumes:
              - name: input-data
                persistentVolumeClaim:
                  claimName: input-pvc
              - name: output-data
                persistentVolumeClaim:
                  claimName: output-pvc

    3. Cleanup Tasks

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: cleanup-logs
    spec:
      schedule: "0 4 * * 0"  # Every Sunday at 4 AM
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: cleaner
                image: busybox
                command:
                - /bin/sh
                - -c
                - |
                  find /var/log -name "*.log" -mtime +7 -delete
                  find /tmp -type f -mtime +1 -delete
              restartPolicy: OnFailure
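
    You can verify the find expressions against a scratch directory before pointing them at /var/log (GNU coreutils touch -d assumed):

```shell
# Build a scratch dir with one fresh and one week-old log file
tmp=$(mktemp -d)
touch "$tmp/new.log"
touch -d '8 days ago' "$tmp/old.log"

# Same expression as the CronJob: delete *.log files older than 7 days
find "$tmp" -name '*.log' -mtime +7 -delete

ls "$tmp"   # only new.log remains
```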

    Job and CronJob Best Practices

    1. Use Appropriate Restart Policies

    • OnFailure - The failed container is restarted in place; best for tasks that can recover from transient errors
    • Never - Failed Pods are left intact (useful for debugging), and the Job controller creates replacement Pods, up to backoffLimit

    2. Handle Job Completion

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: completion-demo
    spec:
      completions: 3
      parallelism: 2
      completionMode: Indexed
      template:
        spec:
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "echo Task $JOB_COMPLETION_INDEX complete"]
          restartPolicy: OnFailure

    With completionMode: Indexed, each Pod gets a JOB_COMPLETION_INDEX environment variable holding a stable index from 0 to completions - 1, so every Pod can identify which slice of the work it owns. (The variable is not set in the default NonIndexed mode.)

    3. Monitor Job Status

    # Check job status
    kubectl get jobs
     
    # View job details
    kubectl describe job <job-name>
     
    # View job logs
    kubectl logs job/<job-name>
     
    # List completed jobs
    kubectl get jobs --field-selector status.successful=1

    4. Handle Failed Jobs

    # Delete a failed job
    kubectl delete job <job-name>
     
    # Delete all jobs
    kubectl delete jobs --all

    Common Pitfalls

    1. Infinite Loops in Jobs

    Make sure your Job has a way to exit. If a container runs indefinitely, the Job will never complete.

    # BAD - Infinite loop
    command: ["sh", "-c", "while true; do echo Running; sleep 1; done"]
     
    # GOOD - Finite task
    command: ["sh", "-c", "echo Processing && sleep 5 && echo Done"]

    2. Insufficient Resources

    Jobs can consume significant resources. Make sure your cluster has enough capacity.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: resource-intensive-job
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "dd if=/dev/zero of=/dev/null bs=1M count=1000"]
            resources:
              requests:
                memory: "512Mi"
                cpu: "500m"
              limits:
                memory: "1Gi"
                cpu: "1"
          restartPolicy: OnFailure

    3. Missing Volume Mounts

    If your Job needs to access persistent storage, make sure you configure volume mounts correctly.

    Conclusion

    Kubernetes Jobs and CronJobs provide powerful capabilities for running batch tasks and scheduled operations. By understanding their configuration options and best practices, you can build robust automation workflows that run reliably in your Kubernetes cluster.

    For production workloads, consider using a managed service like ServerlessBase to handle the deployment and monitoring of your Jobs and CronJobs, ensuring they run consistently and efficiently.


    Next Steps:

    • Explore Kubernetes Init Containers for setup tasks
    • Learn about Kubernetes Sidecar patterns for enhanced functionality
    • Understand Kubernetes Operators for complex automation scenarios
