ServerlessBase Blog

    Learn how to run batch jobs and scheduled tasks in Kubernetes with Jobs and CronJobs

    Introduction to Kubernetes Jobs and CronJobs

    You've probably deployed a web application that runs continuously, but Kubernetes also has powerful features for running one-time tasks and scheduled jobs. When you need to process a batch of data, run a nightly backup, or execute a recurring task, Kubernetes Jobs and CronJobs are exactly what you need.

    What Are Jobs?

    A Kubernetes Job is a controller that creates one or more Pods and ensures that a specified number of them successfully complete. Once the specified number of successful completions is reached, the Job is finished.

    Think of a Job as a task that runs to completion. Unlike a Deployment that keeps your application running indefinitely, a Job runs once (or a specified number of times) and then stops.

    Job Lifecycle

    When you create a Job, Kubernetes does the following:

    1. Creates Pods: The Job controller creates the specified number of Pods based on your configuration
    2. Tracks Progress: It monitors each Pod to see if it completes successfully
    3. Handles Failures: If a Pod fails, the Job controller creates a replacement, up to the retry limit set by backoffLimit
    4. Completes: Once the required number of successful completions is reached, the Job is marked complete; its Pods are not deleted, so you can still inspect their logs until the Job itself is removed

    Basic Job Example

    Here's a simple Job that runs a container and completes after one successful run:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: hello-job
    spec:
      completions: 1
      backoffLimit: 4
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["sh", "-c", "echo Hello from Kubernetes Job && sleep 10"]
          restartPolicy: OnFailure

    Key fields explained:

    • completions: 1 - The Job should complete exactly once
    • backoffLimit: 4 - If a Pod fails, retry it up to 4 times before giving up
    • restartPolicy: OnFailure - If the container fails, the kubelet restarts it in the same Pod; with Never, the Job controller creates a replacement Pod instead (Always is not allowed for Jobs)
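
    Two more spec fields are worth knowing for real workloads: activeDeadlineSeconds caps a Job's total runtime (and takes precedence over backoffLimit), and ttlSecondsAfterFinished cleans up finished Jobs automatically. A minimal sketch; the values here are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-job
spec:
  completions: 1
  backoffLimit: 4
  activeDeadlineSeconds: 300      # fail the whole Job if it runs longer than 5 minutes
  ttlSecondsAfterFinished: 3600   # delete the Job (and its Pods) 1 hour after it finishes
  template:
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["sh", "-c", "echo Hello && sleep 10"]
      restartPolicy: OnFailure
```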

    Running Multiple Pods

    You can configure a Job to run multiple Pods in parallel:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: parallel-job
    spec:
      parallelism: 3
      completions: 5
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo Processing task on $(hostname) && sleep 5"]
          restartPolicy: OnFailure

    Key fields explained:

    • parallelism: 3 - Run up to 3 Pods simultaneously
    • completions: 5 - The Job is done once 5 Pods in total have completed successfully

    This creates a Job that runs up to 3 Pods at a time until 5 total completions are achieved.
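
    There is also a work-queue variant: leave completions unset and set only parallelism. The Pods then coordinate through an external queue (not shown here), and the Job completes once any Pod exits successfully and the rest have terminated. A sketch, assuming a hypothetical worker image that drains your queue:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker-job
spec:
  parallelism: 3   # completions left unset: work-queue semantics
  template:
    spec:
      containers:
      - name: worker
        image: myorg/queue-worker:latest   # hypothetical image that pulls tasks from a queue
      restartPolicy: OnFailure
```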

    CronJobs: Scheduled Jobs

    A CronJob creates Jobs on a repeating schedule, similar to how the cron utility works on Linux systems. This is perfect for scheduled tasks like backups, report generation, or data processing.

    CronJob Syntax

    CronJob uses the standard cron syntax:

    ┌───────────── minute (0 - 59)
    │ ┌───────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌───────────── month (1 - 12)
    │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    * * * * *
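
    A few concrete schedules and their meanings:

```
0 * * * *     # at the start of every hour
*/15 * * * *  # every 15 minutes
30 1 * * 0    # 1:30 AM every Sunday
0 9 * * 1-5   # 9 AM on weekdays (Monday-Friday)
0 0 1 * *     # midnight on the first day of each month
```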

    Basic CronJob Example

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: backup-cronjob
    spec:
      schedule: "0 2 * * *"  # Run at 2 AM every day
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: backup
                image: postgres:15
                command: ["pg_dump", "mydb", "-f", "/backup/db-backup.sql"]
                volumeMounts:
                - name: backup-storage
                  mountPath: /backup
              restartPolicy: OnFailure
              volumes:
              - name: backup-storage
                persistentVolumeClaim:
                  claimName: backup-pvc

    This CronJob runs a PostgreSQL backup every day at 2 AM (connection details and credentials are omitted for brevity; the database backup example later in this post shows a fuller setup).
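
    Rather than waiting for 2 AM to find out whether the backup works, you can trigger a one-off run from the CronJob's template:

```
# Create a Job immediately from the CronJob's jobTemplate
kubectl create job --from=cronjob/backup-cronjob backup-manual-test

# Then inspect it like any other Job
kubectl logs job/backup-manual-test
```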

    CronJob Configuration Options

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: advanced-cronjob
    spec:
      schedule: "*/5 * * * *"  # Every 5 minutes
      concurrencyPolicy: Allow  # Allow concurrent jobs
      startingDeadlineSeconds: 200  # Still start a missed job if it's no more than 200s late
      successfulJobsHistoryLimit: 3  # Keep last 3 successful jobs
      failedJobsHistoryLimit: 1  # Keep last 1 failed job
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: worker
                image: busybox
                command: ["sh", "-c", "echo Running task && sleep 30"]
              restartPolicy: OnFailure

    Key fields explained:

    • concurrencyPolicy: Allow - Allow new jobs to start while previous jobs are still running (default)
    • concurrencyPolicy: Forbid - Don't start new jobs if previous jobs are still running
    • concurrencyPolicy: Replace - Cancel the currently running job and start a new one
    • startingDeadlineSeconds: 200 - If a CronJob misses its scheduled time by more than this, don't start it
    • successfulJobsHistoryLimit: 3 - Keep only the last 3 successful job runs in history
    • failedJobsHistoryLimit: 1 - Keep only the last 1 failed job run in history
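
    By default, schedules are interpreted in the time zone of the kube-controller-manager. On Kubernetes v1.27 and newer you can pin the schedule to an explicit time zone with the spec.timeZone field:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tz-aware-cronjob
spec:
  schedule: "0 2 * * *"
  timeZone: "America/New_York"   # IANA time zone name; stable since Kubernetes v1.27
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "echo Running at 2 AM Eastern"]
          restartPolicy: OnFailure
```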

    Practical Use Cases

    1. Database Backups

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: database-backup
    spec:
      schedule: "0 3 * * *"  # 3 AM daily
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: backup
                image: postgres:15
                command:
                - /bin/sh
                - -c
                - |
                  BACKUP_FILE="/backup/backup-$(date +%Y%m%d).sql"
                  pg_dump -U postgres mydatabase > "$BACKUP_FILE"
                  gzip "$BACKUP_FILE"
                env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-credentials
                      key: password
                volumeMounts:
                - name: backup-storage
                  mountPath: /backup
              restartPolicy: OnFailure
              volumes:
              - name: backup-storage
                persistentVolumeClaim:
                  claimName: backup-pvc

    2. Data Processing Pipeline

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: data-processor
    spec:
      schedule: "0 */6 * * *"  # Every 6 hours
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: processor
                image: myapp/data-processor:latest
                command: ["python", "process.py", "--input", "/data/input", "--output", "/data/output"]
                volumeMounts:
                - name: input-data
                  mountPath: /data/input
                - name: output-data
                  mountPath: /data/output
              restartPolicy: OnFailure
              volumes:
              - name: input-data
                persistentVolumeClaim:
                  claimName: input-pvc
              - name: output-data
                persistentVolumeClaim:
                  claimName: output-pvc

    3. Cleanup Tasks

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: cleanup-logs
    spec:
      schedule: "0 4 * * 0"  # Every Sunday at 4 AM
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: cleaner
                image: busybox
                command:
                - /bin/sh
                - -c
                - |
                  find /var/log -name "*.log" -mtime +7 -delete
                  find /tmp -type f -mtime +1 -delete
              restartPolicy: OnFailure
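
    You can verify the find expressions against a scratch directory before pointing them at /var/log (GNU coreutils touch -d assumed):

```shell
# Build a scratch dir with one fresh and one week-old log file
tmp=$(mktemp -d)
touch "$tmp/new.log"
touch -d '8 days ago' "$tmp/old.log"

# Same expression as the CronJob: delete *.log files older than 7 days
find "$tmp" -name '*.log' -mtime +7 -delete

ls "$tmp"   # only new.log remains
```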

    Job and CronJob Best Practices

    1. Use Appropriate Restart Policies

    • OnFailure - The failed container is restarted in place; best for tasks that can recover from transient errors
    • Never - Failed Pods are left intact (useful for debugging), and the Job controller creates replacement Pods, up to backoffLimit

    2. Handle Job Completion

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: completion-demo
    spec:
      completions: 3
      parallelism: 2
      completionMode: Indexed
      template:
        spec:
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "echo Task $JOB_COMPLETION_INDEX complete"]
          restartPolicy: OnFailure

    With completionMode: Indexed, each Pod gets a JOB_COMPLETION_INDEX environment variable holding a stable index from 0 to completions - 1, so every Pod can identify which slice of the work it owns. (The variable is not set in the default NonIndexed mode.)

    3. Monitor Job Status

    # Check job status
    kubectl get jobs
     
    # View job details
    kubectl describe job <job-name>
     
    # View job logs
    kubectl logs job/<job-name>
     
    # List completed jobs
    kubectl get jobs --field-selector status.successful=1

    4. Handle Failed Jobs

    # Delete a failed job
    kubectl delete job <job-name>
     
    # Delete all jobs
    kubectl delete jobs --all

    Common Pitfalls

    1. Infinite Loops in Jobs

    Make sure your Job has a way to exit. If a container runs indefinitely, the Job will never complete.

    # BAD - Infinite loop
    command: ["sh", "-c", "while true; do echo Running; sleep 1; done"]
     
    # GOOD - Finite task
    command: ["sh", "-c", "echo Processing && sleep 5 && echo Done"]

    2. Insufficient Resources

    Jobs can consume significant resources. Make sure your cluster has enough capacity.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: resource-intensive-job
    spec:
      template:
        spec:
          containers:
          - name: worker
            image: busybox
            command: ["sh", "-c", "dd if=/dev/zero of=/dev/null bs=1M count=1000"]
            resources:
              requests:
                memory: "512Mi"
                cpu: "500m"
              limits:
                memory: "1Gi"
                cpu: "1"
          restartPolicy: OnFailure

    3. Missing Volume Mounts

    If your Job needs to access persistent storage, make sure you configure volume mounts correctly.

    Conclusion

    Kubernetes Jobs and CronJobs provide powerful capabilities for running batch tasks and scheduled operations. By understanding their configuration options and best practices, you can build robust automation workflows that run reliably in your Kubernetes cluster.

    For production workloads, consider using a managed service like ServerlessBase to handle the deployment and monitoring of your Jobs and CronJobs, ensuring they run consistently and efficiently.


    Next Steps:

    • Explore Kubernetes Init Containers for setup tasks
    • Learn about Kubernetes Sidecar patterns for enhanced functionality
    • Understand Kubernetes Operators for complex automation scenarios
