ServerlessBase Blog

    A comprehensive guide to reducing Docker image size with practical techniques and tools for production deployments.

    Optimizing Docker Image Size: Techniques and Tools

    You've just pushed a Docker image to your registry, and it's 2GB. Your CI/CD pipeline takes 15 minutes to build it. Your deployment takes another 5 minutes to pull and extract. You're paying for unnecessary storage and bandwidth. This is a problem that affects every team building containerized applications.

    Docker image size optimization isn't just about saving disk space. Smaller images mean faster builds, faster deployments, lower storage costs, and better security profiles. Every megabyte you shave off translates to real operational improvements.

    Understanding Image Layers

    Docker images are built from layers, each representing a filesystem change. When you run docker build, each instruction in your Dockerfile creates a new layer. These layers are cached between builds, which is why rebuilds are fast. But layering also means that every instruction that writes files adds to the final image size.

    # Example: A simple Dockerfile with multiple layers
    FROM node:18-alpine AS base
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build

    The problem with this Dockerfile is that every COPY and RUN instruction creates a new layer, and changing a single source file invalidates the cache for all subsequent layers. Worse for size, anything a layer writes stays in the image even if a later instruction deletes it, which is how production images quietly grow.
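One practical consequence: cleanup must happen in the same RUN instruction that created the files, or the files ship anyway. A sketch (the download URL is a placeholder):

```dockerfile
# Bad: the archive is baked into the first layer forever,
# even though the second instruction deletes it.
RUN curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz
RUN tar -xzf /tmp/tool.tar.gz -C /usr/local && rm /tmp/tool.tar.gz

# Better: download, extract, and clean up in one layer, so the
# archive never reaches the final image.
RUN curl -fsSL https://example.com/tool.tar.gz -o /tmp/tool.tar.gz \
 && tar -xzf /tmp/tool.tar.gz -C /usr/local \
 && rm /tmp/tool.tar.gz
```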

    Base Image Selection

    The single biggest factor in image size is your base image. A node:18 image weighs in at around 900MB, while node:18-alpine is only about 120MB. That's a 7x reduction with almost identical runtime behavior.

    # Large base image
    FROM node:18
    # Result: ~900MB image
     
    # Small base image
    FROM node:18-alpine
    # Result: ~120MB image

    Alpine Linux uses musl libc instead of glibc, which reduces size but can cause compatibility issues with some Node.js packages. If you encounter build failures with Alpine, try node:18-bullseye-slim instead.
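If only a few dependencies need native compilation, you can often stay on Alpine by adding the toolchain node-gyp expects; a sketch using Alpine's package names:

```dockerfile
FROM node:18-alpine
# node-gyp needs a compiler toolchain to build native addons on musl
RUN apk add --no-cache python3 make g++
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
```

In a real build you would install the toolchain in the builder stage of a multi-stage build (covered next), so the compilers never ship in the final image.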

    Multi-Stage Builds

    Multi-stage builds let you use different build stages for compilation and runtime, discarding build artifacts from the final image. This is one of the most effective techniques for reducing image size.

    # Stage 1: Build
    FROM node:18 AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build
     
    # Stage 2: Production
    FROM node:18-alpine
    WORKDIR /app
    COPY --from=builder /app/package*.json ./
    RUN npm ci --omit=dev
    COPY --from=builder /app/dist ./dist
    CMD ["node", "dist/index.js"]

    The first stage builds your application with everything installed, including devDependencies. The second stage keeps only what production needs: the package files, a fresh install of production-only dependencies (npm ci --omit=dev), and the compiled output. Copying node_modules straight from the builder would drag devDependencies along with it. The build stage is discarded, keeping the final image small.

    Removing Unnecessary Files

    The most reliable way to keep files out of your image is a .dockerignore file, which stops them from entering the build context at all. RUN rm -rf only helps when it runs in the same layer that created the files; deleting them in a later instruction leaves them sitting in the earlier layer.

    # .dockerignore
    node_modules
    npm-debug.log
    .git
    .gitignore
    README.md
    *.md
    .env
    .env.*

    Common files to exclude:

    • Development dependencies
    • Documentation files
    • Git metadata
    • Test files
    • Environment files
    • Build artifacts (unless needed in final image)

    Using Build Arguments for Dependencies

    Build arguments let a single Dockerfile serve both development and production builds: pass NODE_ENV at build time with docker build --build-arg NODE_ENV=development . and branch on it when installing dependencies.

    ARG NODE_ENV=production
    RUN if [ "$NODE_ENV" = "production" ]; then \
          npm ci --omit=dev; \
        else \
          npm install; \
        fi

    This ensures you only install production dependencies in your final image, reducing both size and attack surface. (Recent npm versions deprecated --only=production; --omit=dev is the current flag.)

    Leveraging BuildKit

    Docker BuildKit provides advanced caching and optimization features. It is the default builder in Docker Engine 23.0 and later; on older versions, enable it with the DOCKER_BUILDKIT=1 environment variable. A # syntax=docker/dockerfile:1 directive at the top of your Dockerfile opts into newer Dockerfile syntax such as cache mounts.

    export DOCKER_BUILDKIT=1
    docker build -t myapp .

    BuildKit builds independent stages in parallel, skips stages that the final build target doesn't need, and caches at the instruction level.
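One BuildKit feature worth knowing for image size is the cache mount, which keeps the npm download cache available across builds without ever writing it into an image layer; a sketch:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# The npm cache lives in a build-time mount: rebuilds are faster,
# and the cache never becomes part of an image layer.
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
```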

    Image Scanning and Analysis

    Before deploying, scan your images for vulnerabilities and analyze their size.

    # Scan for vulnerabilities
    trivy image myapp:latest
     
    # Analyze image size
    docker history myapp:latest
    docker images myapp:latest

    Tools like Trivy, Clair, and Snyk can identify security issues and help you understand what's taking up space in your images.

    Comparison of Optimization Techniques

    Technique            Size Reduction   Complexity   Best For
    Alpine base images   70-80%           Low          Most Node.js/Python apps
    Multi-stage builds   50-70%           Medium       Compiled languages
    .dockerignore        10-30%           Low          All projects
    Build arguments      5-15%            Low          Production builds
    BuildKit             5-10%            Low          All projects

    Practical Walkthrough: Optimizing a Node.js Application

    Let's optimize a real-world Node.js application step by step.

    Step 1: Analyze Current Image Size

    # Build the current image
    docker build -t myapp:original .
     
    # Check the size
    docker images myapp:original
    # Output: myapp  original  1.2GB  2 hours ago

    Step 2: Switch to Alpine Base

    # Optimized Dockerfile
    FROM node:18-alpine AS base
    WORKDIR /app
     
    # Copy package files first for better caching
    COPY package*.json ./
    RUN npm ci
     
    # Copy source code
    COPY . .
     
    # Build the application
    RUN npm run build

    # Rebuild and check size
    docker build -t myapp:optimized .
    docker images myapp:optimized
    # Output: myapp  optimized  350MB  1 hour ago

    (The single-stage image still needs the full npm ci, because npm run build typically relies on devDependencies; the next step removes them.)

    You've reduced the image from 1.2GB to 350MB, a 71% reduction.

    Step 3: Implement Multi-Stage Build

    # Build stage
    FROM node:18 AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build
     
    # Production stage
    FROM node:18-alpine
    WORKDIR /app
    COPY --from=builder /app/package*.json ./
    RUN npm ci --omit=dev
    COPY --from=builder /app/dist ./dist
    CMD ["node", "dist/index.js"]

    # Rebuild and check size
    docker build -t myapp:multistage .
    docker images myapp:multistage
    # Output: myapp  multistage  180MB  30 minutes ago

    Now you're down to 180MB, an 85% reduction from the original.

    Step 4: Add .dockerignore

    Create a .dockerignore file:

    node_modules
    npm-debug.log
    .git
    .gitignore
    README.md
    *.md
    .env
    .env.*
    test
    tests
    coverage

    # Rebuild and check size
    docker build -t myapp:final .
    docker images myapp:final
    # Output: myapp  final  175MB  20 minutes ago

    The final image is 175MB, with minimal additional reduction but improved build times.

    Common Pitfalls

    1. Over-optimizing

    Don't sacrifice security or compatibility for size. Some packages require glibc and won't work with Alpine. If you encounter build failures, switch to a slim Debian or Ubuntu base image.

    2. Forgetting to Update Dependencies

    When you update dependencies, rebuild your images. Old dependencies can include unnecessary files or security vulnerabilities.

    3. Ignoring Layer Caching

    Order your Dockerfile instructions to maximize cache hits. Copy package files before source code, and put frequently changing files at the end.

    4. Including Development Tools

    Never install global tooling (npm install -g) or development scripts in your production image. Use multi-stage builds to separate build and runtime environments.

    Monitoring and Maintenance

    Image size optimization is an ongoing process. Set up monitoring to track image sizes and alert when they grow beyond acceptable thresholds.

    #!/bin/bash
    # Monitor image sizes, sorted smallest to largest
    docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}" | sort -k2 -h
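Building on that script, here's a sketch that flags images over a size threshold. The check_sizes name and the 500 MB default are illustrative; the input format matches the docker images --format string above:

```shell
#!/bin/sh
# Flag images larger than a threshold (in MB). Reads lines shaped like
# `docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}"` on stdin.
check_sizes() {
  awk -v limit_mb="${1:-500}" '
    {
      size = $2
      if (size ~ /GB$/)      mb = substr(size, 1, length(size) - 2) * 1024
      else if (size ~ /MB$/) mb = substr(size, 1, length(size) - 2) + 0
      else                   mb = 0   # kB and B entries are negligible
      if (mb > limit_mb) print "OVERSIZED: " $1 " (" size ")"
    }'
}

# Usage:
#   docker images --format "{{.Repository}}:{{.Tag}} {{.Size}}" | check_sizes 500
```

Wire this into CI and fail the pipeline when an image crosses the limit, so size regressions surface at build time rather than at deploy time.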

    Regularly review your images and apply new optimization techniques as they become available.

    Conclusion

    Docker image size optimization is a critical practice for production deployments. By using Alpine base images, multi-stage builds, .dockerignore, and build arguments, you can reduce image sizes by 70-85% without sacrificing functionality or security.

    The techniques discussed here—base image selection, multi-stage builds, file exclusion, and build optimization—provide a solid foundation for keeping your images lean. Remember that optimization is an ongoing process: regularly review your images, stay updated with new tools and techniques, and maintain a culture of size-conscious development.

    Platforms like ServerlessBase can help manage your container deployments and monitor image sizes across your infrastructure, making it easier to maintain optimal image sizes at scale.
