How Docker Works: Understanding Container Architecture
You've probably run docker run hello-world a dozen times and seen the "Hello from Docker!" message. But have you ever wondered what actually happens under the hood? When you execute that command, Docker doesn't just magically create a lightweight virtual machine. It's doing something more clever—and more fundamental.
Docker containers share the host operating system kernel but run in isolated user spaces with their own filesystems, network stacks, and process trees. This isolation comes from two Linux kernel features: namespaces and control groups (cgroups). Understanding these mechanisms is the key to grasping how containers work and why they're so different from traditional virtual machines.
Namespaces: Isolating the View
Namespaces are the first layer of container isolation. They provide a view of the system that's limited to what the container needs. Think of namespaces as different windows into the same underlying system, where each window shows only what the container is allowed to see.
Mount Namespace
The mount namespace controls what filesystems are visible to a process. When you run a container, its mount namespace starts with an empty view. Docker then mounts the container's filesystem image (the root filesystem) into this namespace. The container sees only its own filesystem, not the host's.
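You can observe this limited view directly. A minimal sketch (assuming the alpine image is available locally or can be pulled):

```shell
# The container sees only the image's root filesystem, not the host's
docker run --rm alpine ls /

# Bind mounts are how host paths get explicitly grafted into the namespace
docker run --rm -v /tmp:/host-tmp alpine ls /host-tmp
```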
Network Namespace
Network namespaces give each container its own network stack. This means containers can have their own IP addresses, network interfaces, and routing tables. When you run a container, Docker creates a virtual network interface (veth pair) and connects it to a bridge network.
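You can inspect the container-side end of that veth pair from inside the container (a sketch assuming the alpine image, whose busybox ip applet supports addr show):

```shell
# Show the container's own eth0 and its private IP (e.g. 172.17.0.x)
docker run --rm alpine ip addr show eth0

# The default bridge network's subnet on the host side
docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
```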
The output shows the container's private IP address, separate from the host's interfaces. Containers on the same Docker network can reach each other directly, while outbound traffic is NATed through the host's bridge; the host's own network interfaces stay invisible to the container by default.
PID Namespace
The PID namespace gives each container an isolated view of processes. Each container has its own process tree, with its main process running as PID 1 inside the namespace.
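A quick illustration (assuming the alpine image; pidless is an arbitrary container name):

```shell
# Inside its PID namespace, the container's main process is PID 1
docker run --rm alpine ps aux

# On the host, the same process appears under an ordinary, much higher PID
docker run -d --name pidless alpine sleep 60
docker inspect --format '{{.State.Pid}}' pidless
docker rm -f pidless
```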
On the host, that same PID 1 appears as an ordinary process with a regular host PID. This isolation prevents containers from seeing or signaling each other's processes.
User and UTS Namespaces
User namespaces map user IDs inside the container to unprivileged IDs on the host, so a process that is root inside the container can be an ordinary user outside it. The UTS namespace isolates the hostname and domain name, so each container can have its own hostname.
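A sketch of both namespaces (note that user namespace remapping is a daemon-wide setting, not a per-container flag):

```shell
# UTS namespace: the container gets its own hostname
docker run --rm --hostname web-1 alpine hostname   # prints web-1

# User namespace remapping is enabled on the daemon, e.g. in /etc/docker/daemon.json:
#   { "userns-remap": "default" }
# With it enabled, root inside the container maps to an unprivileged host UID.
```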
Control Groups: Limiting Resources
While namespaces provide isolation, control groups (cgroups) limit and account for resource usage. Cgroups ensure that one container doesn't starve the others or consume all available resources on the host.
CPU Limits
Cgroups can limit the amount of CPU time a process can use. This prevents a misbehaving container from monopolizing the host's CPU cores.
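For example, with the --cpus flag (a sketch assuming the alpine image; the busy loop exists only to generate load):

```shell
# Cap the container at 1.5 cores' worth of CPU time
docker run --rm --cpus="1.5" alpine sh -c 'timeout 3 yes > /dev/null; echo done'

# Equivalent lower-level knobs: CFS quota and period
docker run --rm --cpu-quota=150000 --cpu-period=100000 alpine echo ok
```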
When you set this limit, the container's processes are throttled once they exhaust their quota for the current scheduling period; the kernel's CFS bandwidth controller enforces the cap and shares the remaining CPU time among containers.
Memory Limits
Memory limits prevent containers from consuming excessive memory, which could cause the host to run out of RAM and trigger the OOM killer.
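For example (a sketch; setting --memory-swap equal to --memory disables extra swap):

```shell
# Limit the container to 256 MB of RAM with no additional swap
docker run --rm --memory=256m --memory-swap=256m alpine free -m
```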
If a container exceeds its memory limit, the kernel's OOM killer terminates the offending process outright; the container typically exits with code 137 rather than being asked to shut down gracefully. The SIGTERM-then-SIGKILL sequence applies to docker stop, which gives processes a grace period (10 seconds by default) before forcing termination.
Block I/O Limits
Cgroups can also limit the amount of disk I/O a container can perform. This is useful for preventing a single container from overwhelming the host's storage subsystem.
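For example (the device path here is an assumption; substitute the block device backing your Docker storage):

```shell
# Throttle reads from /dev/sda to roughly 1 MB per second
docker run --rm --device-read-bps /dev/sda:1mb alpine echo ok

# Or assign a relative weight (10-1000) instead of a hard cap
docker run --rm --blkio-weight 300 alpine echo ok
```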
The Container Lifecycle
Understanding the container lifecycle helps you debug issues and optimize your deployments. Containers go through several states during their lifetime.
Created State
When you run docker create, the container is created but not started. The image is pulled if it isn't already present locally, and the container's writable filesystem layer is prepared, but no process is running.
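For example (demo is an arbitrary container name):

```shell
docker create --name demo alpine sleep 300   # prints the new container's ID
docker ps -a --filter name=demo              # STATUS column reads "Created"
docker rm demo                               # clean up
```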
The -a flag shows all containers, including those in the Created state.
Running State
When you run docker start, the container's main process (defined in the Dockerfile's CMD or ENTRYPOINT) begins executing. The container is now actively using resources and can process requests.
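A self-contained sketch (web is an arbitrary name):

```shell
docker create --name web alpine sleep 300
docker start web
docker ps --filter name=web    # STATUS shows "Up ..."
docker rm -f web               # clean up
```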
The container appears in the output of docker ps with a status of Up.
Paused State
You can pause a running container with docker pause. This suspends every process in the container via the cgroup freezer; nothing is terminated, and the container keeps its memory state. It remains Paused until you resume it with docker unpause.
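For example (napping is an arbitrary name):

```shell
docker run -d --name napping alpine sleep 300
docker pause napping
docker ps --filter name=napping    # STATUS shows "Up ... (Paused)"
docker unpause napping && docker rm -f napping
```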
The container status changes to Paused.
Restarting State
If a container exits and the --restart policy is set, Docker will attempt to restart it. The container goes through a Restarting state during this process.
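A sketch of a restart policy in action (flaky is an arbitrary name; the container deliberately exits non-zero):

```shell
docker run -d --name flaky --restart=on-failure:3 alpine sh -c 'sleep 1; exit 1'
sleep 10                                           # give Docker time to retry
docker inspect --format '{{.RestartCount}}' flaky  # climbs toward 3
docker rm -f flaky
```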
Exited State
When a container stops (either because you ran docker stop or because its main process exited on its own), it enters the Exited state. You can still inspect its exit code and read its logs.
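For example (oneshot is an arbitrary name; the container exits with a deliberate non-zero code):

```shell
docker run --name oneshot alpine sh -c 'exit 7'
docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' oneshot   # exited 7
docker logs oneshot
docker rm oneshot
```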
Dead State
The Dead state is rare and usually indicates a daemon-side failure: Docker tried to stop or remove the container but could not release its resources (for example, a busy mount). A container killed by docker kill or the OOM killer normally ends up in the Exited state instead, typically with exit code 137.
Container vs Virtual Machine
The key difference between containers and virtual machines lies in how they use the host's resources.
| Feature | Container | Virtual Machine |
|---|---|---|
| Kernel | Shares host kernel | Has its own kernel |
| Size | Tens to hundreds of MB (image) | ~1-10+ GB |
| Startup Time | Milliseconds | Seconds to minutes |
| Resource Overhead | Minimal | Significant |
| Isolation | Process-level | Hardware-level |
Containers share the host kernel, which means they're much lighter and faster to start. However, this also means containers can only run applications compatible with the host kernel. Virtual machines have their own kernel, providing complete isolation but at the cost of higher resource usage and slower startup times.
Practical Considerations
Security Implications
Because containers share the host kernel, a vulnerability in the kernel could potentially affect all containers running on that host. This is why running containers as non-root users and keeping the host kernel updated is critical.
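A couple of common hardening flags (a sketch, not a complete security policy):

```shell
# Run as an unprivileged UID/GID and drop all Linux capabilities
docker run --rm --user 1000:1000 --cap-drop ALL alpine id

# Forbid privilege escalation (e.g. via setuid binaries)
docker run --rm --security-opt no-new-privileges alpine id
```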
Resource Contention
When multiple containers compete for resources, cgroups ensure fair distribution. However, you should still monitor resource usage to prevent one container from degrading the performance of others.
Storage Performance
Containers use copy-on-write filesystems: rather than copying the entire image at start, they reference the image's read-only layers and write changes to a thin writable layer on top. This makes startup cheap, but write-heavy workloads pay a copy-up cost through the overlay driver, so paths with heavy writes (databases, logs) are usually better placed on volumes.
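You can see the writable layer at work (scratchpad is an arbitrary name):

```shell
# Files written at runtime land in the container's writable layer, not the image
docker run --name scratchpad alpine sh -c 'echo hi > /new-file'
docker diff scratchpad    # shows "A /new-file" (A = added)
docker rm scratchpad
```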
Conclusion
Docker's container architecture relies on namespaces for isolation and cgroups for resource management. This combination allows containers to be lightweight, fast, and efficient while providing strong isolation guarantees. Understanding these mechanisms helps you debug issues, optimize resource usage, and design more reliable containerized applications.
If you're managing multiple containers and complex deployments, platforms like ServerlessBase can help you automate container orchestration, manage networking, and handle SSL certificates, so you can focus on building great applications rather than wrestling with infrastructure.
Next Steps:
- Explore Docker images and layers in the next article
- Learn how to write your first Dockerfile
- Understand Docker networking and storage options