Understanding Container Namespaces and Cgroups
You've probably heard that containers are lightweight because they share the host kernel. But have you actually thought about how that works? How does a single Linux kernel manage to run dozens of isolated processes without them stepping on each other's toes? The answer lies in two fundamental Linux kernel features: namespaces and control groups (cgroups).
What Are Namespaces?
Think of namespaces as a set of rules that define what a process can see. When you run a container, the kernel creates a new namespace for that process, and the process only sees the world through that namespace's lens.
The Core Namespace Types
Linux provides several namespace types (newer kernels also add cgroup and time namespaces), but six of them do most of the work of container isolation:
| Namespace | What It Isolates | Why It Matters |
|---|---|---|
| Mount namespace | File system mounts | Processes see only their own mounted filesystems |
| Network namespace | Network interfaces | Each container gets its own network stack |
| PID namespace | Process IDs | Processes in one container see different PIDs than the host |
| UTS namespace | Hostname and domain name | Containers can have their own names |
| IPC namespace | Inter-process communication | Containers use separate System V IPC and POSIX message queues |
| User namespace | User and group IDs | Processes can run with different user permissions |
Mount Namespaces in Action
Mount namespaces are probably the most intuitive. When you start a container, the kernel creates a new mount namespace and mounts the container's root filesystem at /. The process inside the container sees / as its root directory, even though the host system has its own filesystem structure.
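You can sketch what a runtime does from the shell with unshare and pivot_root. The /tmp/rootfs path is a hypothetical, pre-populated minimal root filesystem (e.g. extracted from an Alpine image), and the commands require root:

```shell
# Sketch: give a shell its own mount namespace and root filesystem.
# Assumes a minimal rootfs with its own binaries at /tmp/rootfs.
sudo unshare --mount bash -c '
  mount --make-rprivate /               # stop mount events propagating to the host
  mount --bind /tmp/rootfs /tmp/rootfs  # pivot_root needs the new root to be a mount point
  cd /tmp/rootfs
  mkdir -p old_root
  pivot_root . old_root                 # swap this namespace root to the rootfs
  umount -l /old_root                   # detach the old host root from view
  ls /                                  # shows only the container filesystem
'
```

Real runtimes do essentially this, plus overlay mounts for the image layers, via the same pivot_root(2) system call.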
The container's view is much cleaner because it doesn't see the host's /boot, /media, or other top-level directories.
Network Namespaces and the Network Stack
Network namespaces give each container its own network stack. This means a container can have its own IP addresses, network interfaces, routing tables, and firewall rules. When you run ip addr inside a container, you'll see only the interfaces configured for that container, not the host's network interfaces.
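A quick way to poke at this is the ip netns tooling. The namespace name and addresses below are illustrative, and the commands need root:

```shell
# Create an empty network namespace and inspect it.
sudo ip netns add demo
sudo ip netns exec demo ip addr        # only a down loopback, no host interfaces

# Wire it to the host with a virtual Ethernet (veth) pair.
sudo ip link add veth-host type veth peer name eth0 netns demo
sudo ip netns exec demo ip addr add 10.0.0.2/24 dev eth0
sudo ip netns exec demo ip link set lo up
sudo ip netns exec demo ip link set eth0 up

sudo ip netns del demo                 # clean up
```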
The container has its own loopback interface and a virtual Ethernet interface (eth0) connected to the container network.
PID Namespaces and Process Isolation
PID namespaces are crucial for process isolation. When you run ps aux inside a container, you'll see only the processes running inside that container, not the host's processes. The first process in a container always has PID 1, even if that process is actually PID 42 on the host.
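You can reproduce this with unshare: --fork makes the shell the namespace's first process, and --mount-proc remounts /proc so ps reads the new namespace rather than the host's (root required):

```shell
# Start a shell as PID 1 of a fresh PID namespace.
sudo unshare --pid --fork --mount-proc bash -c 'echo "PID inside: $$"; ps aux'
# $$ expands to 1, and ps lists only this bash and ps itself.
```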
The container's shell has PID 1, and ps shows only the processes in that namespace.
User Namespaces and Permission Isolation
User namespaces allow processes to run with different user and group IDs inside the container than they have on the host. This matters for security: a process can be root (UID 0) inside the container while the kernel maps it to an unprivileged user on the host, so escaping the container lands an attacker with no real privileges.
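On kernels that allow unprivileged user namespaces (a distro-dependent setting), you can try the mapping without any container runtime:

```shell
id -u                                   # your real UID on the host, e.g. 1000
unshare --user --map-root-user id -u    # prints 0: root inside the namespace
# Any file this namespace "root" creates is owned by your real UID on the host.
```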
The container thinks it's running as root, but on the host, it's actually running as user 1000. This provides a significant security improvement.
What Are Cgroups?
While namespaces provide isolation, they don't control resource usage. A process in its own namespace could still consume all available CPU, memory, or disk I/O. That's where cgroups come in.
The Purpose of Cgroups
Cgroups (control groups) limit, account for, and isolate the resource usage (CPU, memory, disk I/O, network bandwidth) of a collection of processes. Think of cgroups as a resource manager that ensures fair distribution and prevents one process from monopolizing system resources.
Controlling CPU Usage
Cgroups can limit CPU usage through the cpu controller. You set a quota and a period: the group may use at most the quota's worth of CPU time within each period.
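A minimal sketch against the cgroup v2 unified hierarchy, which most current distros mount at /sys/fs/cgroup. The group name demo is hypothetical and the commands require root:

```shell
# Allow at most 50,000us of CPU time per 100,000us period (50% of one CPU).
sudo mkdir /sys/fs/cgroup/demo
echo "50000 100000" | sudo tee /sys/fs/cgroup/demo/cpu.max
# Move the current shell (and its future children) into the group.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```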
With a quota of half the period, the group is capped at 50% of one CPU. If its processes try to use more, the kernel throttles them until the next period begins.
Controlling Memory Usage
Memory cgroups prevent a group of processes from consuming excessive memory. You set a hard limit, and the kernel enforces it, first by reclaiming memory and, as a last resort, by invoking the OOM killer.
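The same pattern works for memory via the memory.max file (cgroup v2, root required, demo again a hypothetical group):

```shell
sudo mkdir /sys/fs/cgroup/demo
echo 512M | sudo tee /sys/fs/cgroup/demo/memory.max   # hard ceiling for the group
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs   # enroll the current shell
```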
If the process tries to hold more than 512MB and the kernel can't reclaim enough memory, the OOM (out of memory) killer terminates a process in the group.
Controlling Disk I/O
Cgroups can also limit disk I/O, which is useful for preventing a single process from saturating the disk and affecting other processes.
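With cgroup v2, the io controller takes per-device limits in the io.max file. The 8:0 major:minor pair assumes the device is sda (check yours with lsblk), and 1 MB/s is an illustrative rate:

```shell
sudo mkdir /sys/fs/cgroup/demo
# rbps/wbps are read/write bytes per second for device 8:0.
echo "8:0 rbps=1048576 wbps=1048576" | sudo tee /sys/fs/cgroup/demo/io.max
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```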
If the processes in the group issue reads or writes faster than the configured rate, the kernel delays the I/O rather than killing them.
How Namespaces and Cgroups Work Together
Namespaces and cgroups work together to provide container isolation: namespaces control what a process can see, while cgroups control what it can consume.
The Container Startup Process
When you start a container, the following happens:
- The container runtime creates a new PID namespace for the container process
- The container runtime creates a new mount namespace and mounts the container's root filesystem
- The container runtime creates a new network namespace and configures virtual network interfaces
- The container runtime creates a new IPC namespace
- The container runtime creates a new UTS namespace
- The container runtime creates a new user namespace (if configured)
- The container runtime creates cgroups to limit CPU, memory, and I/O usage
- The container process is moved into the new namespaces and cgroups
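The steps above can be compressed into a single unshare invocation, as a rough sketch rather than what a production runtime does. On kernels that permit unprivileged user namespaces, it even works without root:

```shell
unshare --user --map-root-user --pid --fork --mount --net --ipc --uts \
        --mount-proc bash
# The shell now sits in six fresh namespaces. A real runtime would also
# pivot_root into the image filesystem and write limits into cgroup files.
```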
The Container Runtime's Role
Container runtimes like Docker and Podman are responsible for creating and managing namespaces and cgroups. When you run docker run, the runtime:
- Creates a new network namespace and configures a virtual bridge network
- Creates a new mount namespace and mounts the container image's filesystem
- Creates cgroups to limit resource usage
- Sets up the container's environment variables and command
- Starts the container process
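You can watch this happen. Assuming Docker is installed, the namespaces the runtime created show up as symlinks under /proc/&lt;pid&gt;/ns (the image and container names here are illustrative):

```shell
docker run -d --name demo --cpus 1 --memory 256m nginx
PID=$(docker inspect --format '{{.State.Pid}}' demo)
sudo ls -l /proc/$PID/ns     # one symlink per namespace: mnt, net, pid, uts, ipc, ...
docker rm -f demo
```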
Common Use Cases
Isolating Development Environments
Developers often use containers to isolate their development environments. Each developer can have their own container with their own dependencies, without affecting other developers or the host system.
Resource Management in Multi-Tenant Environments
In multi-tenant environments, cgroups ensure that one tenant's applications don't consume all available resources. You can create different cgroups for different tenants and set resource limits accordingly.
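On systemd hosts, slices are a convenient front end to per-tenant cgroups. A sketch, with hypothetical slice, unit, and workload names:

```shell
# Run a tenant's workload inside its own slice with CPU and memory caps.
sudo systemd-run --slice=tenant-a.slice \
  -p CPUQuota=50% -p MemoryMax=1G \
  --unit=tenant-a-app /usr/bin/some-workload
```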
Testing Resource Limits
Developers can use containers to test how their applications behave under resource constraints. By setting CPU and memory limits, you can simulate production conditions and identify potential issues.
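Docker's resource flags make this easy to try. The stress-testing image and its arguments below are illustrative:

```shell
# Half a CPU and 256 MB for a worker that wants a full CPU and ~200 MB of memory.
docker run --rm --cpus 0.5 --memory 256m \
  progrium/stress --cpu 1 --vm 1 --vm-bytes 200M --timeout 30s
```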
Security Isolation
Namespaces add a layer of security by isolating processes. A vulnerability exploited inside a container doesn't automatically expose the host's resources or other containers, although a kernel exploit can still break that boundary (see the shared-kernel limitation below).
Limitations and Challenges
Shared Kernel
Containers share the host kernel, which means kernel vulnerabilities can affect all containers. This is why it's important to keep the host kernel updated and use security features like SELinux and AppArmor.
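A one-line demonstration of the shared kernel (assuming Docker and the alpine image are available):

```shell
uname -r                          # host kernel version
docker run --rm alpine uname -r   # the same version, seen from inside a container
```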
Namespace Limitations
Not all system resources are namespaced. The system clock (on kernels without time namespaces), loaded kernel modules, and parts of /proc and /sys are shared with the host, so containers can still observe, and sometimes affect, host-wide state.
Cgroup Limitations
Cgroups can't prevent every form of resource exhaustion on their own. A fork bomb, for example, can exhaust the kernel's process table unless you also cap task counts, using the pids cgroup controller or per-user ulimits.
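Fork bombs are exactly what the pids controller exists for. A cgroup v2 sketch, with a hypothetical group name and root required:

```shell
sudo mkdir /sys/fs/cgroup/demo
echo 100 | sudo tee /sys/fs/cgroup/demo/pids.max    # fork() fails past 100 tasks
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```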
Performance Overhead
Namespaces and cgroups add some overhead to process creation and resource management. However, this overhead is minimal compared to the benefits of isolation and resource control.
Conclusion
Namespaces and cgroups are the foundation of container isolation. Namespaces provide the illusion of isolation by hiding system resources from processes, while cgroups provide actual resource control by limiting CPU, memory, and I/O usage.
Together, they enable containers to be lightweight, secure, and efficient. Understanding how namespaces and cgroups work is essential for working with containers effectively.
If you're managing deployments at scale, platforms like ServerlessBase can help you automate container management and ensure consistent resource allocation across your infrastructure.