Kubernetes Persistent Volumes and Storage Classes
You've deployed your first application to Kubernetes, and it works great. But then you restart the pod, and your data disappears. This happens because containers are ephemeral by design: when a pod is deleted or a container restarts, anything written to the container's own filesystem is discarded. If you need data to survive pod restarts, scaling, or even node failures, you need persistent storage.
Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) solve this problem by decoupling storage resources from the pods that consume them. This separation gives you flexibility, control, and the ability to manage storage at scale. Let's break down how this works and how to use it effectively.
Understanding the Storage Architecture
Kubernetes introduces three distinct concepts that work together to provide storage abstraction:
- PersistentVolume (PV): A cluster-level resource representing actual storage in the cluster. It's provisioned by an administrator or dynamically provisioned. Think of it as a physical disk or network storage volume that exists independently of any pod.
- PersistentVolumeClaim (PVC): A request for storage by a user or application. It's a namespace-scoped resource that requests specific storage characteristics like size, access mode, and storage class. The PVC is what pods actually use.
- StorageClass: Defines the "type" of storage available. It provides a way to parameterize storage provisioning, allowing different classes like fast SSDs, standard HDDs, or cloud-specific storage solutions.
This three-part architecture separates concerns: administrators manage storage resources (PVs), while users request storage (PVCs) without needing to know the underlying implementation details.
Storage Access Modes
When you create a PersistentVolumeClaim, you specify how the storage should be accessed. Kubernetes defines several access modes that map to real-world storage capabilities:
| Access Mode | Description | Use Case |
|---|---|---|
| ReadWriteOnce (RWO) | Can be mounted as read-write by a single node | Databases, stateful applications |
| ReadOnlyMany (ROX) | Can be mounted read-only by many nodes | Shared configuration files, static assets |
| ReadWriteMany (RWX) | Can be mounted read-write by many nodes | Shared file systems, content management systems |
| ReadWriteOncePod (RWOP) | Can be mounted read-write by a single pod | Stateful applications with unique data per pod |
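The access mode you choose depends on your application's requirements. Most stateful applications need ReadWriteOnce, while applications that need to share writable data across pods on multiple nodes require ReadWriteMany. Not every storage backend supports every mode; block storage such as AWS EBS typically offers only the single-node modes.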
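The access mode is requested in the claim's accessModes list. The following is a minimal sketch of a claim asking for ReadWriteOncePod, assuming the cluster's default StorageClass is backed by a CSI driver (this mode is only supported for CSI volumes). The claim name and size are hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data        # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod         # only one pod in the whole cluster may mount it
  # storageClassName omitted: assumes the cluster has a default StorageClass
  resources:
    requests:
      storage: 5Gi             # illustrative size
```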
StorageClass Fundamentals
StorageClass is where the real power of Kubernetes storage comes in. It defines the "class" of storage, which can include:
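- Provisioner: The driver that creates volumes, typically a CSI driver for a backend such as AWS EBS, GCE PD, or Azure Disk. Local volumes have no dynamic provisioner and use kubernetes.io/no-provisioner.
- ReclaimPolicy: What happens to a dynamically provisioned volume after its PVC is deleted (Delete, the default, or Retain). The deprecated Recycle policy is not available through a StorageClass.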
- VolumeBindingMode: When to bind the PV to the PVC (Immediate or WaitForFirstConsumer)
- MountOptions: Additional options passed to the storage system
Here's a typical StorageClass definition:
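A minimal sketch of such a definition, assuming the AWS EBS CSI driver is installed in the cluster. The iops and throughput values are illustrative assumptions, not recommendations.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com        # assumes the AWS EBS CSI driver
parameters:
  type: gp3
  iops: "4000"                      # illustrative value
  throughput: "250"                 # MiB/s, illustrative value
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```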
This StorageClass uses AWS EBS gp3 volumes with high IOPS and throughput. The WaitForFirstConsumer binding mode delays volume provisioning until a pod is scheduled, ensuring the volume is created in the correct availability zone.
Creating a PersistentVolumeClaim
A PersistentVolumeClaim is straightforward. You define the storage you need, and Kubernetes handles the rest. Here's a basic example:
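A minimal sketch of such a claim, assuming a StorageClass named fast-ssd exists in the cluster. The claim name is hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd    # assumes this class exists
  resources:
    requests:
      storage: 10Gi
```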
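This claim requests 10Gi of ReadWriteOnce storage using the fast-ssd StorageClass. Kubernetes then either binds the claim to an existing PV that matches or provisions a new one dynamically. If the StorageClass uses WaitForFirstConsumer, the claim stays Pending until a pod that uses it is scheduled.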
Step-by-Step: Deploying a Stateful Application with Persistent Storage
Let's walk through deploying a MySQL database with persistent storage. This is a common real-world scenario where data persistence is critical.
Step 1: Create the StorageClass
First, ensure you have a StorageClass configured for your cloud provider or local setup. For AWS, this might already exist. For local Kubernetes (like Minikube or Kind), you might need to create one:
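As a sketch for a local cluster, the class below assumes the rancher.io/local-path provisioner that Kind ships with; on Minikube the bundled provisioner is k8s.io/minikube-hostpath instead. The class name is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-dev                   # hypothetical name
provisioner: rancher.io/local-path  # assumes Kind's bundled provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```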
Step 2: Create the PersistentVolumeClaim
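A minimal sketch of the claim for the database. The name mysql-pvc and the 10Gi size are assumptions chosen for this walkthrough, and storageClassName should be set to whichever class is available in your cluster.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc               # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd    # replace with a class that exists in your cluster
  resources:
    requests:
      storage: 10Gi             # illustrative size
```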
Step 3: Create the MySQL Deployment
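A minimal sketch of a single-replica MySQL Deployment that mounts the claim at MySQL's data directory. It assumes a claim named mysql-pvc and a Secret named mysql-secret with a root-password key already exist; both names are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  strategy:
    type: Recreate              # stop the old pod before starting a new one on the same volume
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret      # hypothetical Secret
                  key: root-password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql   # MySQL data directory
      volumes:
        - name: mysql-data
          persistentVolumeClaim:
            claimName: mysql-pvc          # hypothetical claim name
```

The Recreate strategy is a design choice for this sketch: with a ReadWriteOnce volume, a rolling update could leave the new pod waiting for a volume the old pod still holds.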
Step 4: Verify the Setup
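Running kubectl get pvc and kubectl get pods should show the PersistentVolumeClaim in Bound status and the MySQL pod in Running status. Bound means a PersistentVolume has been provisioned and bound to the claim. If the StorageClass uses WaitForFirstConsumer, the claim stays Pending until the MySQL pod is scheduled, so check again once the pod is up.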
Step 5: Test Persistence
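Create a database and table, for example by opening a mysql client session inside the pod with kubectl exec.
Restart the pod by deleting it with kubectl delete pod; the Deployment creates a replacement.
Verify the data still exists by querying the table from the new pod.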
The data persists because the pod is using the same PersistentVolumeClaim, which maintains the storage across pod restarts.
Dynamic Provisioning vs Static Provisioning
Kubernetes supports two ways to provision storage:
Static Provisioning: An administrator manually creates PersistentVolumes with specific storage configurations. Users then claim these pre-existing volumes. This gives full control but requires manual setup.
Dynamic Provisioning: Storage is automatically provisioned when a PersistentVolumeClaim is created. The StorageClass defines the provisioner and parameters. This is the most common approach, and it applies by default when the cluster has a default StorageClass and the claim does not name one.
Dynamic provisioning is generally preferred because it's automated and scales better. However, static provisioning can be useful for specialized storage or when you need fine-grained control over volume creation.
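For comparison, static provisioning can be sketched as an administrator-created PV plus a claim that binds to it by name. The example assumes an NFS export; the server address, path, and resource names are all hypothetical.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv                   # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal        # hypothetical NFS server
    path: /exports/shared               # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-claim                # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                  # empty string: do not use dynamic provisioning
  volumeName: shared-nfs-pv             # bind to the pre-created PV above
  resources:
    requests:
      storage: 50Gi
```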
Volume Expansion
Modern Kubernetes clusters support volume expansion, allowing you to increase the size of existing PersistentVolumes without recreating them. This is particularly useful for databases that need more storage over time.
To enable volume expansion, set allowVolumeExpansion: true in your StorageClass:
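A sketch of a StorageClass with expansion enabled, again assuming the AWS EBS CSI driver. The class name is hypothetical, and the underlying driver must itself support resizing.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd              # hypothetical name
provisioner: ebs.csi.aws.com        # assumes the AWS EBS CSI driver
parameters:
  type: gp3
allowVolumeExpansion: true          # permits PVCs of this class to be resized
volumeBindingMode: WaitForFirstConsumer
```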
Then expand the PVC:
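One way to do this is to raise the storage request in the claim's manifest and re-apply it. The sketch below assumes a hypothetical claim named mysql-pvc that was originally created with 10Gi on a class that allows expansion.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc               # hypothetical existing claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd    # must match the existing claim's class
  resources:
    requests:
      storage: 20Gi             # raised from 10Gi; the new size must be larger
```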
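The storage backend resizes the underlying volume, and for volumes that carry a filesystem, the kubelet then grows the filesystem to match. Depending on the storage driver, the filesystem step happens online or only after the pod using the claim is restarted. Volumes can only grow; a request for a smaller size is rejected.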
Best Practices for Production
1. Use Appropriate StorageClass
Match the storage class to your workload requirements. High-performance applications need fast SSDs, while batch jobs can use standard HDDs. Don't over-provision — match storage characteristics to application needs.
2. Set Appropriate Access Modes
Choose the right access mode based on your application's requirements. Most stateful applications work well with ReadWriteOnce. Only use ReadWriteMany if your application truly needs to share data across multiple nodes.
3. Implement Backup Strategies
Persistent storage is not a backup. Always implement backup strategies for your data. Use tools like Velero for Kubernetes backups, or configure your storage backend's backup capabilities.
4. Monitor Storage Usage
Keep track of storage usage across your cluster. Use tools like Prometheus and Grafana to monitor PVC sizes and storage utilization. Set up alerts for approaching capacity limits.
5. Use StorageClass Reclaim Policies Wisely
For dynamically provisioned volumes, the default reclaim policy is Delete, which removes the underlying storage when the PVC is deleted. For critical data, consider Retain, which keeps the volume and its data and marks the PV as Released. An administrator must then clean up or reuse the storage manually.
6. Consider Volume Snapshots
Many cloud providers and storage backends support snapshots. Use snapshots for point-in-time backups and disaster recovery. Kubernetes exposes snapshots through the VolumeSnapshot API, which requires a CSI driver with snapshot support; tools like Velero or cloud-specific solutions can build on it.
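A point-in-time snapshot of a claim can be requested declaratively. The following is a minimal sketch, assuming the snapshot CRDs, the snapshot controller, and a CSI driver with snapshot support are installed. The snapshot name, the VolumeSnapshotClass name, and the claim name are hypothetical.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-data-snapshot                  # hypothetical name
spec:
  volumeSnapshotClassName: csi-snapclass     # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: mysql-pvc     # hypothetical claim to snapshot
```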
Common Pitfalls
1. Assuming All Storage is the Same
Different storage classes have different performance characteristics, costs, and capabilities. Don't assume that all PersistentVolumes behave the same way. Always check the StorageClass documentation.
2. Ignoring Volume Binding Mode
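The default volume binding mode is Immediate, which provisions the volume as soon as the PVC is created, before the scheduler knows where the pod will run. In multi-zone clusters this can place the volume in a zone where the pod cannot be scheduled. Use WaitForFirstConsumer so that volumes are created in the correct availability zone and respect the pod's node constraints.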
3. Forgetting to Handle Volume Expansion
Storage needs change over time. Plan for volume expansion by enabling it in your StorageClass and testing the expansion process before you need it in production.
4. Over-Provisioning Storage
Provisioning more storage than you need increases costs and can complicate management. Monitor usage and provision only what you need.
5. Not Using Persistent Volumes for Stateful Applications
Stateful applications like databases, caches, and message queues require persistent storage. Using emptyDir volumes or hostPath for production workloads is a recipe for data loss.
Conclusion
Persistent Volumes and Storage Classes provide Kubernetes with a powerful storage abstraction layer. By separating storage resources from pods, you gain flexibility, control, and the ability to scale storage independently of your applications.
The key takeaways are: understand your access mode requirements, choose the right StorageClass for your workload, implement proper backup and monitoring strategies, and plan for future storage needs like volume expansion. With these practices in place, you can build robust, production-ready applications that rely on persistent storage.
Platforms like ServerlessBase simplify the deployment of stateful applications with persistent storage, handling the complex storage configuration and management automatically so you can focus on building your applications.
Next Steps
Now that you understand Persistent Volumes and Storage Classes, consider exploring:
- StatefulSets: For applications that require stable network identities and persistent storage
- Volume Snapshots: For backup and disaster recovery strategies
- Storage Monitoring: Using Prometheus and Grafana to track storage usage and performance
- Backup Solutions: Implementing comprehensive backup strategies with Velero or cloud-native tools
Remember that storage is a critical component of any stateful application. Take the time to understand your requirements and implement appropriate storage solutions for your Kubernetes workloads.