Kubernetes Persistent Volumes and Storage Classes

You've deployed your first application to Kubernetes, and it works great. But then the pod is deleted and recreated, and your data disappears. This happens because containers are ephemeral by design: when a pod is replaced, anything written to its containers' filesystems is lost. If you need data to survive pod restarts, rescheduling, or even node failures, you need persistent storage.

Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) solve this problem by decoupling storage resources from the pods that consume them. This separation gives you flexibility, control, and the ability to manage storage at scale. Let's break down how this works and how to use it effectively.

Understanding the Storage Architecture

Kubernetes introduces three distinct concepts that work together to provide storage abstraction:

PersistentVolume (PV): A cluster-scoped resource representing a piece of actual storage. It's either created manually by an administrator or provisioned dynamically through a StorageClass. Think of it as a physical disk or network storage volume whose lifecycle is independent of any pod.
PersistentVolumeClaim (PVC): A request for storage by a user or application. It's a namespace-scoped resource that asks for specific characteristics like size, access mode, and storage class. Pods reference the PVC in their volume definitions, never the PV directly.
StorageClass: Defines the "type" of storage available. It provides a way to parameterize storage provisioning, allowing different classes like fast SSDs, standard HDDs, or cloud-specific storage solutions.

This three-part architecture separates concerns: administrators decide how storage is supplied (StorageClasses, plus PVs when provisioning statically), while users request storage through PVCs without needing to know the underlying implementation details.

Storage Access Modes

When you create a PersistentVolumeClaim, you specify how the storage should be accessed. Kubernetes defines several access modes that map to real-world storage capabilities:

Access Mode	Description	Use Case
ReadWriteOnce (RWO)	Can be mounted as read-write by a single node	Databases, stateful applications
ReadOnlyMany (ROX)	Can be mounted read-only by many nodes	Shared configuration files, static assets
ReadWriteMany (RWX)	Can be mounted read-write by many nodes	Shared file systems, content management systems
ReadWriteOncePod (RWOP)	Can be mounted read-write by a single pod (CSI volumes only)	Workloads that must guarantee exactly one writer across the cluster

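The access mode you choose depends on your application's requirements and on what the storage backend supports. Most stateful applications need ReadWriteOnce, while applications that share data across pods on multiple nodes require ReadWriteMany, which typically means a shared filesystem such as NFS rather than block storage like AWS EBS.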
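For comparison with the ReadWriteOnce claims used later in this article, here is a sketch of a claim that asks for shared read-write access. It assumes a StorageClass named shared-files (a hypothetical name) backed by a driver that supports ReadWriteMany, such as an NFS or Amazon EFS CSI driver. If the backend can't offer the requested mode, the claim stays Pending.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets              # hypothetical name
spec:
  accessModes:
    - ReadWriteMany                # pods on any node may mount it read-write
  resources:
    requests:
      storage: 5Gi
  storageClassName: shared-files   # hypothetical class backed by NFS/EFS-style storage
```

Pods on different nodes can all list shared-assets under persistentVolumeClaim.claimName. With ReadWriteOnce, they would all have to be scheduled onto the same node.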

StorageClass Fundamentals

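A StorageClass tells Kubernetes how to create volumes on demand. Its main fields are:

Provisioner: The volume plugin or CSI driver that creates volumes (e.g., ebs.csi.aws.com for AWS EBS, pd.csi.storage.gke.io for GCE PD, disk.csi.azure.com for Azure Disk, or kubernetes.io/no-provisioner for manually created local volumes)
Parameters: Provisioner-specific settings such as disk type, IOPS, or throughput
ReclaimPolicy: What happens to a dynamically provisioned volume after its PVC is deleted (Delete, the default, or Retain; the legacy Recycle policy is deprecated and cannot be set on a StorageClass)
VolumeBindingMode: When binding and dynamic provisioning happen (Immediate, the default, or WaitForFirstConsumer)
MountOptions: Additional mount options applied to dynamically provisioned volumes
AllowVolumeExpansion: Whether claims using the class can be resized after creation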

Here's a typical StorageClass definition:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com  # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin is deprecated
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

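This StorageClass uses AWS EBS gp3 volumes with 3,000 IOPS and 125 MiB/s of throughput, which is the gp3 baseline; raise these values if your workload needs more. The WaitForFirstConsumer binding mode delays provisioning until a pod using the claim is scheduled, so the volume is created in the same availability zone as that pod's node. Setting allowVolumeExpansion: true lets claims grow later.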

Creating a PersistentVolumeClaim

A PersistentVolumeClaim is straightforward. You define the storage you need, and Kubernetes handles the rest. Here's a basic example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: fast-ssd

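This claim requests 10Gi of ReadWriteOnce storage using the fast-ssd StorageClass. Kubernetes either binds it to an existing PV that satisfies the request or has the class's provisioner create a new one. Because fast-ssd uses WaitForFirstConsumer, the claim stays in Pending status until a pod that uses it is scheduled.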
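If a claim omits storageClassName, Kubernetes assigns the cluster's default StorageClass, while setting storageClassName to an empty string ("") opts the claim out of dynamic provisioning. A class is marked as the default with the storageclass.kubernetes.io/is-default-class annotation. The sketch below assumes the AWS EBS CSI driver is installed and uses hypothetical names (general-purpose, app-data); keep only one default class per cluster.

```yaml
# A default class (hypothetical name), assuming the AWS EBS CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: general-purpose
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# No storageClassName here, so this claim receives the default class
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```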

Step-by-Step: Deploying a Stateful Application with Persistent Storage

Let's walk through deploying a MySQL database with persistent storage. This is a common real-world scenario where data persistence is critical.

Step 1: Create the StorageClass

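First, make sure a suitable StorageClass exists. Managed clusters and local tools such as Minikube and Kind usually ship with a default class that provisions volumes dynamically; run kubectl get storageclass to check. If you instead want to use disks already attached to your nodes, you can create a class without a dynamic provisioner. Keep in mind that kubernetes.io/no-provisioner never creates volumes: the claim in Step 2 stays Pending until a matching PersistentVolume exists.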

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
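Because kubernetes.io/no-provisioner never creates volumes, a claim against local-storage only binds once an administrator has created a matching PersistentVolume. A minimal local PV might look like this. The directory /mnt/disks/mysql and the node name worker-1 are hypothetical placeholders, the directory must already exist on that node, and the nodeAffinity block is required for local volumes so the scheduler knows where the data lives.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-local-pv                  # hypothetical name
spec:
  capacity:
    storage: 20Gi                       # at least the 20Gi the Step 2 claim requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain # keep the data if the claim is deleted
  storageClassName: local-storage       # must match the claim's class
  local:
    path: /mnt/disks/mysql              # hypothetical directory; must already exist
  nodeAffinity:                         # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1              # hypothetical node name
```

Run kubectl get nodes to find the real node name; on a default Minikube cluster it is usually minikube. Pods using the claim will then always be scheduled onto that node.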

Step 2: Create the PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: local-storage

Step 3: Create the MySQL Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
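spec:
  replicas: 1
  # Recreate stops the old pod before starting its replacement, so two
  # MySQL instances never run against the same data directory during an update.
  strategy:
    type: Recreate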
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password"  # demo only; use a Secret in production
        - name: MYSQL_DATABASE
          value: "mydb"
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-data
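The inline root password keeps this walkthrough short, but production manifests usually read it from a Secret. A sketch, assuming a Secret named mysql-secret with a key root-password (both hypothetical names): create the Secret, then swap the inline value for the valueFrom reference shown in the comments.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret           # hypothetical name
type: Opaque
stringData:
  root-password: "change-me"   # placeholder value; replace before applying
# In the Deployment's container spec, reference it instead of an inline value:
#   env:
#   - name: MYSQL_ROOT_PASSWORD
#     valueFrom:
#       secretKeyRef:
#         name: mysql-secret
#         key: root-password
```

If you make this switch, use the Secret's password in place of password in the kubectl exec commands.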

Step 4: Verify the Setup

kubectl get pvc
kubectl get pv
kubectl get deployment mysql

You should see the PersistentVolumeClaim in Bound status, indicating it has been provisioned and bound to a PersistentVolume. The MySQL pod will now have persistent storage attached.

Step 5: Test Persistence

Create a database and table:

kubectl exec -it deploy/mysql -- mysql -uroot -ppassword -e "CREATE DATABASE testdb; USE testdb; CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100)); INSERT INTO users (name) VALUES ('Alice');"

Delete the pod. The Deployment immediately creates a replacement with a new name:

kubectl delete pod -l app=mysql

Once the new pod is Running (check with kubectl get pods -l app=mysql) and MySQL has finished starting, verify the data still exists:

kubectl exec -it deploy/mysql -- mysql -uroot -ppassword -e "USE testdb; SELECT * FROM users;"

The data persists because the pod is using the same PersistentVolumeClaim, which maintains the storage across pod restarts.

Dynamic Provisioning vs Static Provisioning

Kubernetes supports two ways to provision storage:

Static Provisioning: An administrator manually creates PersistentVolumes with specific storage configurations. Users then claim these pre-existing volumes. This gives full control but requires manual setup.

Dynamic Provisioning: Storage is created automatically when a PersistentVolumeClaim references a StorageClass that has a provisioner (or omits storageClassName and the cluster has a default class). The StorageClass supplies the provisioner and its parameters. This is the most common approach.

Dynamic provisioning is generally preferred because it's automated and scales better. However, static provisioning can be useful for specialized storage or when you need fine-grained control over volume creation.
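To make static provisioning concrete, here is a sketch of an administrator-created NFS volume and a claim that binds to it by name. The server address 10.0.0.10, the export path /exports/shared, and the object names are hypothetical, and the nodes need NFS client utilities installed. Setting storageClassName to "" on both objects keeps the default StorageClass from provisioning a new volume, and volumeName pins the claim to this specific PV.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-nfs-pv          # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany            # NFS can be mounted read-write from many nodes
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""         # no class: only claims that also request "" can bind
  nfs:
    server: 10.0.0.10          # hypothetical NFS server
    path: /exports/shared      # hypothetical export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-nfs-claim       # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
  storageClassName: ""         # disable dynamic provisioning for this claim
  volumeName: shared-nfs-pv    # bind to the PV above by name
```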

Volume Expansion

Kubernetes supports expanding PersistentVolumes (a stable feature since version 1.24), letting you increase the size of an existing volume without recreating it, as long as the storage driver supports resizing. This is particularly useful for databases that need more storage over time.

To enable volume expansion, set allowVolumeExpansion: true in your StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true

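Then request a larger size on a claim that uses this class, such as the mysql-pvc claim created earlier. Claims can only grow, never shrink, and the mysql-data claim from the walkthrough can't be expanded this way because its local-storage class doesn't allow expansion:

kubectl patch pvc mysql-pvc -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'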

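Kubernetes asks the storage backend to grow the volume, and the kubelet then resizes the filesystem on it. Depending on the driver, the filesystem resize happens while the volume is mounted or only when the pod next starts; while it's pending, the PVC reports a FileSystemResizePending condition, visible in kubectl describe pvc.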

Best Practices for Production

1. Use Appropriate StorageClass

Match the storage class to your workload requirements. High-performance applications need fast SSDs, while batch jobs can use standard HDDs. Don't over-provision — match storage characteristics to application needs.

2. Set Appropriate Access Modes

Choose the right access mode based on your application's requirements. Most stateful applications work well with ReadWriteOnce. Only use ReadWriteMany if your application truly needs to share data across multiple nodes.

3. Implement Backup Strategies

Persistent storage is not a backup. Always implement backup strategies for your data. Use tools like Velero for Kubernetes backups, or configure your storage backend's backup capabilities.
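As one example of automating backups, Velero is configured through custom resources once it's installed. The sketch below assumes Velero runs in the velero namespace with a backup storage location already configured; the schedule name, namespace list, and 30-day retention are illustrative choices. Whether volume data is captured depends on your Velero snapshot or file-system backup setup.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-default-ns       # hypothetical name
  namespace: velero            # assumes Velero's default install namespace
spec:
  schedule: "0 2 * * *"        # cron syntax: every day at 02:00
  template:
    includedNamespaces:
      - default                # where the mysql resources live if no namespace was set
    ttl: 720h0m0s              # keep each backup for 30 days
```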

4. Monitor Storage Usage

Keep track of storage usage across your cluster. Use tools like Prometheus and Grafana to monitor PVC sizes and storage utilization. Set up alerts for approaching capacity limits.
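The kubelet exposes per-claim volume metrics such as kubelet_volume_stats_used_bytes and kubelet_volume_stats_capacity_bytes, labeled with namespace and persistentvolumeclaim. Here is a sketch of a Prometheus alerting rule built on them. It assumes Prometheus already scrapes the kubelet metrics endpoint; the 85% threshold, 10-minute window, and names are arbitrary, and not every volume plugin reports these stats.

```yaml
groups:
  - name: pvc-usage                    # hypothetical rule group name
    rules:
      - alert: PersistentVolumeFillingUp
        # Fraction of each claim's capacity currently in use
        expr: kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes > 0.85
        for: 10m                       # must stay above the threshold for 10 minutes
        labels:
          severity: warning
        annotations:
          summary: "PVC {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is over 85% full"
```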

5. Use StorageClass Reclaim Policies Wisely

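For dynamically provisioned volumes, the default reclaim policy is Delete, which removes the underlying storage when the PVC is deleted. For critical data, consider Retain: when the claim is deleted, the PV and its data are kept and the PV moves to the Released phase. A Released PV isn't offered to new claims automatically, so an administrator must clean it up or reuse it manually.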
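Because the reclaim policy is set per StorageClass, one way to protect critical data is a dedicated class whose volumes survive claim deletion. This sketch copies the gp3 settings used earlier under a hypothetical name, fast-ssd-retain, and assumes the AWS EBS CSI driver is installed.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-retain        # hypothetical name
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver
parameters:
  type: gp3
reclaimPolicy: Retain          # keep the EBS volume when its PVC is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Volumes that already exist keep the policy they were created with; to change one, edit that PV's spec.persistentVolumeReclaimPolicy.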

6. Consider Volume Snapshots

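Many CSI drivers support volume snapshots. Kubernetes exposes them through its VolumeSnapshot API (snapshot.storage.k8s.io), which requires a driver with snapshot support and the snapshot controller installed in the cluster. Use snapshots for point-in-time backups and disaster recovery; backup tools such as Velero can build on them.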
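Here is a sketch of an on-demand snapshot of the mysql-pvc claim from earlier. It assumes the snapshot CRDs and controller are installed and that the claim was provisioned by the AWS EBS CSI driver; the class and snapshot names are hypothetical. A snapshot of a running database is only crash-consistent, so pause writes or flush tables first if you need more.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapclass          # hypothetical name
driver: ebs.csi.aws.com        # must match the driver that provisioned the volume
deletionPolicy: Delete         # remove the backend snapshot when the VolumeSnapshot is deleted
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-pvc-snap-1       # hypothetical name
spec:
  volumeSnapshotClassName: ebs-snapclass
  source:
    persistentVolumeClaimName: mysql-pvc   # must be in the same namespace
```

To restore, create a new PVC whose spec.dataSource points at the snapshot (kind: VolumeSnapshot, apiGroup: snapshot.storage.k8s.io).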

Common Pitfalls

1. Assuming All Storage is the Same

Different storage classes have different performance characteristics, costs, and capabilities. Don't assume that all PersistentVolumes behave the same way. Always check the StorageClass documentation.

2. Ignoring Volume Binding Mode

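The default binding mode is Immediate, which binds and provisions a volume as soon as the PVC is created, before Kubernetes knows where the pod will run. With zonal storage, the volume can end up in a zone where the pod can't be scheduled. Use WaitForFirstConsumer so provisioning waits until the pod is scheduled and respects its node and zone constraints.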

3. Forgetting to Handle Volume Expansion

Storage needs change over time. Plan for volume expansion by enabling it in your StorageClass and testing the expansion process before you need it in production.

4. Over-Provisioning Storage

Provisioning more storage than you need increases costs and can complicate management. Monitor usage and provision only what you need.

5. Not Using Persistent Volumes for Stateful Applications

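Stateful applications like databases and message queues, and caches whose contents are expensive to rebuild, require persistent storage. Relying on emptyDir or hostPath volumes for production data is a recipe for data loss: emptyDir is deleted along with its pod, and hostPath ties data to a single node.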

Conclusion

Persistent Volumes and Storage Classes provide Kubernetes with a powerful storage abstraction layer. By separating storage resources from pods, you gain flexibility, control, and the ability to scale storage independently of your applications.

The key takeaways are: understand your access mode requirements, choose the right StorageClass for your workload, implement proper backup and monitoring strategies, and plan for future storage needs like volume expansion. With these practices in place, you can build robust, production-ready applications that rely on persistent storage.

Platforms like ServerlessBase simplify the deployment of stateful applications with persistent storage, handling the complex storage configuration and management automatically so you can focus on building your applications.

Next Steps

Now that you understand Persistent Volumes and Storage Classes, consider exploring:

StatefulSets: For applications that require stable network identities and persistent storage
Volume Snapshots: For backup and disaster recovery strategies
Storage Monitoring: Using Prometheus and Grafana to track storage usage and performance
Backup Solutions: Implementing comprehensive backup strategies with Velero or cloud-native tools

Remember that storage is a critical component of any stateful application. Take the time to understand your requirements and implement appropriate storage solutions for your Kubernetes workloads.