    A comprehensive guide to understanding and implementing persistent storage in Kubernetes, including storage classes, claims, and best practices for production workloads.

    Kubernetes Persistent Volumes and Storage Classes

    You've deployed your first application to Kubernetes, and it works great. But then the pod is restarted, and your data disappears. This happens because containers are ephemeral by design: when a pod is replaced, its containers start again from the image with a fresh filesystem. If you need data to survive pod restarts, scaling, or even node failures, you need persistent storage.

    Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) solve this problem by decoupling storage resources from the pods that consume them. This separation gives you flexibility, control, and the ability to manage storage at scale. Let's break down how this works and how to use it effectively.

    Understanding the Storage Architecture

    Kubernetes introduces three distinct concepts that work together to provide storage abstraction:

    • PersistentVolume (PV): A cluster-level resource representing actual storage in the cluster. It's provisioned by an administrator or dynamically provisioned. Think of it as a physical disk or network storage volume that exists independently of any pod.

    • PersistentVolumeClaim (PVC): A request for storage by a user or application. It's a namespace-scoped resource that requests specific storage characteristics like size, access mode, and storage class. The PVC is what pods actually use.

    • StorageClass: Defines the "type" of storage available. It provides a way to parameterize storage provisioning, allowing different classes like fast SSDs, standard HDDs, or cloud-specific storage solutions.

    This three-part architecture separates concerns: administrators manage storage resources (PVs), while users request storage (PVCs) without needing to know the underlying implementation details.

    Storage Access Modes

    When you create a PersistentVolumeClaim, you specify how the storage should be accessed. Kubernetes defines several access modes that map to real-world storage capabilities:

    Access Mode | Description | Use Case
    ------------|-------------|---------
    ReadWriteOnce (RWO) | Can be mounted as read-write by a single node | Databases, stateful applications
    ReadOnlyMany (ROX) | Can be mounted read-only by many nodes | Shared configuration files, static assets
    ReadWriteMany (RWX) | Can be mounted read-write by many nodes | Shared file systems, content management systems
    ReadWriteOncePod (RWOP) | Can be mounted read-write by a single pod | Stateful applications with unique data per pod

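    The access mode you choose depends on your application's requirements. Most stateful applications need ReadWriteOnce, which still lets several pods on the same node share the volume. Applications that must write to the same volume from pods on different nodes require ReadWriteMany, which only some storage backends (typically network file systems) support.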
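
    To make the difference concrete, here is a minimal sketch of a claim that asks for ReadWriteMany access. The claim name shared-assets and the class name shared-nfs are hypothetical; the class is assumed to be backed by a file-based system that supports multi-node writes. Block storage such as a typical cloud disk usually cannot satisfy this mode.

    ```yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-assets            # hypothetical claim name
    spec:
      accessModes:
        - ReadWriteMany              # pods on different nodes may mount read-write
      resources:
        requests:
          storage: 50Gi
      storageClassName: shared-nfs   # hypothetical class; must be backed by RWX-capable storage
    ```

    If no available volume or provisioner supports the requested mode, the claim simply stays Pending, so it is worth checking the backend's supported access modes first.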

    StorageClass Fundamentals

    StorageClass is where the real power of Kubernetes storage comes in. It defines the "class" of storage, which can include:

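    • Provisioner: The volume plugin or CSI driver that creates volumes (e.g., ebs.csi.aws.com for AWS EBS, pd.csi.storage.gke.io for GCE PD, disk.csi.azure.com for Azure Disk, or kubernetes.io/no-provisioner for statically provisioned local volumes)
    • ReclaimPolicy: What happens to a dynamically provisioned volume after its PVC is deleted (Delete or Retain; the older Recycle policy is deprecated)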
    • VolumeBindingMode: When to bind the PV to the PVC (Immediate or WaitForFirstConsumer)
    • MountOptions: Additional options passed to the storage system

    Here's a typical StorageClass definition:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd
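    provisioner: ebs.csi.aws.com  # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin is deprecated
    parameters:
      type: gp3
      iops: "3000"        # gp3 baseline IOPS
      throughput: "125"   # MiB/s, gp3 baseline throughput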
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true

    This StorageClass provisions AWS EBS gp3 volumes with 3,000 IOPS and 125 MiB/s of throughput, which are the gp3 baseline values; raise them for I/O-heavy workloads. The WaitForFirstConsumer binding mode delays volume provisioning until a pod is scheduled, ensuring the volume is created in the correct availability zone.

    Creating a PersistentVolumeClaim

    A PersistentVolumeClaim is straightforward. You define the storage you need, and Kubernetes handles the rest. Here's a basic example:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: fast-ssd

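    This claim requests 10Gi of ReadWriteOnce storage using the fast-ssd StorageClass. Kubernetes either binds it to an existing PV that matches or provisions a new one dynamically. Because fast-ssd uses WaitForFirstConsumer, the claim stays in Pending status until a pod that references it is scheduled.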

    Step-by-Step: Deploying a Stateful Application with Persistent Storage

    Let's walk through deploying a MySQL database with persistent storage. This is a common real-world scenario where data persistence is critical.

    Step 1: Create the StorageClass

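    First, ensure you have a StorageClass configured for your cloud provider or local setup. Managed clusters usually ship with one, and Minikube and Kind both include a default class named standard that provisions volumes dynamically. If you want to manage local disks yourself, you can define a class without a provisioner. Such a class never creates volumes, so an administrator must create a matching PersistentVolume by hand before any claim against it can bind: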

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
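
    Because kubernetes.io/no-provisioner does not create volumes, a claim against local-storage can only bind to a PersistentVolume that already exists. A minimal sketch of such a volume follows. The volume name, the path /mnt/disks/mysql, and the node name worker-1 are assumptions; substitute values from your own cluster.

    ```yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: mysql-local-pv              # hypothetical name
    spec:
      capacity:
        storage: 20Gi                   # must be at least the size the claim requests
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-storage   # must match the claim's storageClassName
      local:
        path: /mnt/disks/mysql          # assumed path; the directory must already exist on the node
      nodeAffinity:                     # required for local volumes: pins the PV to one node
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - worker-1          # assumed node name
    ```

    The nodeAffinity block is what lets the scheduler place the pod on the node that actually holds the disk, which is also why this class uses WaitForFirstConsumer.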

    Step 2: Create the PersistentVolumeClaim

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      storageClassName: local-storage

    Step 3: Create the MySQL Deployment

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mysql
    spec:
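      replicas: 1
      strategy:
        type: Recreate  # stop the old pod before starting a new one, so two MySQL instances never share the volume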
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
          - name: mysql
            image: mysql:8.0
            env:
            - name: MYSQL_ROOT_PASSWORD
              value: "password"  # demo only; load this from a Secret in production
            - name: MYSQL_DATABASE
              value: "mydb"
            ports:
            - containerPort: 3306
            volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
          volumes:
          - name: mysql-storage
            persistentVolumeClaim:
              claimName: mysql-data
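
    The Deployment above hard-codes the root password to keep the walkthrough short. One way to avoid that, sketched here under the assumption of a Secret named mysql-credentials (a hypothetical name, as is the key root-password), is to store the password in a Secret and reference it from the container's env section:

    ```yaml
    apiVersion: v1
    kind: Secret
    metadata:
      name: mysql-credentials        # hypothetical name
    type: Opaque
    stringData:
      root-password: "change-me"     # placeholder value
    ---
    # Fragment of the container spec from Step 3 (not a complete manifest):
    # the literal value is replaced by a reference to the Secret.
    env:
    - name: MYSQL_ROOT_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mysql-credentials
          key: root-password
    ```

    The remaining steps in this walkthrough assume the literal password from Step 3, so adjust the -p argument if you switch to a Secret.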

    Step 4: Verify the Setup

    kubectl get pvc
    kubectl get pv
    kubectl get deployment mysql

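    Once the MySQL pod has been scheduled, the PersistentVolumeClaim should show Bound status, meaning it is bound to a PersistentVolume. With WaitForFirstConsumer, the claim stays Pending until then; if it never leaves Pending, run kubectl describe pvc mysql-data and check that a matching PersistentVolume exists. The MySQL pod now has persistent storage attached.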

    Step 5: Test Persistence

    Create a database and table:

    kubectl exec -it deploy/mysql -- mysql -uroot -ppassword -e "CREATE DATABASE testdb; USE testdb; CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100)); INSERT INTO users (name) VALUES ('Alice');"

    Restart the pod:

    kubectl delete pod -l app=mysql
    kubectl rollout status deployment/mysql

    Verify the data still exists:

    kubectl exec -it deploy/mysql -- mysql -uroot -ppassword -e "USE testdb; SELECT * FROM users;"

    The data persists because the pod is using the same PersistentVolumeClaim, which maintains the storage across pod restarts.

    Dynamic Provisioning vs Static Provisioning

    Kubernetes supports two ways to provision storage:

    Static Provisioning: An administrator manually creates PersistentVolumes with specific storage configurations. Users then claim these pre-existing volumes. This gives full control but requires manual setup.

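    Dynamic Provisioning: Storage is automatically provisioned when a PersistentVolumeClaim is created. The StorageClass defines the provisioner and parameters. This is the most common approach, and it is what a claim gets by default when the cluster has a default StorageClass and the claim does not name one.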

    Dynamic provisioning is generally preferred because it's automated and scales better. However, static provisioning can be useful for specialized storage or when you need fine-grained control over volume creation.
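
    As an illustration of how dynamic provisioning becomes the default path, the sketch below marks a StorageClass as the cluster default and then creates a claim that names no class. The class name standard-default and claim name app-data are hypothetical, and the provisioner assumes the AWS EBS CSI driver is installed.

    ```yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: standard-default         # hypothetical name
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
    provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver
    volumeBindingMode: WaitForFirstConsumer
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: app-data                 # hypothetical name
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      # storageClassName is omitted on purpose, so the default class is applied
    ```

    Setting storageClassName to an empty string instead of omitting it opts the claim out of dynamic provisioning, which is how a claim is usually pointed at a statically provisioned volume.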

    Volume Expansion

    Modern Kubernetes clusters support volume expansion, allowing you to increase the size of existing PersistentVolumes without recreating them. This is particularly useful for databases that need more storage over time.

    To enable volume expansion, set allowVolumeExpansion: true in your StorageClass:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd
    provisioner: ebs.csi.aws.com  # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin is deprecated
    parameters:
      type: gp3
    allowVolumeExpansion: true

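    Then raise the storage request on a PVC that uses this class, such as the mysql-pvc claim defined earlier (the mysql-data claim from the walkthrough uses local-storage, which does not allow expansion):

    kubectl patch pvc mysql-pvc -p '{"spec":{"resources":{"requests":{"storage":"30Gi"}}}}'

    The storage backend resizes the underlying volume first, and the filesystem is grown afterwards. CSI drivers that support online expansion do this while the pod is running; with other backends, the filesystem is resized only after the pod is restarted. PVCs can only grow: Kubernetes rejects a request to shrink one.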

    Best Practices for Production

    1. Use Appropriate StorageClass

    Match the storage class to your workload requirements. High-performance applications need fast SSDs, while batch jobs can use standard HDDs. Don't over-provision — match storage characteristics to application needs.

    2. Set Appropriate Access Modes

    Choose the right access mode based on your application's requirements. Most stateful applications work well with ReadWriteOnce. Only use ReadWriteMany if your application truly needs to share data across multiple nodes.

    3. Implement Backup Strategies

    Persistent storage is not a backup. Always implement backup strategies for your data. Use tools like Velero for Kubernetes backups, or configure your storage backend's backup capabilities.

    4. Monitor Storage Usage

    Keep track of storage usage across your cluster. Use tools like Prometheus and Grafana to monitor PVC sizes and storage utilization. Set up alerts for approaching capacity limits.
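
    A capacity alert can be sketched as a Prometheus rule file built on the kubelet's volume metrics. This assumes Prometheus is already scraping the kubelet; the group name, alert name, 15% threshold, and 10-minute window are illustrative choices, not recommendations from a specific source.

    ```yaml
    groups:
      - name: pvc-capacity                       # hypothetical group name
        rules:
          - alert: PersistentVolumeClaimAlmostFull
            # Fraction of free space per PVC, as reported by the kubelet
            expr: |
              kubelet_volume_stats_available_bytes
                / kubelet_volume_stats_capacity_bytes < 0.15
            for: 10m                             # must hold for 10 minutes before firing
            labels:
              severity: warning
            annotations:
              summary: "PVC {{ $labels.persistentvolumeclaim }} in {{ $labels.namespace }} has less than 15% free space"
    ```

    These metrics are only reported for volumes that are mounted by a running pod, so unattached claims will not appear.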

    5. Use StorageClass Reclaim Policies Wisely

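    For dynamically provisioned volumes, the default reclaim policy is Delete, which removes the underlying storage when the PVC is deleted (manually created PVs default to Retain). For critical data, consider Retain, which keeps the volume and its data and moves the PV to the Released phase. An administrator must then clean up or re-bind the volume manually before it can be reused.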
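
    A class that keeps volumes after their claims are deleted might look like the following sketch. The name fast-ssd-retain is hypothetical, and the provisioner assumes the AWS EBS CSI driver.

    ```yaml
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd-retain          # hypothetical name
    provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver
    parameters:
      type: gp3
    reclaimPolicy: Retain            # keep the PV and its data when the PVC is deleted
    volumeBindingMode: WaitForFirstConsumer
    allowVolumeExpansion: true
    ```

    Retained volumes keep incurring storage costs until someone deletes them, so this policy works best alongside a regular review of Released PVs.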

    6. Consider Volume Snapshots

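    Many cloud providers and storage backends support snapshots. Use snapshots for point-in-time backups and disaster recovery. Kubernetes exposes them through the VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass APIs, which require a CSI driver with snapshot support and the snapshot controller; tools like Velero can build on these or on cloud-specific snapshot services.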
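
    As a rough sketch of the snapshot workflow, the manifests below snapshot the mysql-pvc claim from earlier and then restore it into a new claim. The VolumeSnapshotClass name csi-snapclass and the object names are assumptions, and the cluster is assumed to have the snapshot CRDs, the snapshot controller, and a CSI driver that supports snapshots.

    ```yaml
    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshot
    metadata:
      name: mysql-pvc-snap                     # hypothetical name
    spec:
      volumeSnapshotClassName: csi-snapclass   # assumed VolumeSnapshotClass
      source:
        persistentVolumeClaimName: mysql-pvc   # claim to snapshot
    ---
    # Restore: a new claim pre-populated from the snapshot
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-pvc-restored                 # hypothetical name
    spec:
      storageClassName: fast-ssd
      dataSource:
        name: mysql-pvc-snap
        kind: VolumeSnapshot
        apiGroup: snapshot.storage.k8s.io
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi                        # at least the size of the snapshotted volume
    ```

    A volume snapshot taken while a database is writing is crash-consistent at best, so for databases it is usually combined with an application-level flush or dump.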

    Common Pitfalls

    1. Assuming All Storage is the Same

    Different storage classes have different performance characteristics, costs, and capabilities. Don't assume that all PersistentVolumes behave the same way. Always check the StorageClass documentation.

    2. Ignoring Volume Binding Mode

    The default volume binding mode is Immediate, which provisions the volume as soon as the claim is created and can leave it in a zone or on a node where the pod cannot be scheduled. Use WaitForFirstConsumer to ensure volumes are created in the correct availability zone and with appropriate node constraints.

    3. Forgetting to Handle Volume Expansion

    Storage needs change over time. Plan for volume expansion by enabling it in your StorageClass and testing the expansion process before you need it in production.

    4. Over-Provisioning Storage

    Provisioning more storage than you need increases costs and can complicate management. Monitor usage and provision only what you need.

    5. Not Using Persistent Volumes for Stateful Applications

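    Stateful applications like databases, caches, and message queues require persistent storage. Using emptyDir volumes or hostPath for production workloads is a recipe for data loss: an emptyDir is deleted along with its pod, and hostPath data stays behind on one node when the pod is rescheduled elsewhere.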

    Conclusion

    Persistent Volumes and Storage Classes provide Kubernetes with a powerful storage abstraction layer. By separating storage resources from pods, you gain flexibility, control, and the ability to scale storage independently of your applications.

    The key takeaways are: understand your access mode requirements, choose the right StorageClass for your workload, implement proper backup and monitoring strategies, and plan for future storage needs like volume expansion. With these practices in place, you can build robust, production-ready applications that rely on persistent storage.

    Platforms like ServerlessBase simplify the deployment of stateful applications with persistent storage, handling the complex storage configuration and management automatically so you can focus on building your applications.

    Next Steps

    Now that you understand Persistent Volumes and Storage Classes, consider exploring:

    • StatefulSets: For applications that require stable network identities and persistent storage
    • Volume Snapshots: For backup and disaster recovery strategies
    • Storage Monitoring: Using Prometheus and Grafana to track storage usage and performance
    • Backup Solutions: Implementing comprehensive backup strategies with Velero or cloud-native tools
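
    As a starting point for the StatefulSets item, here is a minimal sketch of how a StatefulSet requests storage through volumeClaimTemplates, so that each replica gets its own claim. It assumes the fast-ssd class from earlier and a headless Service named mysql that already exists. The two replicas are independent MySQL instances; this does not configure replication.

    ```yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mysql
    spec:
      serviceName: mysql               # headless Service, assumed to exist
      replicas: 2
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - name: mysql
              image: mysql:8.0
              env:
                - name: MYSQL_ROOT_PASSWORD
                  value: "password"    # demo only; use a Secret in production
              volumeMounts:
                - name: data
                  mountPath: /var/lib/mysql
      volumeClaimTemplates:            # one PVC per replica: data-mysql-0, data-mysql-1
        - metadata:
            name: data
          spec:
            accessModes:
              - ReadWriteOnce
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 10Gi
    ```

    Claims created from a template are not deleted when the StatefulSet is scaled down or removed, which protects the data but means unused claims must be cleaned up by hand.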

    Remember that storage is a critical component of any stateful application. Take the time to understand your requirements and implement appropriate storage solutions for your Kubernetes workloads.
