Kubernetes notes (11): data persistence, describing PersistentVolume in YAML (mounted into a Pod), PersistentVolumeClaim, StorageClass

When we studied ConfigMap and Secret earlier, we met the Kubernetes concept of a Volume (storage volume). Through the volumes and volumeMounts fields, it is equivalent to mounting a “virtual disk” on the Pod, injecting configuration information into the Pod as files for its processes to use.

However, the Volume we used then could only hold a small amount of data, which was far from a real “virtual disk”.

Now let’s look at the advanced usage of Volume and at the API objects Kubernetes uses to manage storage resources: PersistentVolume, PersistentVolumeClaim, and StorageClass. Then we will use a local disk to create an actually usable storage volume.

1. PersistentVolume

We built a WordPress website in the Kubernetes cluster earlier, but it has a serious problem: the Pod has no persistence, so MariaDB cannot store data “permanently”.

Because the containers in a Pod are created from images, and the image files themselves are read-only, a process can only read and write the disk through a temporary storage space. Once the Pod is destroyed, that temporary storage is reclaimed and released immediately, and the data is lost.

To make sure the data still exists after the Pod is destroyed and rebuilt, we need a way to let the Pod use a real “virtual disk”. How can that be done?

In fact, the Kubernetes Volume is already a good abstraction for data storage. It only defines that there is a “storage volume”; what type it is, how much capacity it has, and how it stores data are left for us to decide freely. The Pod does not need to care about those specialized and complicated details: as long as volumeMounts is set, the Volume can be mounted into the container and used.

Therefore, Kubernetes builds on the concept of Volume and extends it with the PersistentVolume object, which specifically represents a persistent storage device while hiding the details of the underlying storage implementation. We only need to know that it can store data safely and reliably (because the name PersistentVolume is long, it is usually abbreviated as PV).

So, where do the PVs in the cluster come from?

As an abstraction of storage, a PV is in reality some storage device or file system, such as Ceph, GlusterFS, NFS, or even a local disk. Managing these is beyond the scope of Kubernetes, so generally a system administrator maintains them separately and then creates the corresponding PVs in Kubernetes.

Note that a PV is a system resource of the cluster, an object at the same level as a Node. A Pod has no right to manage it, only the right to use it.

2. PersistentVolumeClaim/StorageClass

Now that we have PV, can we mount and use it directly in Pod?

Not yet, because storage devices differ too much from one another: some are fast and some are slow; some allow shared reads and writes while others allow only exclusive access; and capacities range up to the TB or PB level.

With so many kinds of storage devices, managing them all with the PV object alone would be a stretch and would violate the “single responsibility” principle. Letting the Pod select a PV directly is also very inflexible. So Kubernetes added two more objects, PersistentVolumeClaim and StorageClass, using the idea of a “middle layer” to further refine the allocation and management of storage volumes.

PersistentVolumeClaim, PVC for short, is easy to understand from its name: it is used to request storage resources from Kubernetes. The PVC is the object a Pod uses; it acts as the Pod’s agent and applies to the system for a PV on the Pod’s behalf. Once the request succeeds, Kubernetes associates the PV with the PVC, an action called “binding” (bind).

However, there are many storage resources in the system. Having each PVC traverse them all to find a suitable PV would be cumbersome, so StorageClass is needed.

StorageClass is a bit like the IngressClass we saw earlier. It abstracts a specific type of storage system (such as Ceph or NFS) and acts as a “coordinator” between PVC and PV, helping a PVC find a suitable PV. In other words, it simplifies the process of mounting a “virtual disk” for a Pod, so that the Pod does not need to see the implementation details of the PV.

If you feel you only half understand at this point, don’t worry; let’s use an example from everyday life as a comparison. After all, compared with the familiar CPU and memory, we know less about storage systems, so the three new concepts PV, PVC, and StorageClass in Kubernetes are not particularly easy to grasp.

Here is the example. Suppose you need 10 sheets of paper to print some materials at the office, so you call the front desk and explain what you need.

  • The act of “calling” is equivalent to the PVC: it requests storage resources from Kubernetes.
  • The front desk stocks office paper of various brands, sizes, and specifications, which is equivalent to StorageClass.
  • The front desk picks a brand according to your needs and takes a pack of A4 paper from the inventory. The pack may hold more than 10 sheets, but it still meets the requirement. The front desk then adds a record to the register, noting that you claimed office supplies. This process is the binding of the PVC to a PV.
  • The pack of A4 paper in your hand is the PV, the storage object.

3. Use YAML to describe PersistentVolume

There are many types of PV in Kubernetes. Let’s start with the simplest one, the local storage type HostPath. It is very similar to mounting a local directory with the -v parameter in Docker, and it gives a first impression of how PV is used.

Because a Pod may run on any node of the cluster, as the system administrator we first need to create a directory on each node; it will be mounted into the Pod as a local storage volume.

To save trouble, I created a directory named host-10m-pv in /tmp, indicating a storage device with only 10MB capacity.
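
As a minimal sketch of that preparation step, the commands below create the directory. They are assumed to be run on every node that might host the Pod, by a user who may write under /tmp. The directory name is only a label; the directory itself does not enforce any size limit.

```shell
# Create the backing directory for the hostPath volume.
# -p makes the command succeed even if the directory already exists.
mkdir -p /tmp/host-10m-pv

# Confirm that the directory is there.
ls -ld /tmp/host-10m-pv
```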

With storage, we can use YAML to describe this PV object.

Unfortunately, kubectl create cannot generate a PV object directly. You can only use kubectl api-resources and kubectl explain to look up the field descriptions of PV and write the YAML description file by hand.

Below I give a YAML example, you can use it as a template to edit your own PV:

# host-10m-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: host-10m-pv

spec:
  storageClassName: host-test
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Mi
  hostPath:
    path: /tmp/host-10m-pv/

storageClassName is the StorageClass mentioned above, the abstraction of the storage type. This PV is managed manually by us, so the name can be chosen freely. Here I wrote host-test; you could also use manual, hand-work, or something similar.

accessModes defines the access mode of the storage device. Simply put, it is the read and write permission of the virtual disk, similar to file access modes in Linux. Kubernetes currently has these 3 modes:

  • ReadWriteOnce: The storage volume is readable and writable, but can only be mounted by Pods on a single node.
  • ReadOnlyMany: The storage volume is read-only, and can be mounted by Pods on any number of nodes.
  • ReadWriteMany: The storage volume is readable and writable, and can be mounted by Pods on any number of nodes.

Note that these 3 access modes restrict nodes rather than Pods, because storage is a system-level concept and does not belong to the processes in a Pod.

Obviously, the local directory can only be used locally, so this PV uses ReadWriteOnce.

The third field capacity is easy to understand, indicating the capacity of the storage device, here I set it to 10MB.

A reminder: Kubernetes uses international standard units for storage capacity. The KB/MB/GB we use day to day are based on 1024, so in Kubernetes they must be written as Ki/Mi/Gi (a plain M or G suffix is based on 1000). Be careful not to get this wrong, or the unit will not match the actual capacity.
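
To make the difference concrete, here is a small arithmetic sketch in plain shell. It does not talk to Kubernetes; it only computes how many bytes 10Mi and 10M stand for (the variable names are chosen here for illustration).

```shell
# 10Mi uses the binary base (1024); 10M uses the decimal base (1000).
mi_bytes=$((10 * 1024 * 1024))
m_bytes=$((10 * 1000 * 1000))

echo "10Mi = ${mi_bytes} bytes"                       # 10Mi = 10485760 bytes
echo "10M  = ${m_bytes} bytes"                        # 10M  = 10000000 bytes
echo "difference = $((mi_bytes - m_bytes)) bytes"     # difference = 485760 bytes
```

The gap is small at this size but grows with each larger unit, which is why mixing the two suffixes leads to capacities that do not match expectations.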

The last field, hostPath, is the simplest: it specifies the local path of the storage volume, which is the directory we created on the node. With these fields describing the PV’s type, access mode, capacity, and storage location, the storage device is defined.

4. Use YAML to describe PersistentVolumeClaim

With the PV in place, the cluster has persistent storage that Pods can use. Next we need to define a PVC object to request storage from Kubernetes.

The following YAML is a PVC, which requires a 5MB storage device, and the access mode is ReadWriteOnce:

# host-5m-pvc.yml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: host-5m-pvc

spec:
  storageClassName: host-test
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Mi

The content of a PVC is very similar to a PV, but it does not represent actual storage. It is a “request” or “claim”, and its spec fields describe the “desired state” of the storage.

So storageClassName and accessModes in the PVC are the same as in the PV, but there is no capacity field; instead, resources.requests indicates how much capacity is wanted.

In this way, Kubernetes finds a PV that matches the StorageClass and capacity described in the PVC, and then “binds” the PV and PVC together to complete the storage allocation, much like the earlier process of calling for A4 paper.

5. Using PersistentVolume in Kubernetes

After preparing PV and PVC, you can make Pod implement persistent storage.

First you need to use kubectl apply to create a PV object:

kubectl apply -f host-10m-pv.yml

Then use kubectl get to check its status:

kubectl get pv

The output shows that this PV has a capacity of 10MB, its access mode is RWO (ReadWriteOnce), its StorageClass is the host-test we defined, and its status is Available, meaning it is ready to be assigned to a Pod at any time.

Next, we create PVC and apply for storage resources:

kubectl apply -f host-5m-pvc.yml
kubectl get pvc

Once the PVC object is created, Kubernetes immediately searches the cluster for a PV that meets the requirements, based on conditions such as StorageClass and resources. If a suitable storage object is found, the two are “bound” together.

The PVC requests 5MB, but the only PV in the system is 10MB and there is no better match, so Kubernetes can only allocate this PV; the extra capacity is a “bonus”.

You will see that both objects have the status Bound, which means the storage request succeeded, and that the actual capacity of the PVC is the PV’s 10MB rather than the 5MB initially requested.

So, what happens if we increase the capacity requested by the PVC, for example to 100MB?

Let’s delete the PVC first:

kubectl delete -f host-5m-pvc.yml

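Check the PV status; it now shows Released instead of Available, because the PV still records the deleted PVC in its spec.claimRef field.

References:
Solution to K8S PV always in Released state
k8s pv has been in release state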

To make the PV usable again, edit it and delete the claimRef section that records the previous binding:

kubectl edit pv host-10m-pv
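
For a scriptable alternative to the interactive edit, one commonly used approach is to clear the PV’s spec.claimRef with kubectl patch. The helper name clear_claim_ref below is hypothetical, and the sketch assumes that the data left in the directory may be reused by the next claim.

```shell
# Hypothetical helper: drop the stale claim reference from a Released PV
# so that it can be bound by a new PVC.
clear_claim_ref() {
  kubectl patch pv "$1" -p '{"spec":{"claimRef":null}}'
}

# Usage (needs access to the cluster):
#   clear_claim_ref host-10m-pv
#   kubectl get pv host-10m-pv
```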


Check the PV status again; it has returned to the normal Available state.

Change the requested capacity of the PVC to 100Mi, run kubectl apply, and then check the status of the PV and PVC again.
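
As a sketch of that change, the snippet below writes a separate PVC manifest with the request raised to 100Mi. The file name host-100m-pvc.yml and the object name host-100m-pvc are assumptions made here so that the original 5Mi manifest stays intact; editing host-5m-pvc.yml in place works just as well.

```shell
# Write a PVC manifest that asks for more than the only existing PV offers.
cat > host-100m-pvc.yml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: host-100m-pvc   # hypothetical name for this experiment
spec:
  storageClassName: host-test
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi    # larger than the 10Mi PV
EOF
```

The manifest can then be applied with kubectl apply -f host-100m-pvc.yml and inspected with kubectl get pv,pvc.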

You will see that the PVC stays in the Pending state, which means Kubernetes cannot find storage in the system that meets the requirement and cannot allocate resources. Binding completes only once a PV that satisfies the request exists.

6. Mount PersistentVolume for Pod

With persistent storage in place, we can now mount the volume for a Pod. First define the storage volume in spec.volumes, and then mount it into the container with containers.volumeMounts.

Because we are using a PVC, the volumes entry must use the persistentVolumeClaim field to specify the name of the PVC.

The following is the YAML description file of the Pod, which mounts the storage volume at the /tmp directory of the Nginx container:

# host-path-pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: host-pvc-pod

spec:
  volumes:
  - name: host-pvc-vol
    persistentVolumeClaim:
      claimName: host-5m-pvc

  containers:
    - name: ngx-pvc-pod
      image: nginx:alpine
      ports:
      - containerPort: 80
      volumeMounts:
      - name: host-pvc-vol
        mountPath: /tmp

I drew the relationship between the Pod and the PVC/PV as a diagram (the accessModes field is omitted), which shows how they are connected:

(Figure: relationship between the Pod, PVC, and PV)

Now we create this Pod and check its status:

kubectl apply -f host-path-pod.yml
kubectl get pod -o wide

Kubernetes scheduled it onto the worker node. So was the PV mounted successfully? Let’s enter the container with kubectl exec and run a few commands to see:
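
A minimal sketch of such a check follows. The helper name write_probe_file and the file content are assumptions; the Pod name, the /tmp mount path, and the file name host-pvc.txt come from this article.

```shell
# Hypothetical helper: write a small file into the mounted volume
# from inside the container, then list the directory.
write_probe_file() {
  kubectl exec "$1" -- sh -c 'echo aaa > /tmp/host-pvc.txt && ls -l /tmp'
}

# Usage (needs access to the cluster):
#   write_probe_file host-pvc-pod
```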

A host-pvc.txt file has been created in the /tmp directory of the container. According to the PV definition, it should land on the disk of the worker node, so we log in to the worker node to check:

You will see that the local directory on the worker node does contain a host-pvc.txt file; checking its timestamp confirms that it is the file just created in the Pod.

Because the data generated by the Pod has been stored on disk through the PV, if the Pod is deleted and recreated, the storage volume will still use this directory when mounted. The data remains unchanged, and persistent storage is achieved.

But there is still a small problem. Because this PV is of the HostPath type, the data is stored only on this node. If the rebuilt Pod is scheduled to another node, then even though a local directory is mounted there, it is not the previous storage location, and persistence fails.

Therefore, HostPath PVs are generally used for testing, or for applications such as DaemonSet that are closely tied to nodes. The next lesson will cover how to achieve data persistence that works on any node.

7. Summary

  • PersistentVolume, PV for short, is the Kubernetes abstraction of a storage device and is maintained by the system administrator. It must clearly describe the storage device’s type, access mode, capacity, and other information.
  • PersistentVolumeClaim, PVC for short, requests storage resources from the system on behalf of a Pod. It declares the storage requirements, and Kubernetes finds the most suitable PV and binds it.
  • StorageClass abstracts a specific type of storage system, classifies and groups PV objects, and simplifies the binding process of PV and PVC.
  • HostPath is the simplest PV. The data is stored locally on the node, which is fast but cannot migrate with the Pod.
  • A PVC is only a request. What the Pod actually uses is a volume: the PV bound to the PVC is mounted into the Pod through that volume.

Kubernetes has a special form of storage volume called emptyDir. Its life cycle is the same as the Pod’s and longer than that of an individual container, but it is not persistent storage. It can be used for temporary staging or caching.

If the storage system conforms to the CSI standard, the ReadWriteOncePod mode can also be used in accessModes to allow only a single Pod to read and write, which gives finer-grained control.
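
As a hedged sketch of what such a claim might look like, the snippet below writes a PVC manifest that uses this mode. The file name rwop-pvc.yml, the object name, the 1Gi request, and the StorageClass name csi-example are all hypothetical; a CSI-backed StorageClass is assumed to exist in the cluster, and the HostPath PV from this article would not satisfy this claim.

```shell
# Write a PVC manifest that restricts access to a single Pod.
cat > rwop-pvc.yml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwop-pvc                  # hypothetical name
spec:
  storageClassName: csi-example   # hypothetical CSI-backed StorageClass
  accessModes:
    - ReadWriteOncePod            # only one Pod may read and write
  resources:
    requests:
      storage: 1Gi                # arbitrary size for illustration
EOF
```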

1. The StorageClass is the host-test we defined ourselves. Don’t we need to manually create a StorageClass object named host-test? If not, in what form does it exist and what role does it play?
Answer: For purely manually created PVs, no dedicated StorageClass object is needed; the name only has to match between the PV and the PVC. A StorageClass object will be used later with dedicated storage devices such as NFS.