Easy Integration Series 2: How to Configure Backup and Restore in KubeBlocks, with Oracle MySQL as an Example

This article uses Oracle MySQL as an example to show how to set up backup and restore in KubeBlocks (see the complete PR for reference).

Backups can be classified along several dimensions. By method, there are volume snapshot backups and file backups; by content, data backups and log backups; by scope, full backups and incremental backups; by timing, scheduled backups and on-demand backups; and so on.

This article shows how to implement the two most common kinds of full backup on KubeBlocks: snapshot backup and file backup.

Prerequisites

Understand the basic concepts of K8s, such as Pod, PVC, PV, VolumeSnapshot, etc.
Complete Tutorial 1
Learn about common backup-related concepts in KubeBlocks.

Table 1. Terminology

Name	Description	Scope
Backup	Backup object: the entity that represents one backup.	Namespace
BackupPolicy	Backup policy: defines the policy for each backup type, such as the schedule, the backup retention time, and which backup tool to use.	Namespace
BackupTool	Backup tool: the carrier of a backup tool in KubeBlocks. Each BackupTool should implement the backup logic and the restore logic of the corresponding tool.	Cluster
BackupPolicyTemplate	Backup policy template: BackupPolicyTemplate is a bridge between backup and ClusterDefinition. When creating a Cluster, KubeBlocks will automatically generate a default backup policy for each Cluster object based on the BackupPolicyTemplate.	Cluster

Table 1 lists the common backup-related concepts in KubeBlocks. The following sections illustrate their functions and usage through examples.

Configure the environment

First, let’s clarify two premises:

Snapshot backup relies on the volume snapshot capability of Kubernetes.
File backup relies on the backup tools of each database engine.

1. Install CSI Driver

Volume snapshots are only supported by CSI drivers, so make sure your Kubernetes cluster has one configured correctly.
In a local environment, you can quickly install csi-hostpath-driver through the KubeBlocks Addon function:

kbcli addon enable csi-hostpath-driver

In a cloud environment, configure the CSI driver that corresponds to your cloud provider.

2. Set the StorageClass as the default to simplify subsequent cluster creation

 kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
csi-hostpath-sc (default) hostpath.csi.k8s.io Delete WaitForFirstConsumer true 35s
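
The output above only shows the end state. One common way to mark a StorageClass as the default is to set the standard Kubernetes annotation storageclass.kubernetes.io/is-default-class on it. The following Python sketch only builds the corresponding kubectl patch command line and executes nothing. The helper names are illustrative and not part of KubeBlocks; the class name csi-hostpath-sc is taken from the output above.

```python
import json
import shlex

# Standard Kubernetes annotation that marks a StorageClass as the cluster default.
DEFAULT_CLASS_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_class_patch(is_default: bool = True) -> str:
    """Return the JSON patch body that sets (or clears) the default-class annotation."""
    value = "true" if is_default else "false"
    return json.dumps({"metadata": {"annotations": {DEFAULT_CLASS_ANNOTATION: value}}})


def patch_storage_class_argv(name: str, is_default: bool = True) -> list[str]:
    """Build the kubectl argument list; nothing is executed here."""
    return ["kubectl", "patch", "storageclass", name, "-p", default_class_patch(is_default)]


if __name__ == "__main__":
    # Print a copy-pasteable command for the StorageClass shown above.
    print(shlex.join(patch_storage_class_argv("csi-hostpath-sc")))
```

If another class is already marked as the default, the same helper with is_default=False produces the command that clears its annotation, so that only one class stays marked as default.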

Specify volume type

Specify the volume type in the ClusterDefinition (required).

 componentDefs:
    - name: mysql-compdef
      characterType: mysql
      workloadType: Stateful
      service:
        ports:
          - name: mysql
            port: 3306
            targetPort: mysql
      volumeTypes:
        - name: data
          type: data

volumeTypes specifies the name and the type of each volume.
There are two volume types (volumeTypes.type):

data: the volume holds data
log: the volume holds logs

KubeBlocks supports different backup methods for data and logs. Here, we only configure the data volume information.
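
As a quick sanity check before moving on, the sketch below looks for a volume of type data in a componentDef. It is a minimal illustration, assuming the componentDef has already been loaded into a plain Python dict; the function name data_volume_names is hypothetical and not a KubeBlocks API.

```python
def data_volume_names(component_def: dict) -> list[str]:
    """Return the names of the volumes declared with type 'data' in a componentDef."""
    return [
        vt["name"]
        for vt in component_def.get("volumeTypes", [])
        if vt.get("type") == "data"
    ]


# The componentDef from the ClusterDefinition above, written as a plain dict.
mysql_compdef = {
    "name": "mysql-compdef",
    "characterType": "mysql",
    "workloadType": "Stateful",
    "volumeTypes": [{"name": "data", "type": "data"}],
}

if not data_volume_names(mysql_compdef):
    raise ValueError("componentDef declares no volume of type 'data'")
print(data_volume_names(mysql_compdef))  # ['data']
```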

Add backup configuration

We need to prepare two files: BackupPolicyTemplate and BackupTool.

BackupPolicyTemplate

This is a template for a backup policy. It mainly describes:

Which component of the Cluster to back up
Whether to back up on a schedule
How to set up snapshot backup
How to set up file backup

apiVersion: apps.kubeblocks.io/v1alpha1
kind: BackupPolicyTemplate
metadata:
  name: oracle-mysql-backup-policy-template
  labels:
    clusterdefinition.kubeblocks.io/name: oracle-mysql # Specify the scope through label, which must be filled in
spec:
  clusterDefinitionRef: oracle-mysql #Specify the scope and which ClusterDef generated the cluster
  backupPolicies:
  - componentDefRef: mysql-compdef #Specify the scope and which component it is related to
    schedule: # schedule is used to specify scheduled backup time and startup status
      snapshot:
        enable: true # Start scheduled snapshot backup
        cronExpression: "0 18 * * *"
      datafile: # Disable scheduled file backup
        enable: false
        cronExpression: "0 18 * * *"
    snapshot: # Snapshot backup, the latest 5 versions are retained by default
      backupsHistoryLimit: 5
    datafile: # Data file backup, relying on backup tools
      backupToolName: oracle-mysql-xtrabackup

If scheduled tasks are enabled, KubeBlocks will create a CronJob in the background.
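
To make the two settings in the template concrete, here is a small Python sketch of how they can be read: the cron expression "0 18 * * *" fires once a day at 18:00, and backupsHistoryLimit: 5 keeps only the five newest backups. This is an illustration of the semantics, not KubeBlocks' scheduler or cleanup code; both function names are hypothetical, and the time zone in which a real CronJob fires depends on how the cluster is configured.

```python
from datetime import datetime, timedelta


def next_daily_run(cron_expression: str, now: datetime) -> datetime:
    """Next run for a daily expression of the form 'M H * * *'.

    Only this simple shape is handled; anything else raises ValueError.
    """
    minute, hour, dom, month, dow = cron_expression.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' expressions are supported in this sketch")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    # If today's slot has already passed (or is right now), move to tomorrow.
    return candidate if candidate > now else candidate + timedelta(days=1)


def backups_to_delete(completed_at: dict[str, datetime], history_limit: int) -> list[str]:
    """Names of the backups that fall outside the newest `history_limit` entries."""
    newest_first = sorted(completed_at, key=completed_at.get, reverse=True)
    return newest_first[history_limit:]


print(next_daily_run("0 18 * * *", datetime(2023, 9, 1, 17, 30)))  # 2023-09-01 18:00:00

# Six daily backups with a limit of five: only the oldest one is dropped.
history = {f"backup-{day}": datetime(2023, 9, day, 18, 0) for day in range(1, 7)}
print(backups_to_delete(history, 5))  # ['backup-1']
```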

When a new cluster is created, KubeBlocks finds the matching template through the clusterdefinition.kubeblocks.io/name label and creates the corresponding BackupPolicy.

If you successfully added the BackupPolicyTemplate but the newly created Cluster does not have a default BackupPolicy, check the following:

Is clusterDefinitionRef correct?

Is the label of the BackupPolicyTemplate correct?

Are there multiple associated BackupPolicyTemplates?
If so, mark one of them as the default template through the following annotation:
 dataprotection.kubeblocks.io/is-default-policy-template: "true"
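
The checklist above can be read as a small selection procedure. The Python sketch below is one interpretation of it, written against plain dicts; it is not KubeBlocks' actual controller logic, and the function name pick_template as well as the second template in the example are hypothetical.

```python
CLUSTER_DEF_LABEL = "clusterdefinition.kubeblocks.io/name"
DEFAULT_TEMPLATE_ANNOTATION = "dataprotection.kubeblocks.io/is-default-policy-template"


def pick_template(templates: list[dict], cluster_definition: str) -> dict:
    """Pick the template for a ClusterDefinition, following the checklist above."""
    matching = [
        t for t in templates
        if (t["metadata"].get("labels") or {}).get(CLUSTER_DEF_LABEL) == cluster_definition
        and t["spec"].get("clusterDefinitionRef") == cluster_definition
    ]
    if not matching:
        raise LookupError(f"no BackupPolicyTemplate matches {cluster_definition!r}")
    if len(matching) == 1:
        return matching[0]
    # Several candidates: exactly one of them has to be annotated as the default.
    defaults = [
        t for t in matching
        if (t["metadata"].get("annotations") or {}).get(DEFAULT_TEMPLATE_ANNOTATION) == "true"
    ]
    if len(defaults) != 1:
        raise LookupError("several templates match; exactly one must carry the default annotation")
    return defaults[0]


templates = [
    {
        "metadata": {
            "name": "oracle-mysql-backup-policy-template",
            "labels": {CLUSTER_DEF_LABEL: "oracle-mysql"},
            "annotations": {DEFAULT_TEMPLATE_ANNOTATION: "true"},
        },
        "spec": {"clusterDefinitionRef": "oracle-mysql"},
    },
    {
        # Hypothetical second template for the same ClusterDefinition.
        "metadata": {"name": "oracle-mysql-alt-template", "labels": {CLUSTER_DEF_LABEL: "oracle-mysql"}},
        "spec": {"clusterDefinitionRef": "oracle-mysql"},
    },
]
print(pick_template(templates, "oracle-mysql")["metadata"]["name"])
# oracle-mysql-backup-policy-template
```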

BackupTool

BackupTool describes the specific execution logic of the backup tool and mainly serves file backup (datafile). It includes:

backup tool image
backup script
restore script

apiVersion: dataprotection.kubeblocks.io/v1alpha1
kind: BackupTool
metadata:
  name: oracle-mysql-xtrabackup
  labels:
spec:
  image: docker.io/perconalab/percona-xtrabackup:8.0.32 #Backup via xtrabackup
  env: #Environment variable name for injected dependencies
    - name: DATA_DIR
      value: /var/lib/mysql
  physical:
    restoreCommands: # restore command
      - sh
      - -c
      ...
  backupCommands: # backup command
    - sh
    - -c
    ...

The configuration of a BackupTool depends heavily on the backup tool it wraps.

For example, because we use the Percona XtraBackup tool for backup here, we need to fill in the corresponding scripts in backupCommands and restoreCommands.

BackupTool is mainly used for file backup services. If you only need snapshot backup and no file backup, there is no need to configure BackupTool.
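
The lists under backupCommands and restoreCommands have the shape of an argument vector: sh, then -c, then the whole script as one string. The Python sketch below mimics that shape locally to show how an env entry such as DATA_DIR reaches the script. The script here is a placeholder that only echoes a message, because the real scripts are elided in this article, and running it with subprocess is an assumption made for illustration rather than a description of how KubeBlocks launches the backup container.

```python
import os
import subprocess

# Placeholder script: the real backup script is elided in the BackupTool above.
placeholder_script = 'echo "would back up files under ${DATA_DIR}"'

# Same shape as backupCommands: three list items forming one argument vector.
backup_commands = ["sh", "-c", placeholder_script]

# The env entries of the BackupTool are modeled as environment variables of the process.
env = dict(os.environ, DATA_DIR="/var/lib/mysql")

result = subprocess.run(backup_commands, env=env, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # would back up files under /var/lib/mysql
```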

Backup/Restore Cluster

Everything is ready. Let's back up and restore a cluster.

1. Create a cluster

helm install mysql ./path-to-your-helm-chart/oracle-mysql
kbcli cluster create mycluster --cluster-definition oracle-mysql

Because we added a BackupPolicyTemplate, KubeBlocks automatically creates a BackupPolicy for the cluster once the cluster is created. You can view it with the following command:

kbcli cluster list-backup-policy mycluster

2. Snapshot backup

kbcli cluster backup mycluster --type snapshot

--type specifies the backup type: snapshot or datafile.

If there are multiple backup policies, specify the one to use with the --policy flag.

3. File backup

KubeBlocks supports backing up to local storage and to cloud object storage. The steps below back up to local storage.

(1) Modify the BackupPolicy and specify the PVC name

As shown on line 37 below, you need to specify the name of the backup PVC.

 32 spec:
 33   datafile:
 34     backupToolName: oracle-mysql-xtrabackup
 35     backupsHistoryLimit: 7
 36     persistentVolumeClaim:
 37       name: mycluster-backup-pvc
 38       createPolicy: IfNotPresent
 39       initCapacity: 20Gi
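
Editing the object interactively is one option. As a non-interactive alternative, the sketch below builds a kubectl merge patch that sets the same PVC fields. It only prints the command and executes nothing. It assumes the BackupPolicy can be addressed as backuppolicy by kubectl, and the policy name mycluster-mysql-backup-policy is hypothetical; use the name reported by kbcli cluster list-backup-policy mycluster.

```python
import json
import shlex


def backup_pvc_patch(pvc_name: str, init_capacity: str = "20Gi") -> dict:
    """Merge-patch body that sets the PVC used by datafile backups."""
    return {
        "spec": {
            "datafile": {
                "persistentVolumeClaim": {
                    "name": pvc_name,
                    "createPolicy": "IfNotPresent",
                    "initCapacity": init_capacity,
                }
            }
        }
    }


def patch_backup_policy_argv(policy_name: str, pvc_name: str) -> list[str]:
    """Build the kubectl argument list; nothing is executed here."""
    return [
        "kubectl", "patch", "backuppolicy", policy_name,
        "--type", "merge", "-p", json.dumps(backup_pvc_patch(pvc_name)),
    ]


# Hypothetical policy name; replace it with the one that exists in your cluster.
print(shlex.join(patch_backup_policy_argv("mycluster-mysql-backup-policy", "mycluster-backup-pvc")))
```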

(2) Execute the backup command and set --type to datafile

kbcli cluster backup mycluster --type datafile

4. Create a cluster from backup

(1) Check the backup first

kbcli cluster list-backups

(2) Select a backup and create a cluster from the backup

kbcli cluster restore <clusterName> --backup <backup-name>

Soon a new cluster is created.

Note that some databases create the root account and password only when they are initialized for the first time.

Therefore, in a cluster restored from a backup, the root account and password newly generated during the restore do not take effect. You still need to log in with the root account and password of the original cluster.

Summary

This article demonstrates the configuration of backup strategies in KubeBlocks through a short example.

I hope it can help you have a basic understanding of the backup & restore function of KubeBlocks.

Appendix

A.1 Cluster data protection strategy

KubeBlocks provides data protection strategies for stateful clusters, and different strategies protect data in different ways.

You can try the following scenarios:

If you delete the cluster via kbcli cluster delete, are the backups still there?
If you change the terminationPolicy of the cluster to WipeOut and then delete it, are the backups still there?
What happens if you change the terminationPolicy of the cluster to DoNotTerminate and then delete it?

Tip: Please refer to the data protection behavior of KubeBlocks

A.2 View backup information

In the Backup/Restore Cluster section, we created a backup via the backup subcommand.

kbcli cluster backup mycluster --type snapshot

A new backup object is generated, and you can view more information about it through the describe-backup subcommand:

kbcli cluster describe-backup <your-back-up-name>