TiDB Containerized Management Tool: TiDB Operator

Author: lqbyz Original source: https://tidb.net/blog/c4a9caaf

Introduction

TiDB Operator is an automatic operation and maintenance (O&M) system for TiDB clusters on Kubernetes. It provides full lifecycle management of TiDB, including deployment, upgrades, scaling, backup and recovery, and configuration changes. With TiDB Operator, TiDB can run seamlessly on public cloud or privately deployed Kubernetes clusters. It is open source at pingcap/tidb-operator.


The corresponding relationship between TiDB Operator and the applicable TiDB version is as follows:

TiDB version          Applicable TiDB Operator version
dev                   dev
TiDB >= 5.4           1.3
5.1 <= TiDB < 5.4     1.3 (recommended), 1.2
3.0 <= TiDB < 5.1     1.3 (recommended), 1.2, 1.1
2.1 <= TiDB < 3.0     1.0 (end of maintenance)

Why do we need TiDB Operator

First, TiDB Operator provides fully automated lifecycle management: deployment, upgrades, scaling in and out, backup and recovery, and configuration changes.

TiDB’s layered architecture is common for distributed systems. Each component can be scaled independently according to business needs, and both TiKV and TiDB can be used on their own: for example, a KV database compatible with the Redis protocol can be built on top of TiKV, and TiDB can be connected to other KV storage engines such as LevelDB. However, a multi-component distributed system like this increases the cost of manual deployment and O&M. Traditional automation tools such as Puppet/Chef/SaltStack/Ansible lack global state management, cannot fail over automatically and promptly in various abnormal situations, and make it hard to exploit the elastic scalability of a distributed system. Some of them also require writing large amounts of DSL, often mixed with shell scripts, which hurts portability and raises maintenance costs.

Second, in the cloud era, containers have become the basic unit of application distribution and deployment, and Kubernetes has become the standard for current container orchestration technology.

Nowadays, major cloud vendors provide managed Kubernetes clusters. Applications deployed on Kubernetes can be easily migrated between cloud platforms without being bound to any one of them, and containerized packaging and publishing also remove the dependency on the operating system environment.

To deploy and manage a stateful service like TiDB on Kubernetes, the functionality of StatefulSet needs to be extended. TiDB Operator is a TiDB cluster management and O&M tool developed natively on top of Kubernetes’ built-in StatefulSet. Kubernetes’ official local PV solution only gained relatively stable scheduling support in v1.10, so TiDB Operator needs to run on Kubernetes v1.10 or above to meet users’ local PV requirements. Building on Kubernetes in this way reduces users’ usage and management costs and embraces the Kubernetes open source community.

Third, TiDB-related knowledge is required to perform operations.

When deploying a TiDB cluster through TiDB Operator, you need to understand the structure of a TiDB cluster, the relationships between its service components and their startup order, and you need to operate on the corresponding components when scaling.

Key Highlights

Easily deploy TiDB clusters

TiDB Operator can safely scale each component of a TiDB cluster. It injects TiDB’s professional O&M knowledge into Kubernetes through custom resource objects (Custom Resource), custom controllers (Custom Controller), and a scheduler extension (Scheduler Extender), allowing users to manage TiDB clusters in Kubernetes’ declarative API style. Users only need to describe the cluster specification, and TiDB Operator continuously adjusts the resources in Kubernetes to drive the actual cluster to match that description. In this mode, the TiDB cluster automatically completes health checks and failover, and operations such as deployment, upgrade, and scaling become a “one-click” modification of the cluster’s specification definition, which greatly simplifies TiDB cluster O&M.
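
For example, scaling TiKV out is a single declarative change to the specification. A minimal sketch, assuming a TidbCluster named basic in namespace tidb-cluster (both placeholder names):

# Scale TiKV from 3 to 5 replicas by patching the TidbCluster spec;
# TiDB Operator observes the change and safely adds the new stores
kubectl patch tidbcluster basic -n tidb-cluster --type merge \
  -p '{"spec":{"tikv":{"replicas":5}}}'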

Rolling updates

Rolling updates are performed smoothly with no downtime. During an update, the Pods of each service component are updated one at a time; the next Pod is updated only after the previous one completes. If a problem occurs, it is easy to roll back quickly.
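
As a sketch, again assuming a cluster named basic in namespace tidb-cluster, a rolling upgrade is triggered by bumping the version field, and progress can be watched Pod by Pod:

# Changing spec.version triggers a rolling update of all components
kubectl patch tidbcluster basic -n tidb-cluster --type merge \
  -p '{"spec":{"version":"v5.4.1"}}'

# Watch Pods being replaced one at a time
kubectl get pods -n tidb-cluster --watch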

Multi-tenancy support

Users can use TiDB Operator to deploy and manage multiple TiDB clusters on a single Kubernetes cluster.

Automatic failover

When a node/Pod fails, TiDB Operator automatically performs failover switching.

Simple monitoring

TiDB Operator supports installing Prometheus and Grafana for cluster monitoring.
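
For illustration, monitoring is created in the same declarative way. A minimal TidbMonitor sketch, where the monitored cluster name basic and all image versions are assumptions chosen for the example:

kubectl apply -n tidb-cluster -f - <<EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: basic
spec:
  clusters:
    - name: basic            # the TidbCluster to monitor
  prometheus:
    baseImage: prom/prometheus
    version: v2.27.1
  grafana:
    baseImage: grafana/grafana
    version: 7.5.11
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v5.4.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
EOF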

Multi-cloud support

TiDB Operator provides Terraform deployment scripts for AWS, Google Cloud, and Alibaba Cloud. These scripts can create a Kubernetes cluster within about ten minutes and deploy one or more production-ready TiDB clusters on it. During subsequent management, the Terraform scripts operate the related cloud resources alongside the TiDB cluster: for example, when a TiDB cluster is scaled out, the scripts automatically create more cloud servers to meet the resource requirements of the expanded cluster.

Architecture

In the architecture, TidbCluster, TidbMonitor, TidbInitializer, Backup, Restore, BackupSchedule, and TidbClusterAutoScaler are custom resources defined by CRD (CustomResourceDefinition); a minimal TidbCluster sketch follows the list:

  • TidbCluster describes the TiDB cluster the user expects
  • TidbMonitor describes the monitoring components the user expects for the TiDB cluster
  • TidbInitializer describes the initialization job the user expects for the TiDB cluster
  • Backup describes the TiDB cluster backup the user expects
  • Restore describes the TiDB cluster restore the user expects
  • BackupSchedule describes the periodic backups the user expects for the TiDB cluster
  • TidbClusterAutoScaler describes the automatic scaling the user expects for the TiDB cluster
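
As promised above, here is a minimal TidbCluster sketch; the name basic, namespace tidb-cluster, version v5.4.0, and all sizing values are illustrative assumptions, not recommendations:

kubectl apply -f - <<EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  namespace: tidb-cluster
spec:
  # A minimal cluster: 3 PD, 3 TiKV, 2 TiDB
  version: v5.4.0
  timezone: UTC
  pvReclaimPolicy: Retain
  pd:
    baseImage: pingcap/pd
    replicas: 3
    requests:
      storage: "10Gi"
    config: {}
  tikv:
    baseImage: pingcap/tikv
    replicas: 3
    requests:
      storage: "100Gi"
    config: {}
  tidb:
    baseImage: pingcap/tidb
    replicas: 2
    service:
      type: ClusterIP
    config: {}
EOF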

The orchestration and scheduling logic of the TiDB cluster is handled by the following components:

  • tidb-controller-manager is a set of custom controllers on Kubernetes. These controllers constantly compare the desired state recorded in the TidbCluster object with the actual state of the TiDB cluster, adjust the resources in Kubernetes to drive the TiDB cluster toward the desired state, and complete the corresponding control logic for the other CRs;
  • tidb-scheduler is a Kubernetes scheduler extension that injects TiDB-specific scheduling logic into the Kubernetes scheduler;
  • tidb-admission-webhook is a Kubernetes dynamic admission controller that modifies, validates, and maintains Pods, StatefulSets, and other related resources;
  • discovery is a service for discovery between components. Each TiDB cluster has a corresponding discovery Pod, which components in the cluster use to discover the other components that have been created.

TiDB Controller Manager

tidb-controller-manager continuously reconciles the difference between the desired state and the actual state. If the states do not match, the controller triggers actions to transition to the desired state, as shown in the following figure:

[Figure: the control loop of tidb-controller-manager, comparing desired and actual state]
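
To observe this loop in practice, the controller’s logs can be tailed. A sketch, assuming the default Helm chart naming for the Deployment (it may differ in your installation):

# Tail the reconciliation logs of tidb-controller-manager
kubectl logs -n tidb-admin deployment/tidb-controller-manager --tail=50 -f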

TiDB Scheduler (scheduler extension)

TiDB Scheduler is a TiDB implementation of the Kubernetes scheduler extension, used to add new scheduling rules to Kubernetes. A kube-scheduler is deployed by default in a Kubernetes cluster for Pod scheduling; its default scheduler name is default-scheduler. For the specific scheduling rules, refer to K8S-POD Resources and Scheduling.

TiDB Scheduler adds custom scheduling rules by implementing the Kubernetes scheduler extension (Scheduler extender).

The TiDB Scheduler component is deployed as one or more Pods, but only one Pod works at any given time. Each Pod contains two containers: one is the native kube-scheduler; the other is tidb-scheduler, implemented as a Kubernetes scheduler extender.

If tidb-scheduler is configured for the components in a TidbCluster, the .spec.schedulerName property of the PD, TiDB, and TiKV Pods created by TiDB Operator is set to tidb-scheduler, i.e., the custom TiDB Scheduler is used for scheduling.
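
A quick way to verify this, assuming a cluster named basic in namespace tidb-cluster:

# Print each Pod together with the scheduler responsible for it
kubectl get pods -n tidb-cluster \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.schedulerName}{"\n"}{end}'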

At this point, the scheduling process of a Pod is as follows:

  • kube-scheduler picks up all Pods whose .spec.schedulerName is tidb-scheduler, and each Pod is first filtered by the Kubernetes default scheduling rules;
  • kube-scheduler then sends a request to the tidb-scheduler service, which filters the candidate nodes using the custom scheduling rules introduced above and returns the remaining schedulable nodes to kube-scheduler;
  • finally, kube-scheduler determines the node the Pod is scheduled to.

TiDB Operator admission controller

Kubernetes introduced the dynamic admission mechanism in version 1.9, enabling the modification and validation of various resources in Kubernetes. TiDB Operator also uses the dynamic admission mechanism to help modify, validate, and maintain related resources.

The TiDB Operator admission controller differs from the admission controllers of most products on the Kubernetes platform: it is built by extending two mechanisms, the API server aggregation layer and WebhookConfiguration. The aggregation layer feature therefore needs to be enabled in the Kubernetes cluster; it is usually enabled by default. To check whether it is enabled, refer to Enable Kubernetes Apiserver flags.
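
In the Helm chart used later in this document, the admission webhook is disabled by default (admissionWebhook.create: false in values.yaml); it can be switched on at install time, for example:

# Enable the TiDB Operator admission webhook via a chart value
helm install tidb-operator ./tidb-operator --namespace=tidb-admin \
  --set admissionWebhook.create=true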

Principle

An Operator is essentially a Kubernetes controller. Its core idea is that the user provides a Spec description file, and the controller creates the corresponding resources in the Kubernetes cluster according to changes in the Spec, continuously adjusting those resources until their state matches the user’s expected Spec.

The workflow of TiDB Operator is as follows, where TidbCluster is a custom resource type defined through a CRD (Custom Resource Definition):

  1. The user creates or updates a TidbCluster object in the Kubernetes API server through Helm.
  2. TiDB Operator watches TidbCluster objects in the API server and creates, updates, or deletes the PD/TiKV/TiDB StatefulSet, Service, and Deployment objects accordingly.
  3. Kubernetes creates, updates, or deletes the corresponding containers and services based on those StatefulSet, Service, and Deployment objects.

In step 2, when updating StatefulSets and other objects, TiDB Operator consults the cluster state reported by the PD API to carry out O&M on the TiDB cluster. Through this dynamic scheduling between TiDB Operator and Kubernetes, a TiDB cluster that meets the user’s expectations is created.
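
Once a TidbCluster has been applied, the objects created in steps 2 and 3 can be listed directly (namespace tidb-cluster is again a placeholder):

# StatefulSets, Services and Deployments driven from the TidbCluster spec
kubectl get statefulset,service,deployment -n tidb-cluster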

Install and deploy TiDB Operator

Install Helm

Helm is a package management tool for Kubernetes and the most popular way to find, share, and use software built for Kubernetes. Like Linux package managers such as RedHat’s yum and Debian’s apt, it makes it easy to deploy pre-packaged YAML files on Kubernetes. Helm mainly solves the following problems: 1. Managing YAML files as a whole. 2. Enabling efficient reuse of YAML. 3. Providing application-level version management. Installing Helm is relatively simple:

# Helm official download address: https://github.com/helm/helm/releases
wget https://get.helm.sh/helm-v3.4.2-linux-amd64.tar.gz

# Unpack Helm
tar zxvf helm-v3.4.2-linux-amd64.tar.gz

# Move the binary to the /usr/bin directory on the master node
mv linux-amd64/helm /usr/bin/

# Verify that the installation was successful
helm version

Deploy TiDB Operator (offline mode)

Since most customers are on an intranet and cannot access the public network, this document deploys TiDB Operator offline.

Download tidb-operator

# Download the latest version of the package
wget http://charts.pingcap.org/tidb-operator-v1.4.3.tgz
# Unpack it
tar zxvf tidb-operator-v1.4.3.tgz

Download the required images, tag them, and push them to the private registry

# All images used by TiDB Operator
pingcap/tidb-operator:v1.4.3
pingcap/tidb-backup-manager:v1.4.3
bitnami/kubectl:latest
pingcap/advanced-statefulset:v0.3.3
k8s.gcr.io/kube-scheduler:v1.16.9

## Pull and save the images (including kube-scheduler, listed above)
docker pull pingcap/tidb-operator:v1.4.3
docker pull pingcap/tidb-backup-manager:v1.4.3
docker pull bitnami/kubectl:latest
docker pull pingcap/advanced-statefulset:v0.3.3
docker pull k8s.gcr.io/kube-scheduler:v1.16.9

docker save -o tidb-operator-v1.4.3.tar pingcap/tidb-operator:v1.4.3
docker save -o tidb-backup-manager-v1.4.3.tar pingcap/tidb-backup-manager:v1.4.3
docker save -o bitnami-kubectl.tar bitnami/kubectl:latest
docker save -o advanced-statefulset-v0.3.3.tar pingcap/advanced-statefulset:v0.3.3
docker save -o kube-scheduler-v1.16.9.tar k8s.gcr.io/kube-scheduler:v1.16.9

## Load the images on the offline host

docker load -i tidb-operator-v1.4.3.tar
docker load -i tidb-backup-manager-v1.4.3.tar
docker load -i bitnami-kubectl.tar
docker load -i advanced-statefulset-v0.3.3.tar
docker load -i kube-scheduler-v1.16.9.tar

## Note: to push to a private registry, tag each image and push it, for example:
docker tag pingcap/tidb-operator:v1.4.3 xxx.com:5003/pingcap/tidb-operator:v1.4.3
docker push xxx.com:5003/pingcap/tidb-operator:v1.4.3

Configure TiDB Operator

Mainly configure the image names, limits, requests, and replicas; modify them as needed:

vim ./tidb-operator/values.yaml
clusterScoped: true

rbac:
  create: true

timezone: UTC

operatorImage: pingcap/tidb-operator:v1.4.3
imagePullPolicy: IfNotPresent

tidbBackupManagerImage: pingcap/tidb-backup-manager:v1.4.3
# Enable the Advanced StatefulSet (asts) feature; the advancedStatefulset
# section below must also keep create: true
features:
  - AdvancedStatefulSet=true

appendReleaseSuffix: false

controllerManager:
  create: true
  serviceAccount: tidb-controller-manager

  clusterPermissions:
    nodes: true
    persistentvolumes: true
    storageclasses: true

  logLevel: 2
  replicas: 1
  resources:
    requests:
      cpu: 80m
      memory: 50Mi

  autoFailover: true
  pdFailoverPeriod: 5m
  tikvFailoverPeriod: 5m
  tidbFailoverPeriod: 5m
  tiflashFailoverPeriod: 5m
  dmMasterFailoverPeriod: 5m
  dmWorkerFailoverPeriod: 5m
  detectNodeFailure: false
  affinity: {}
  nodeSelector: {}
  tolerations: []
  selector: []
  env: []
  securityContext: {}
  podAnnotations: {}

scheduler:
  create: true
  serviceAccount: tidb-scheduler
  logLevel: 2
  replicas: 1
  schedulerName: tidb-scheduler
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi
  kubeSchedulerImageName: k8s.gcr.io/kube-scheduler
  affinity: {}
  nodeSelector: {}
  tolerations: []
  securityContext: {}
  podAnnotations: {}

  configmapAnnotations: {}

advancedStatefulset:
  create: true
  image: pingcap/advanced-statefulset:v0.3.3   # match the image saved in the offline steps above
  imagePullPolicy: IfNotPresent
  serviceAccount: advanced-statefulset-controller
  logLevel: 4
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 200m
      memory: 50Mi
  affinity: {}
  nodeSelector: {}
  tolerations: []
  securityContext: {}

admissionWebhook:
  create: false
  replicas: 1
  serviceAccount: tidb-admission-webhook
  logLevel: 2
  rbac:
    create: true
  validation:
    statefulSets: false
    pingcapResources: false
  mutation:
    pingcapResources: true
  failurePolicy:
    validation: Fail
    mutation: Fail
  apiservice:
    insecureSkipTLSVerify: true
    tlsSecret: ""
    caBundle: ""
  cabundle: ""
  securityContext: {}
  nodeSelector: {}
  tolerations: []

Install TiDB Operator

kubectl create ns tidb-admin
helm install tidb-operator ./tidb-operator --namespace=tidb-admin
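
After installation, confirm that the operator Pods are running; the label selector follows the chart’s default labels:

# Check the tidb-controller-manager and tidb-scheduler Pods
kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator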

Upgrade TiDB Operator

helm upgrade tidb-operator ./tidb-operator --namespace=tidb-admin
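
After the upgrade, you can check which operator image is actually running. The Deployment name below follows the default chart naming and is an assumption:

# Confirm the running tidb-controller-manager image
kubectl get deployment tidb-controller-manager -n tidb-admin \
  -o jsonpath='{.spec.template.spec.containers[0].image}'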

TiDB cluster deployment

For details, refer to https://tidb.net/blog/9f9d32b3
