OpenKruise v1.4 Explained: New Job Sidecar Terminator Capability

Author: Liheng

Foreword

OpenKruise is an open source suite for the automated management of cloud-native applications. It was open-sourced by Alibaba Cloud and is currently hosted as an incubating project under the Cloud Native Computing Foundation (CNCF). It draws on Alibaba's many years of experience with containerization and cloud-native technology: it is a set of standard Kubernetes extension components that run at large scale in Alibaba's internal production environment, and it embodies the best practices developed there.

OpenKruise:

https://github.com/openkruise/kruise

OpenKruise released v1.4 (ChangeLog [1]) on March 31, 2023, adding the new Job Sidecar Terminator feature. This article gives an overview of the new version.

01 Important updates

  • To make Kruise's enhanced capabilities easier to use, several stable features are now enabled by default: ResourcesDeletionProtection, WorkloadSpread, PodUnavailableBudgetDeleteGate, InPlaceUpdateEnvFromMetadata, StatefulSetAutoDeletePVC, PodProbeMarkerGate. Most of these features take effect only after they are explicitly configured, so enabling them by default generally does not affect existing clusters. If you do not want some of them, you can disable them during the upgrade.
  • The Kruise-Manager leader election method has been migrated from configmaps to configmapsleases, in preparation for a later migration to leases. This is an official smooth upgrade method and does not affect existing clusters.
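
As a minimal sketch of disabling some of these gates during an upgrade, the values file below assumes that Kruise is installed with its Helm chart and that the chart exposes a `featureGates` value holding a comma-separated list of `Name=true|false` pairs. The file name and the two gates chosen are only examples; check the chart documentation for your version before relying on the key name.

```yaml
# values.yaml (hypothetical file name), passed to `helm upgrade` with -f.
# Assumption: the Kruise Helm chart reads feature gates from `featureGates`.
# Gates that are not listed here keep their default value.
featureGates: "WorkloadSpread=false,PodProbeMarkerGate=false"
```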

02 Sidecar container management capability: Job Sidecar Terminator

For Job workloads in Kubernetes, users usually want the Pod to enter the completed state once the main container finishes its task and exits. However, when such Pods contain long-running sidecar containers, the Pods never reach the completed state, because the sidecar containers cannot exit on their own after the main container exits.

Common community solutions to this problem generally require modifying both the main and the sidecar container: the two share a volume, so that the sidecar container exits after the main container has exited.

These solutions work, but they require the containers to be changed. For sidecar containers that are widely used in the community in particular, the cost of modification and maintenance is too high.

To this end, we have added a controller named SidecarTerminator to Kruise. It monitors the completion status of the main container in such scenarios and chooses an appropriate time to terminate the sidecar containers in the Pod, with no intrusive modification of the main or sidecar containers.

Pods running on ordinary nodes

For Pods running on ordinary nodes (with a regular kubelet), this feature is very simple to use. Users only need to add a special env to the target sidecar container to mark it, and the controller will, at the right time, use the CRR (ContainerRecreateRequest) capability provided by Kruise Daemon to terminate these sidecar containers:

kind: Job
spec:
  template:
    spec:
      containers:
        - name: sidecar
          env:
            - name: KRUISE_TERMINATE_SIDECAR_WHEN_JOB_EXIT
              value: "true"
        - name: main
...

Pods running on virtual nodes

On platforms that provide serverless containers, such as **ECI [2]** or **Fargate [3]**, Pods can only run on virtual nodes such as **Virtual-Kubelet [4]**. Kruise Daemon cannot be deployed on these virtual nodes, so the CRR capability cannot be used to terminate containers there. Fortunately, the same goal can be reached with the in-place Pod upgrade mechanism provided by native Kubernetes: we only need to build a special image whose sole function is to exit quickly, on its own, as soon as it is started. When the sidecar has to exit, the original sidecar image is replaced with this quick-exit image, and the sidecar exits.

Step 1: Prepare a quick exit image

  • The image only needs very simple logic: as soon as it is started, it exits with exit code 0.
  • The image must be compatible with the commands and args of the original sidecar image, so that the container does not report an error when it is started.

Step 2: Configure your sidecar container

kind: Job
spec:
  template:
    spec:
      containers:
        - name: sidecar
          env:
            - name: KRUISE_TERMINATE_SIDECAR_WHEN_JOB_EXIT_WITH_IMAGE
              value: "example/quick-exit:v1.0.0"
        - name: main
...

Replace the above “example/quick-exit:v1.0.0” with your own quick-exit image.

Notes

  • The sidecar container must be able to respond to the SIGTERM signal. When it receives this signal, the entrypoint process must exit (that is, the sidecar container must exit) with exit code 0.
  • This feature applies to Pods managed by any Job-type workload, as long as their RestartPolicy is Never or OnFailure.
  • A container that carries the environment variable KRUISE_TERMINATE_SIDECAR_WHEN_JOB_EXIT is treated as a sidecar container, and all other containers are treated as main containers. Sidecar containers are terminated only after all main containers have completed:
    • Under the Never restart policy, a main container is considered "completed" as soon as it exits.
    • Under the OnFailure restart policy, a main container is considered "completed" only if its exit code is 0.
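
Putting these notes together, a fuller manifest for the ordinary-node case might look like the sketch below. The Job name, the busybox image, and the shell commands are illustrative assumptions; only the environment variable comes from Kruise. The sidecar's shell traps SIGTERM and exits with code 0, and the restart policy is Never, so the main container counts as completed as soon as it exits.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sidecar-terminator-demo        # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never             # must be Never or OnFailure
      containers:
        - name: main
          image: busybox:1.36          # illustrative image
          # Stand-in for the real task: do some work, then exit 0.
          command: ["sh", "-c", "echo working; sleep 30"]
        - name: sidecar
          image: busybox:1.36          # illustrative image
          command:
            - sh
            - -c
            - |
              # Exit with code 0 when SIGTERM arrives.
              trap 'exit 0' TERM
              # Stand-in for a long-running sidecar process.
              while true; do sleep 1; done
          env:
            # Marks this container as a sidecar for the controller.
            - name: KRUISE_TERMINATE_SIDECAR_WHEN_JOB_EXIT
              value: "true"
```

The short `sleep 1` loop is a deliberate choice in this sketch: a shell runs its trap handler only after the current foreground command returns, so a long sleep would delay the exit.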

03 Enhanced workloads

CloneSet performance optimization: new FeatureGate CloneSetEventHandlerOptimization

Currently, every Pod Update event triggers the CloneSet reconcile logic, whether it is a Pod status change or a metadata change. The CloneSet reconciler is configured with three workers by default, which causes no problems in small clusters.

However, in a large cluster, or when there are many Pod Update events, these unnecessary reconciles block the reconciles that actually matter and delay changes such as CloneSet rolling upgrades. To solve this problem, you can turn on the feature gate CloneSetEventHandlerOptimization to reduce unnecessary reconcile enqueues.
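
A minimal sketch of turning the gate on, assuming Kruise is deployed through its Helm chart and that the chart accepts a `featureGates` value (a comma-separated list of `Name=true|false` pairs). The file name is hypothetical, and the key should be verified against the chart version you use.

```yaml
# values.yaml (hypothetical file name), passed to `helm upgrade` with -f.
# Assumption: the Kruise Helm chart reads feature gates from `featureGates`.
featureGates: "CloneSetEventHandlerOptimization=true"
```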

CloneSet adds disablePVCReuse field

If a Pod is deleted or evicted directly by an external caller, the PVCs associated with the Pod remain. When the CloneSet controller then finds that there are too few replicas and scales up again, the new Pod reuses the instance-id of the original Pod and is bound to the original PVCs.

However, if the Node where the Pod was located is abnormal, this reuse may cause the new Pod to fail to start. For details, refer to **issue 1099 [5]**. To solve this problem, you can set the field disablePVCReuse=true. When the Pod is evicted or deleted, the PVCs related to the Pod are then deleted automatically and are no longer reused.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  ...
  replicas: 4
  scaleStrategy:
    disablePVCReuse: true

CloneSet adds PreNormal lifecycle hook

CloneSet already supports two lifecycle hooks, PreparingUpdate and PreparingDelete, which are used to take an application offline gracefully. For details, see the **Community Documentation [6]**. To support bringing an application online gracefully, this release adds the PreNormal state, as follows:

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
spec:
  #define with finalizer
  lifecycle:
    preNormal:
      finalizersHandler:
      - example.io/unready-blocker

  # or define with label
  lifecycle:
    preNormal:
      labelsHandler:
        example.io/block-unready: "true"

When CloneSet creates a Pod (including normal scale-up and recreate upgrade):

  • The Pod is considered Available and enters the Normal state only once it satisfies the definition of the PreNormal hook.

This is useful for post-checks after a Pod is created. For example, you can check whether the Pod has been attached to the SLB backend, which avoids traffic loss during a rolling upgrade when the old instance has been destroyed but the new instance failed to attach.
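
To illustrate the finalizer variant, the fragment below sketches what a Pod might look like once a hypothetical external controller has finished its post-check (for example, confirmed the SLB registration) and marked the Pod. It assumes that satisfying the hook means the Pod carries the finalizer named in `finalizersHandler`; the Pod name is made up and the spec is omitted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample-cloneset-abcde          # hypothetical Pod created by the CloneSet
  finalizers:
    # Added by your own controller after the post-check passes
    # (assumption: this is what lets the Pod enter the Normal state).
    - example.io/unready-blocker
# spec omitted: it comes from the CloneSet's Pod template
```

Because a Kubernetes finalizer also blocks deletion of the object that carries it, the same controller would have to remove the finalizer again when the Pod is being deleted.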

04 Advanced application operation and maintenance capabilities

New forceRecreate field for container restart

If a container is still starting when a CRR resource is created, CRR no longer restarts that container. If you want to force the container to restart, you can enable this with the following field:

apiVersion: apps.kruise.io/v1alpha1
kind: ContainerRecreateRequest
spec:
  ...
  strategy:
    forceRecreate: true

Image pre-download supports attaching metadata to the CRI interface

When the kubelet creates a Pod, it attaches metadata to its calls on the container runtime's CRI interface. Based on this metadata, the image registry can determine which business an image pull comes from. If the registry is overloaded, it can degrade service for specific businesses. OpenKruise image pre-download now supports a similar capability, as follows:

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
spec:
  ...
  image: nginx:1.9.1
  sandboxConfig:
    annotations:
      io.kubernetes.image.metrics.tags: "cluster=cn-shanghai"
    labels:
      io.kubernetes.image.app: "foo"

Community Engagement

You are very welcome to join the OpenKruise open source community through GitHub, Slack, DingTalk, or WeChat. Do you already have something you would like to share with the community? Share it at our **Fortnightly Community Meeting [7]**, or join the discussion through these channels:

  • Join the community **Slack channel [8]** (English)
  • Join the community DingTalk group: search for group number 23330762 (Chinese)
  • Join the community WeChat group (new): add the user openkruise and let the bot add you to the group (Chinese)

Related links:

[1] ChangeLog

https://github.com/openkruise/kruise/blob/master/CHANGELOG.md

[2] ECI

https://www.aliyun.com/product/eci

[3] Fargate

https://aws.amazon.com/cn/fargate/

[4] Virtual-Kubelet

https://virtual-kubelet.io/

[5] issue 1099

https://github.com/openkruise/kruise/issues/1099

[6] Community Documentation

https://openkruise.io/docs/user-manuals/cloneset/#lifecycle-hook

[7] Fortnightly Community Meeting

https://shimo.im/docs/gXqmeQOYBehZ4vqo

[8] Slack channel

https://kubernetes.slack.com/?redir=/archives/openkruise
