How to gracefully restart Kubernetes Pods




I have recently been upgrading the Istio service mesh. After the upgrade, a necessary step is to restart all the Pods on the data plane, that is, the business Pods, so that their sidecars are updated to the new version.

Option 1

Because we have a large number of Pods across different environments, it is impractical to restart them manually one by one. We had done something similar before:

kubectl delete --all pods --namespace=dev

This deletes all Pods in the dev namespace in one shot, and Kubernetes automatically recreates them to keep the application available.

But there is a big problem: the scheduling pressure on Kubernetes is enormous. A single namespace usually holds at least several hundred Pods, and having all of them rescheduled and started at once puts a very high load on the cluster; one careless move can have serious consequences.

So my first version of the plan was to traverse all Deployments, delete one Pod, then sleep for 5 minutes before deleting the next one. The pseudocode looks like this:

deployments, err := clientSet.AppsV1().Deployments(ns).List(ctx, metav1.ListOptions{})
if err != nil {
    return err
}
for _, deployment := range deployments.Items {
    // List the Pods belonging to this Deployment.
    podList, err := clientSet.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
        LabelSelector: fmt.Sprintf("app=%s", deployment.Name),
    })
    if err != nil {
        return err
    }
    // Delete the Pods one by one, pausing between deletions.
    for _, pod := range podList.Items {
        err = clientSet.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{})
        if err != nil {
            return err
        }
        log.Printf(" Pod %s rebuild success.\n", pod.Name)
        time.Sleep(time.Minute * 5)
    }
}

Existing problems

This solution is indeed simple and crude, but problems were discovered during testing.

When a service has only one Pod, deleting it takes the business down outright; there is no spare replica left to serve traffic.

This is certainly unacceptable.
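
In hindsight, the script could at least have guarded against this case by skipping Deployments that run a single replica. Below is a minimal sketch of such a pre-check; it is my own illustration, not part of the original script, and it reuses the clientSet, ns and deployments variables from the pseudocode above:

for _, deployment := range deployments.Items {
    // Deployments with one (or zero) desired replicas cannot tolerate losing a Pod.
    if deployment.Spec.Replicas != nil && *deployment.Spec.Replicas <= 1 {
        log.Printf("Deployment %s has only %d replica, skip\n", deployment.Name, *deployment.Spec.Replicas)
        continue
    }
    // ... delete its Pods one by one as above ...
}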

There are even cases where the Pod does not come back up after deletion:

  • If a Pod has not been restarted for a long time, the image cache on the node may be gone, or the image may even have been deleted from the registry; in that case the Pod cannot start at all.

  • Some Pods have init containers that do work during startup; if an init container fails, the Pod will not start either.

In short, there are many situations that can prevent a Pod from starting normally, and any of them would directly cause a production incident, so the first option was definitely off the table.

Option 2

So I prepared option two:

  • First increase the replica count by 1. This creates a new Pod that uses the latest sidecar image.

  • Wait for the newly created Pod to start successfully.

  • Delete the original Pod once the new one is running.

  • Finally restore the replica count to its previous value.

This way the original Pods can be restarted smoothly. If the new Pod fails to start, the script stops instead of moving on to the other Deployments, and the old Pod is kept, so the service itself is not affected.

Existing problems

This looks fine on paper, but it is troublesome to implement and the process is cumbersome. Here is part of the core code:

func RebuildDeploymentV2(ctx context.Context, clientSet kubernetes.Interface, ns string) error {
 deployments, err := clientSet.AppsV1().Deployments(ns).List(ctx, metav1.ListOptions{})
 if err != nil {
  return err
 }

 for _, deployment := range deployments.Items {

  //Print each Deployment
  log.Printf("Ready deployment: %s\n", deployment.Name)

  originPodList, err := clientSet.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
   LabelSelector: fmt.Sprintf("app=%s", deployment.Name),
  })
  if err != nil {
   return err
  }

  // Check if there are any Pods
  if len(originPodList.Items) == 0 {
    log.Printf(" No pod in %s\n", deployment.Name)
   continue
  }

  // Skip Deployments whose Pods have already been upgraded
  updateSkip := false
  for _, pod := range originPodList.Items {
   for _, container := range pod.Spec.Containers {
    if container.Name == "istio-proxy" && container.Image == "proxyv2:1.x.x" {
     log.Printf(" Pod: %s Container: %s has already upgraded, skip\n", pod.Name, container.Name)
     updateSkip = true
    }
   }
  }
  if updateSkip {
   continue
  }

  // Scale the Deployment, create a new pod.
  scale, err := clientSet.AppsV1().Deployments(ns).GetScale(ctx, deployment.Name, metav1.GetOptions{})
  if err != nil {
   return err
  }
  scale.Spec.Replicas = scale.Spec.Replicas + 1
  _, err = clientSet.AppsV1().Deployments(ns).UpdateScale(ctx, deployment.Name, scale, metav1.UpdateOptions{})
  if err != nil {
   return err
  }

  // Wait for pods to be scaled
  for {
   podList, err := clientSet.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
    LabelSelector: fmt.Sprintf("app=%s", deployment.Name),
   })
   if err != nil {
    log.Fatal(err)
   }
   if len(podList.Items) != int(scale.Spec.Replicas) {
    time.Sleep(time.Second * 10)
   } else {
    break
   }
  }

  // Wait for pods to be running
  for {
   podList, err := clientSet.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{
    LabelSelector: fmt.Sprintf("app=%s", deployment.Name),
   })
   if err != nil {
    log.Fatal(err)
   }
   isPending := false
   for _, item := range podList.Items {
    if item.Status.Phase != v1.PodRunning {
      log.Printf("Deployment: %s Pod: %s Not Running Status: %s\n", deployment.Name, item.Name, item.Status.Phase)
     isPending = true
    }
   }
    if isPending {
    time.Sleep(time.Second * 10)
   } else {
    break
   }
  }

  // Remove origin pod
  for _, pod := range originPodList.Items {
   err = clientSet.CoreV1().Pods(ns).Delete(context.Background(), pod.Name, metav1.DeleteOptions{})
   if err != nil {
    return err
   }
    log.Printf(" Remove origin %s success.\n", pod.Name)
  }

  //Recover scale
  newScale, err := clientSet.AppsV1().Deployments(ns).GetScale(ctx, deployment.Name, metav1.GetOptions{})
  if err != nil {
   return err
  }
  newScale.Spec.Replicas = newScale.Spec.Replicas - 1
  newScale.ResourceVersion = ""
  newScale.UID = ""
  _, err = clientSet.AppsV1().Deployments(ns).UpdateScale(ctx, deployment.Name, newScale, metav1.UpdateOptions{})
  if err != nil {
   return err
  }
   log.Printf(" Deployment %s rebuild success.\n", deployment.Name)
  log.Println()

 }

 return nil
}

As you can see, that is quite a lot of code.

Final plan

Is there a simpler way? When I walked my leads through the solution above, everyone was dumbfounded: it is far too complicated. Doesn't kubectl have a rolling-restart command built in?

❯ k rollout -h
Manage the rollout of one or many resources.

Available Commands:
  history   View rollout history
  pause     Mark the provided resource as paused
  restart   Restart a resource
  resume    Resume a paused resource
  status    Show the status of the rollout
  undo      Undo a previous rollout

kubectl rollout restart deployment/abc

This command rolls the abc Deployment over. The restart happens on the Kubernetes server side, and the steps it performs are similar to my second option, except that Kubernetes implements them far more rigorously than I did.
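
Under the hood, kubectl rollout restart patches the workload's Pod template with a kubectl.kubernetes.io/restartedAt annotation, and the Deployment controller then rolls out new Pods. For reference, here is a minimal client-go sketch of the same idea; the function name and arguments are my own, and it additionally needs the k8s.io/apimachinery/pkg/types import:

// restartDeployment triggers a rolling restart of one Deployment by patching
// the Pod template annotation, which is what kubectl rollout restart does.
func restartDeployment(ctx context.Context, clientSet kubernetes.Interface, ns, name string) error {
    patch := fmt.Sprintf(
        `{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"%s"}}}}}`,
        time.Now().Format(time.RFC3339))
    _, err := clientSet.AppsV1().Deployments(ns).Patch(
        ctx, name, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{})
    return err
}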

Later I also found this command mentioned in Istio's official upgrade guide.

So you still have to read the official documentation carefully.

Integrate kubectl

Since there is a ready-made command, I only needed to integrate it into my script and call it in a loop while traversing the Deployments in the namespace.

But the rollout command has no corresponding API in Kubernetes' client-go SDK.

So I had to refer to the kubectl source code and copy that part of the functionality. Fortunately, I can depend on kubectl directly in my project:

require (
    k8s.io/api v0.28.2
    k8s.io/apimachinery v0.28.2
    k8s.io/cli-runtime v0.28.2
    k8s.io/client-go v0.28.2
    k8s.io/klog/v2 v2.100.1
    k8s.io/kubectl v0.28.2
)


The RestartOptions struct used in the kubectl source is exported, so I adapted it by referring to that source:

func TestRollOutRestart(t *testing.T) {
    kubeConfigFlags := defaultConfigFlags()
    streams, _, _, _ := genericiooptions.NewTestIOStreams()
    ns := "dev"
    kubeConfigFlags.Namespace = &ns
    matchVersionKubeConfigFlags := cmdutil.NewMatchVersionFlags(kubeConfigFlags)
    f := cmdutil.NewFactory(matchVersionKubeConfigFlags)
    deploymentName := "deployment/abc"
    r := &rollout.RestartOptions{
       PrintFlags: genericclioptions.NewPrintFlags("restarted").WithTypeSetter(scheme.Scheme),
       Resources: []string{deploymentName},
       IOStreams: streams,
    }
    err := r.Complete(f, nil, []string{deploymentName})
    if err != nil {
       log.Fatal(err)
    }
    err = r.RunRestart()
    if err != nil {
       log.Fatal(err)
    }
}

Finally, after several rounds of debugging, it works. All that is left is to move this logic into a loop and add a sleep so the Pods are restarted at a controlled pace.
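
For completeness, here is a rough sketch of that outer loop, reusing the RestartOptions setup from the test above; the Deployment list, variable names and the 5-minute interval are my own choices:

for _, deployment := range deployments.Items {
    target := "deployment/" + deployment.Name
    r := &rollout.RestartOptions{
        PrintFlags: genericclioptions.NewPrintFlags("restarted").WithTypeSetter(scheme.Scheme),
        Resources:  []string{target},
        IOStreams:  streams,
    }
    if err := r.Complete(f, nil, []string{target}); err != nil {
        log.Fatal(err)
    }
    if err := r.RunRestart(); err != nil {
        log.Fatal(err)
    }
    log.Printf("Deployment %s restarted\n", deployment.Name)
    // Pace the restarts so the cluster is not overloaded.
    time.Sleep(time.Minute * 5)
}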

Reference links:

  • https://istio.io/latest/docs/setup/upgrade/canary/#data-plane

  • https://github.com/kubernetes/kubectl/blob/master/pkg/cmd/rollout/rollout_restart.go

