Article directory

- 1. Pod status
  - 1.1 Types of container startup errors
  - 1.2 ImagePullBackOff error
  - 1.3 CrashLoopBackOff
  - 1.4 Pending
- 2. Service connection status
- 3. Ingress connection status
1. Pod status

Create a `pod-status.yaml` file with the following content:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: running
  labels:
    app: nginx
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
---
apiVersion: v1
kind: Pod
metadata:
  name: backoff
spec:
  containers:
    - name: web
      image: nginx:not-exist
---
apiVersion: v1
kind: Pod
metadata:
  name: error
spec:
  containers:
    - name: web
      image: nginx
      command: ["sleep", "a"]
```
Then, use the `kubectl apply` command to apply it to the cluster.

```shell
$ kubectl apply -f pod-status.yaml
pod/running created
pod/backoff created
pod/error created
```
Next, use `kubectl get pods` to view the three Pods you just created.

```shell
$ kubectl get pods
NAME      READY   STATUS             RESTARTS   AGE
backoff   0/1     ImagePullBackOff   0          50s
error     0/1     CrashLoopBackOff   1          4s
running   1/1     Running            0          50s
```
In this example, we created three Pods in total, and their states include `ImagePullBackOff`, `CrashLoopBackOff`, and `Running`.
1.1 Types of container startup errors
- ErrImagePull
- ImageInspectError
- ErrImageNeverPull
- RegistryUnavailable
- InvalidImageName
1.2 ImagePullBackOff error
This error is usually caused by one of the following:

- The image name or tag is wrong. In the `backoff` Pod we just created, we specified the image `nginx:not-exist`, but the `not-exist` tag does not actually exist, so an error is naturally thrown.
- A private image was specified, but no pull credentials were provided.
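For the second cause, pull credentials can be attached to the Pod via `imagePullSecrets`. Below is a minimal sketch; the Secret name `registry-cred` and the image path are illustrative, and the Secret is assumed to have been created beforehand with `kubectl create secret docker-registry`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image
spec:
  # References a docker-registry Secret in the same namespace;
  # "registry-cred" and the registry path below are hypothetical.
  imagePullSecrets:
    - name: registry-cred
  containers:
    - name: web
      image: registry.example.com/team/nginx:1.25
```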
We can use the `kubectl describe` command to see the details of the error.

```shell
$ kubectl describe pod backoff
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  10m                  default-scheduler  Successfully assigned default/backoff to kind-control-plane
  Normal   Pulling    8m43s (x4 over 10m)  kubelet            Pulling image "nginx:not-exist"
  Warning  Failed     8m40s (x4 over 10m)  kubelet            Failed to pull image "nginx:not-exist": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/library/nginx:not-exist": failed to resolve reference "docker.io/library/nginx:not-exist": docker.io/library/nginx:not-exist: not found
  Warning  Failed     8m40s (x4 over 10m)  kubelet            Error: ErrImagePull
  Warning  Failed     8m11s (x6 over 10m)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    4s (x42 over 10m)    kubelet            Back-off pulling image "nginx:not-exist"
```
From the third event in the `Events` section of the output, we can see that the cluster has thrown the `nginx:not-exist: not found` exception, so we have located the specific error.
1.3 CrashLoopBackOff
`CrashLoopBackOff` is a typical container runtime error. You may also see the similar `RunContainerError`. There are two main reasons for this error.
- An error occurred when the application in the container started; for example, it failed to read its configuration and could not start.
- A misconfiguration, such as a wrong container start command.
In the `error` Pod created above, I intentionally misconfigured the container's start command so that we can also see the `CrashLoopBackOff` exception. For errors in the running phase, most come from the startup of the business application itself, so we usually only need to check the Pod's logs to find the problem. For example, let's view the logs of the `error` Pod.
```shell
$ kubectl logs error
sleep: invalid time interval 'a'
Try 'sleep --help' for more information.
```
From the returned log, the `sleep` command throws an exception: the parameter is invalid. In a production environment, we generally use Deployment workloads to manage Pods. When a Pod runs abnormally and is recreated, its name changes with each restart. In that case, you can add the `--previous` parameter when viewing logs to see the logs of the previous container instance.

```shell
$ kubectl logs pod-name --previous
```
1.4 Pending
Occasionally, you might not see an explicit error state at all, but on checking the status you find the Pod stuck in the `Pending` state. You can try saving the following content as a `pending-pod.yaml` file and deploying this example to the cluster via `kubectl apply -f`.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: 32
          memory: 64Gi
```
Next, try to check the status of the Pod.
```shell
$ kubectl get pods
NAME      READY   STATUS    RESTARTS   AGE
pending   0/1     Pending   0          15s
```
From the returned results, we find that the Pod does not throw any exception, but its status is `Pending`, and `READY 0/1` indicates that the Pod is not ready to receive external traffic. There are three main reasons for the `Pending` state.
- The cluster resources are insufficient to schedule the Pod.
- Pod is waiting for PVC persistent storage volume.
- Pod resource usage exceeds namespace resource quota.
In the above example, we configured a resource request of 32 CPU cores and 64Gi of memory for the Pod, which obviously exceeds what the cluster can provide, so the Pod stays in the `Pending` state. Kubernetes will keep trying to schedule it; once a new node is added and the resource requirements can be met, the Pod will be scheduled and started. The `Pending` state is thus a case of abnormal container startup, but it is not strictly an error: the Pod is simply temporarily unschedulable. To find the specific reason for the `Pending` status, you can follow the same approach as for container startup errors and use the `kubectl describe` command.
```shell
$ kubectl describe pod pending
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  11m    default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Warning  FailedScheduling  6m45s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
```
From the `Events` section of the output, we can conclude that the reason for `Pending` is that no node meets the CPU and memory resource requirements. Kubernetes tried to schedule the Pod twice, and the exception was the same both times.
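To unblock the example above without adding nodes, the resource requests can be lowered to values the cluster can actually satisfy. A minimal sketch follows; the exact values are illustrative and depend on your nodes' allocatable resources.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pending
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests:
          cpu: 250m     # millicores; far below the 32-core request that failed to schedule
          memory: 128Mi # well within a typical node's allocatable memory
```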
2. Service connection status
Sometimes, even though the Pod is running and ready, we still cannot reach the business service from outside. In that case we need to look at the connection status of the Service. Service is a core component of Kubernetes and is available under normal circumstances. In a production environment, traffic generally flows from Ingress to Service to Pod, so when a Pod's business service cannot be accessed externally, we can start checking from the innermost layer, the Pod itself. The easiest way is to connect to the Pod directly and initiate a request to check whether the Pod works normally.
To access the Pod locally, we can use `kubectl port-forward` for port forwarding, taking the Nginx Pod we just created as an example.

```shell
$ kubectl port-forward pod/running 8081:80
```
If a request to port 8081 on localhost succeeds, the Pod and the application inside it are working normally. Next, we check the connection status of the Service one level up. Similarly, the easiest way is to connect directly to the Service via port forwarding and initiate a request.
```shell
$ kubectl port-forward service/<service-name> <local-port>:<service-port>
```
If the requested Service can return the content correctly, it means that the Service layer is also normal. If content cannot be returned, there are usually two possible reasons for this.
- The Service's selector does not correctly match the Pod's labels.
- The Service's `port` and `targetPort` are misconfigured.
By fixing these two configurations, you should be able to fix the Service to Pod connectivity issues.
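As a sketch of what a correct configuration looks like, here is a Service that would route to the `running` Pod created earlier. The name `running-service` is illustrative; the important parts are that the selector matches the Pod's labels and `targetPort` matches the container's port.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: running-service  # hypothetical name
spec:
  selector:
    app: nginx        # must match the labels on the "running" Pod
  ports:
    - port: 80        # port the Service exposes to clients
      targetPort: 80  # must match the containerPort of the nginx container
```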
3. Ingress connection status
At this point, if you still cannot access business services from Ingress, you need to continue to troubleshoot Ingress. First, verify that the Pods for the Ingress controller are running.
```shell
$ kubectl get pods -n ingress-nginx
NAME                                        READY   STATUS    RESTARTS        AGE
ingress-nginx-controller-8544b85788-c9m2g   1/1     Running   6 (4h35m ago)   1d
```
After confirming that the Ingress controller itself is healthy, we can be fairly sure the failure is caused by a misconfigured Ingress policy. You can view the Ingress policy with the `kubectl describe ingress` command.
```shell
$ kubectl describe ingress ingress_name
Name:       ingress_name
Namespace:  default
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     /     running-service:80 (<error: endpoints "running-service" not found>)
```
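The `endpoints "running-service" not found` error above means the Ingress rule points at a backend Service that either does not exist or matches no ready Pods. Once the backend Service is in place, a working Ingress policy might look like the following sketch; the Ingress name, host-less rule, and `ingressClassName` are assumptions for an ingress-nginx setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress          # hypothetical name
spec:
  ingressClassName: nginx    # assumes the ingress-nginx controller is installed
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: running-service  # must be an existing Service in the same namespace
                port:
                  number: 80           # must match the Service's exposed port
```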