Docker containers and virtualization technology: container runtime explanation and comparison

Table of Contents

1. Theory

1. Container runtime

2. Container runtime interface

3. Container runtime level

4. Container runtime comparison

5. Strong isolation container

2. Problems

1. Why is it difficult for K8S to achieve true multi-tenancy

3. Summary


1. Theory

1. Container runtime

(1) Concept

The container runtime runs on every node of a k8s cluster and is responsible for the entire life cycle of containers. Docker has been the most widely used runtime, but as container clouds developed, many other container runtimes emerged. To decouple kubelet from any specific container runtime (mainly to remove the hard dependency on Docker), the Kubernetes project introduced CRI (Container Runtime Interface).

2. Container runtime interface

CRI is a set of gRPC services defined by k8s. kubelet acts as the client and communicates with the container runtime over a Unix socket using the gRPC framework. CRI includes two kinds of services: the Image Service and the Runtime Service. The image service provides remote procedure calls (RPCs) for pulling, inspecting, and removing images. The runtime service manages the life cycle of containers and provides the calls for interacting with them (exec/attach/port-forward).
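
To make the split between the two services concrete, here is a minimal in-memory sketch in Python. The method names are modeled on CRI RPC names (PullImage, ImageStatus, RemoveImage, CreateContainer, StartContainer, StopContainer, RemoveContainer), but the classes, arguments, state strings, and IDs are simplified assumptions for illustration only. Real CRI uses protobuf messages over gRPC, and containers are created inside a pod sandbox, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeImageService:
    """Toy stand-in for the CRI image service (hypothetical, in-memory)."""
    images: Dict[str, str] = field(default_factory=dict)  # image ref -> fake image id

    def pull_image(self, ref: str) -> str:
        # A real runtime would download and unpack layers; we only record the ref.
        image_id = f"fake-id-for:{ref}"
        self.images[ref] = image_id
        return image_id

    def image_status(self, ref: str) -> Optional[str]:
        # Returns the fake image id, or None if the image is not present.
        return self.images.get(ref)

    def remove_image(self, ref: str) -> None:
        self.images.pop(ref, None)


@dataclass
class FakeRuntimeService:
    """Toy stand-in for the CRI runtime service (hypothetical, in-memory)."""
    image_service: FakeImageService
    containers: Dict[str, str] = field(default_factory=dict)  # container id -> state

    def create_container(self, name: str, image_ref: str) -> str:
        if self.image_service.image_status(image_ref) is None:
            raise RuntimeError(f"image not present: {image_ref}")
        container_id = f"ctr-{name}"
        self.containers[container_id] = "CREATED"
        return container_id

    def start_container(self, container_id: str) -> None:
        self.containers[container_id] = "RUNNING"

    def stop_container(self, container_id: str) -> None:
        self.containers[container_id] = "EXITED"

    def remove_container(self, container_id: str) -> None:
        del self.containers[container_id]


# The caller (kubelet, in the real system) drives both services in order.
image_service = FakeImageService()
runtime_service = FakeRuntimeService(image_service)
image_service.pull_image("docker.io/library/nginx:latest")
cid = runtime_service.create_container("web", "docker.io/library/nginx:latest")
runtime_service.start_container(cid)
```

The point of the split is that image handling and container life cycle can be implemented and queried independently, while the caller sequences them: pull first, then create, then start.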

3. Container runtime level

Container Runtime is divided into two levels: high and low.

(1) High-level runtime
dockershim, containerd, and CRI-O are all CRI-compliant container runtimes. They are high-level runtimes and mainly expose gRPC services to their callers.

Note that this refers to dockershim, not Docker: Docker itself does not implement CRI.
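
Because each high-level runtime exposes its CRI service on a socket, switching runtimes mostly means pointing kubelet at a different endpoint. The sketch below builds the kubelet `--container-runtime-endpoint` flag for each runtime. The socket paths are commonly documented defaults and are an assumption here; verify them on your own nodes, since distributions can change them.

```python
# Commonly documented default CRI socket paths (assumption: check your nodes).
DEFAULT_CRI_SOCKETS = {
    "containerd": "/run/containerd/containerd.sock",
    "cri-o": "/var/run/crio/crio.sock",
    "dockershim": "/var/run/dockershim.sock",
}


def kubelet_endpoint_flag(runtime: str) -> str:
    """Return the kubelet flag that selects the given runtime's CRI socket."""
    socket_path = DEFAULT_CRI_SOCKETS[runtime]
    return f"--container-runtime-endpoint=unix://{socket_path}"


for name in DEFAULT_CRI_SOCKETS:
    print(name, kubelet_endpoint_flag(name))
```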

OCI
OCI (Open Container Initiative) defines open industry standards for container formats and runtimes, including an image specification and a runtime specification.

The high-level runtime downloads an OCI image and unpacks it into an OCI runtime filesystem bundle.
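
An OCI runtime bundle is essentially a directory containing a `config.json` and a root filesystem. The following sketch writes a heavily reduced bundle to show that layout. The field names follow the OCI runtime specification, but the chosen values, the function name, and the empty `rootfs` directory are illustrative assumptions; a real high-level runtime fills `rootfs` with the unpacked image layers and generates a much more complete config.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_minimal_bundle(bundle_dir: Path, args: List[str]) -> Path:
    """Create <bundle_dir>/rootfs and <bundle_dir>/config.json (illustrative subset)."""
    # In a real bundle, rootfs holds the unpacked image layers; here it stays empty.
    (bundle_dir / "rootfs").mkdir(parents=True, exist_ok=True)
    config = {
        "ociVersion": "1.0.2",
        "process": {"args": args, "cwd": "/", "user": {"uid": 0, "gid": 0}},
        "root": {"path": "rootfs", "readonly": True},
        "linux": {
            # Namespaces the low-level runtime is asked to create for the container.
            "namespaces": [
                {"type": t} for t in ("pid", "mount", "network", "ipc", "uts")
            ]
        },
    }
    config_path = bundle_dir / "config.json"
    config_path.write_text(json.dumps(config, indent=2))
    return config_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_minimal_bundle(Path(tmp) / "demo-bundle", ["sh"])
    print(path.read_text())
```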

(2) Low-level runtime
The low-level runtime defines how to set up Linux namespaces and cgroups for a new container and how to handle its rootfs; runC is the reference implementation. Besides runC, many other runtimes follow the OCI standard, such as Kata Containers and gVisor.
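
From the caller's point of view, a low-level OCI runtime is a command-line tool that is driven through a small set of life cycle subcommands. The sketch below only builds the argument lists a caller might run against `runc` (`create`, `start`, `state`, `kill`, `delete`); it does not execute anything. The bundle path and container ID are hypothetical, and the assumption that another OCI-compatible binary accepts the same subcommands should be checked against that runtime's documentation.

```python
from typing import List


def oci_lifecycle_commands(
    runtime_binary: str, bundle_dir: str, container_id: str
) -> List[List[str]]:
    """Return the argv lists for a basic create/start/state/kill/delete cycle."""
    return [
        [runtime_binary, "create", "--bundle", bundle_dir, container_id],
        [runtime_binary, "start", container_id],
        [runtime_binary, "state", container_id],
        [runtime_binary, "kill", container_id, "KILL"],
        [runtime_binary, "delete", container_id],
    ]


# Hypothetical bundle path and container ID, for illustration only.
for argv in oci_lifecycle_commands("runc", "/tmp/demo-bundle", "demo"):
    print(" ".join(argv))
```

Because the interface is just a binary name plus a bundle directory, a high-level runtime can swap in a different low-level runtime by changing `runtime_binary`, which is what makes sandboxed runtimes pluggable.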

4. Container runtime comparison

Docker’s call chain goes through several layers of wrapping (kubelet, dockershim, dockerd, containerd, containerd-shim, runC), which makes it somewhat harder to maintain. The containerd and CRI-O solutions are much simpler than Docker.

dockershim implements CRI and translates CRI requests into requests that dockerd can handle. Its code was built into kubelet, which is one of the reasons k8s was eager to drop Docker; dockershim was eventually removed from kubelet in Kubernetes 1.24.

The container is actually started by containerd-shim calling runC. runC exits as soon as the container has started, and containerd-shim becomes the parent process of the container process. The shim collects the status of the container process and reports it to containerd. After the process with PID 1 in the container exits, the shim also takes over and reaps the remaining child processes in the container, so that no zombie processes are left behind. This design also keeps container processes running if the containerd process on the host crashes or restarts.
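
The core of the shim's job, staying alive as the parent and collecting the child's exit status so no zombie remains, can be shown with a small POSIX-only sketch. This is not containerd-shim's code: the function name is hypothetical, the child is a trivial stand-in for a container process, and the Linux child subreaper mechanism that production shims rely on to adopt orphaned descendants is not shown.

```python
import os


def spawn_and_reap(exit_code: int) -> int:
    """Fork a child that exits at once, then collect its exit status as the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the container process. Exit without cleanup handlers.
        os._exit(exit_code)
    # Parent: stand-in for the shim. waitpid() blocks until the child exits and
    # removes its entry from the process table, so it does not linger as a zombie.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


print("child exited with", spawn_and_reap(3))
```

In the real system, the value returned here is what the shim would report back to containerd as the container's exit status.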

(1) Differences in details between containerd and Docker

When Docker is used as the container runtime, k8s does not actually use Docker’s own storage, networking, and similar features; it only uses Docker’s image management to satisfy the image service in CRI.

(2) containerd and CRI-O

CRI-O is a container runtime initiated and open sourced by Red Hat. It is relatively new and has seen less production use. In community test results, its performance and latency when operating containers were also not as good as containerd’s.

5. Strong isolation container

(1) Commonly used strong isolation containers

Kata Containers, gVisor, Firecracker

(2) Secure containers and Serverless

For Serverless to let all user containers or functions consume computing resources on demand, two requirements must be met:

Multi-tenant strong isolation: User containers or functions are started on demand and billed per second, so a pool of isolated resources cannot be pre-allocated to each user. The platform as a whole must therefore be multi-tenant and strongly isolated;
Extremely lightweight: Serverless has two characteristics. First, sandboxes are created and destroyed much more frequently at runtime. Second, workloads are split at a very fine granularity; the finest is FaaS, where each function needs its own sandbox. This leads to two requirements: 1. sandbox startup and deletion must be fast; 2. the fewer resources each sandbox occupies, the better.

(3) Kata Containers

① Concept

Kata Containers is an open source project hosted by the OpenStack Foundation under its expanded charter, which now includes projects beyond the OpenStack core. The project aims to promote standardization and innovation and thereby drive the rapid development of container technology. Nearly 20 companies agreed to work together on Kata Containers at its launch.

Kata Containers is also designed to integrate and be compatible with multiple infrastructure, container orchestration, and specification communities: Kubernetes, Docker, Open Container Initiative (OCI), Container Runtime Interface (CRI), Container Network Interface (CNI), QEMU, KVM, Hyper-V, and OpenStack.

② Features

The speed of containers, the security of virtual machines.

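A diagram from the Kata project (not reproduced here) illustrates the difference between VM-based containers and containers based on namespaces and cgroups: the latter share the host kernel, while each Kata container or pod runs inside its own lightweight VM with a separate guest kernel.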

Kata Containers is a novel implementation of a lightweight virtual machine that integrates seamlessly into the container ecosystem. Kata Containers are as light and fast as containers and integrate with the container management layer, while also providing the security advantages of virtual machines.

Kata Containers is a merger of two existing open source projects: Intel Clear Containers and Hyper runV. The new project brings together the best of both technologies with a shared vision of retooling virtualization to fit container-native applications, in order to deliver the speed of containers and the security of virtual machines.

Kata Containers benefits from the strengths of each project. Intel Clear Containers focused on performance (<100ms boot time) and enhanced security, while Hyper runV prioritized technology-agnostic support for many different CPU architectures and hypervisors. By merging these projects, Kata Containers delivers a better end-user experience in both performance and compatibility, unifies the developer communities, and accelerates feature development to address future use cases.

The industry’s move to containers presents unique security challenges when user workloads run in multi-tenant, untrusted environments. Kata Containers uses an open-source hypervisor as the isolation boundary for each container (or for each collection of containers in a pod); this approach resolves the shared-kernel dilemma of existing bare-metal container solutions.

Kata Containers are ideal for on-demand, event-based deployments such as serverless functions, continuous integration/continuous delivery, and longer-running web server applications. Developers no longer need to know anything about the underlying infrastructure or perform any type of capacity planning before launching their container workloads. Kata Containers deliver enhanced security, scalability, and higher resource utilization while resulting in an overall simplified stack.

2. Problems

1. Why is it difficult for K8S to achieve true multi-tenancy

(1) Question

There are two main reasons why k8s cannot achieve true multi-tenancy:

1. kube-apiserver is a single shared service for the entire cluster and has no built-in concept of tenants;

2. The default OCI runtime is runC, and containers started by runC share the host kernel.

(2) Reason

The ideal multi-tenant state:

Ideally, tenants of the platform should not be aware of each other’s existence, as if each tenant had the entire platform to itself. Specifically, a tenant cannot see other tenants’ resources, cannot affect other tenants’ resource usage even when its own resources are exhausted, and cannot attack other tenants through the network or the kernel.

(3) Solution

For the second problem, a typical solution is to provide a different OCI runtime implementation that runs containers inside VMs, giving each one its own kernel and therefore hard isolation at the kernel level. Both runV and Clear Containers took this approach. Because the two projects did very similar things, they were later merged into a single project, Kata Containers.
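
In Kubernetes, one way to opt a workload into such a VM-based runtime is the RuntimeClass resource, which the text above does not cover and which is shown here only as an illustration. The sketch builds the two manifests as Python dictionaries and prints them as JSON. The `node.k8s.io/v1` RuntimeClass kind and the pod field `runtimeClassName` are part of the Kubernetes API, but the handler name `kata`, the pod name, and the image are assumptions: the handler must match whatever name the node's CRI runtime has been configured with.

```python
import json


def runtime_class(name: str, handler: str) -> dict:
    """Build a RuntimeClass manifest mapping a cluster-level name to a node handler."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,  # assumption: must match the CRI runtime's configuration
    }


def pod_with_runtime_class(pod_name: str, image: str, runtime_class_name: str) -> dict:
    """Build a Pod manifest that asks for the given RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": "app", "image": image}],
        },
    }


# Hypothetical names, for illustration only.
kata_class = runtime_class("kata", "kata")
tenant_pod = pod_with_runtime_class("tenant-a-app", "nginx", "kata")
print(json.dumps([kata_class, tenant_pod], indent=2))
```

Pods that do not set `runtimeClassName` keep using the node's default runtime, so untrusted tenants can be moved to the VM-isolated runtime without changing trusted workloads.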

3. Summary

Container runtimes fall into two levels:
High-level runtimes (dockershim, containerd, and CRI-O) mainly expose gRPC services to their callers, such as kubelet.
Low-level runtimes (runC, Kata Containers, and gVisor) define how to set up Linux namespaces and cgroups for new containers and how to handle the rootfs.
