Starting from docker v0.1.0

Overview of this article

This article traces the author’s mental journey while learning docker. I first got to know docker through its overall architecture, then read the source code of docker v0.1.0, then followed a tutorial to implement a mini docker in Go, and finally went back to the docker source code. In short, it is a learning loop in which theory guides practice and practice deepens theory.

First introduction to docker

Docker is a set of platform as a service(PaaS) products that use OS-level virtualization to deliver software in packages called containers.

This is Wikipedia’s definition of docker. It describes what docker is from a macro perspective, and it is hard to find a more accurate one. But what does OS-level virtualization refer to? Why do we need docker? And what are containers for?

A related concept most people meet before docker is the virtual machine, which simulates a complete hardware system and runs a full computer system in a completely isolated environment. Like docker, it is a virtualization technology. The difference is:

  • Docker is a wrapper around Linux containers that provides a simple, easy-to-use container interface. It is not a complete operating system; it is closer to a shell wrapped around an ordinary process. The resources that a process inside a container touches are virtualized views, but underneath they are still calls into the host system; effective isolation from that system is achieved through technologies such as namespaces.
  • A virtual machine implements virtualization by running a complete operating system on the physical host. Each virtual machine has its own operating system kernel, system processes, and device drivers.

To put it simply, much of the time we do not need an entire operating system environment to run a program; we only need certain interfaces to work. From the perspective of resources and startup time, there is no need to simulate the whole operating system (a virtual machine); simulating only the parts we need (a container) is enough.

We have made it clear that this thing is useful, so how to use it?

Docker has three basic concepts: Image, Container, and Repository. A code-hosting analogy helps. A docker repository is like a public or private hosting service such as GitHub or GitLab; an image is like the code in a repository, which does not change by itself; a container is like my local checkout of that code, which I can modify locally and, through certain operations, push back as a new image. Of course, the relationship between images and containers is closer to that between classes and instances. Using docker therefore means: based on a clear requirement, obtain the corresponding image from a repository, create a local container from it, and modify the container according to the specific needs of the business.

So how is this implemented? With such a virtualization technology and such a process from image to container, how does docker do it?

The answer can be found in the docker architecture. Docker uses a client/server (C/S) architecture. The client is how users talk to the Docker daemon, i.e. the command-line terminal, which packages commands into API requests. The Docker server side is loosely coupled: each module has its own duty and they are combined organically. The daemon is a background process that accepts client requests and manages docker containers, while the engine is the core that actually handles those requests. Specifically:

  1. The docker client establishes communication with the daemon and sends it a request.
  2. The daemon, as the main body of the server, accepts the client’s requests.
  3. The engine performs the actual internal work.
  4. Each piece of work exists in the form of a job (a purely illustrative sketch of this idea follows the list). When an image is needed, it is downloaded from the docker registry and stored (in Graph form) through the graphdriver image-management driver.
  5. networkdriver is responsible for creating and configuring the container’s network.
  6. Running user commands or limiting container resources is done through execdriver.
  7. execdriver and networkdriver are implemented through libcontainer.
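
To make the job idea above concrete, here is a purely illustrative Go sketch: Engine, Job and the "pull" handler are invented for this example and are not Docker’s real types. The engine keeps a table of handlers and dispatches each job by name, much as the real engine hands jobs to graphdriver, networkdriver or execdriver.

// Purely illustrative: dispatch a named job to a registered handler.
package main

import "fmt"

// A Job carries a name and arguments; the engine decides which handler implements it.
type Job struct {
	Name string
	Args []string
}

type Engine struct {
	handlers map[string]func(*Job) error
}

func (e *Engine) Register(name string, h func(*Job) error) {
	if e.handlers == nil {
		e.handlers = make(map[string]func(*Job) error)
	}
	e.handlers[name] = h
}

func (e *Engine) Run(job *Job) error {
	h, ok := e.handlers[job.Name]
	if !ok {
		return fmt.Errorf("no handler for job %q", job.Name)
	}
	return h(job)
}

func main() {
	eng := &Engine{}
	eng.Register("pull", func(j *Job) error {
		fmt.Println("pulling image", j.Args[0], "(a real handler would call graphdriver)")
		return nil
	})
	if err := eng.Run(&Job{Name: "pull", Args: []string{"ubuntu"}}); err != nil {
		fmt.Println(err)
	}
}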

From image to container: what happens to the docker run command?

  1. Docker Client receives the docker run command and sends an HTTP request to the server.
  2. Docker Server receives the HTTP request and hands it to mux.Router, which determines the concrete handler from the URL and the request method (see the routing sketch after this list).
  3. mux.Router routes the request to the corresponding handler, here PostContainersCreate.
  4. A create job is created inside PostContainersCreate.
  5. The create job calls graphdriver and mounts the rootfs image at the corresponding location of the docker container.
  6. If all of the above succeeds, a start job is created through a similar flow and handed to networkdriver for the network setup; at this point the container has been created, and finally the command the user asked to run is executed. See the reference links for details.
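
As a rough illustration of steps 2 and 3, here is a minimal sketch of URL-plus-method routing with the gorilla/mux package. It is not Docker’s actual server code; the paths, port and handler names are illustrative only.

// Minimal sketch: gorilla/mux picks a handler from the URL and the HTTP method.
package main

import (
	"fmt"
	"net/http"

	"github.com/gorilla/mux"
)

func postContainersCreate(w http.ResponseWriter, r *http.Request) {
	// The real daemon would build a "create" job here, let graphdriver prepare
	// the rootfs, and return the new container id.
	fmt.Fprintln(w, `{"Id": "example"}`)
}

func postContainersStart(w http.ResponseWriter, r *http.Request) {
	// The real daemon would build a "start" job here, set up networking via
	// networkdriver and run the user's command via execdriver.
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	r := mux.NewRouter()
	// URL and method together select the handler.
	r.HandleFunc("/containers/create", postContainersCreate).Methods("POST")
	r.HandleFunc("/containers/{name}/start", postContainersStart).Methods("POST")
	http.ListenAndServe("127.0.0.1:4243", r)
}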

Notice that the communication between client and server ultimately becomes communication between the server’s internal components: after receiving a client instruction, the server routes it to the corresponding handler, the handler creates the corresponding job, the engine calls different backend components depending on the job to carry it out, and the result is then returned to the client.

OS-level virtualization support for containers is mainly provided by libcontainer, a library designed and implemented in Go within the Docker architecture. Its original design goal is to access the container-related APIs in the kernel directly, without external dependencies. Precisely because libcontainer exists, Docker can call it directly to manipulate a container’s namespaces, cgroups, apparmor profile, network devices, and firewall rules, without relying on LXC or other packages.

Starting from docker v0.1.0

Now that we basically understand docker’s architecture and some of its modules, the best way to learn more is of course to read the source code. The current moby repository on github has a huge amount of code and is not easy to read, so I start from docker v0.1.0, which has only a handful of source files yet already contains docker’s core functionality.

The entry function is located in /docker/docker.go

// ./docker/docker.go
func main() {
	if docker.SelfPath() == "/sbin/init" {
		// Running in init mode
		docker.SysInit()
		return
	}
	// FIXME: Switch d and D ? (to be more sshd like)
	fl_daemon := flag.Bool("d", false, "Daemon mode")
	fl_debug := flag.Bool("D", false, "Debug mode")
	flag.Parse()
	rcli.DEBUG_FLAG = *fl_debug
	if *fl_daemon {
		if flag.NArg() != 0 {
			flag.Usage()
			return
		}
		if err := daemon(); err != nil {
			log.Fatal(err)
		}
	} else {
		if err := runCommand(flag.Args()); err != nil {
			log.Fatal(err)
		}
	}
}

As you can see, when docker runs it first checks whether the absolute path of the executable is /sbin/init. If so, it sets up the environment before the docker container is started and performs initialization. Otherwise it decides, based on the command-line flags, whether to start the docker daemon or to run a docker cli command.

SysInit

The system’s initialization settings are located in /sysinit.go

// sysinit.go
func SysInit() {
	if len(os.Args) <= 1 {
		fmt.Println("You should not invoke docker-init manually")
		os.Exit(1)
	}
	var u = flag.String("u", "", "username or uid")
	var gw = flag.String("g", "", "gateway address")

	flag.Parse()

	setupNetworking(*gw)
	changeUser(*u)
	executeProgram(flag.Arg(0), flag.Args())
}

SysInit does three things (a hedged sketch of them follows the list):

  • Network setup: add a default gateway route based on the gateway ip passed in the arguments
  • User setup: set the uid and gid via system calls according to the username or uid in the arguments
  • Program start: execute the program specified by the remaining command-line arguments
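
A hedged sketch of those three steps is shown below. It is not the actual v0.1.0 sysinit code: the helper names match the snippet above, but shelling out to the ip command, handling only numeric uids and reusing the uid as the gid are simplifications made for this example.

// Sketch of SysInit's three steps: route, user, exec.
package main

import (
	"log"
	"os"
	"os/exec"
	"strconv"
	"syscall"
)

func setupNetworking(gw string) {
	if gw == "" {
		return
	}
	// Equivalent of: ip route add default via <gw>
	if err := exec.Command("ip", "route", "add", "default", "via", gw).Run(); err != nil {
		log.Printf("cannot set up networking: %v", err)
	}
}

func changeUser(u string) {
	if u == "" {
		return
	}
	id, err := strconv.Atoi(u)
	if err != nil {
		log.Fatalf("this sketch only handles numeric uids: %v", err)
	}
	// Drop group privileges first, then user privileges.
	if err := syscall.Setgid(id); err != nil {
		log.Fatalf("setgid: %v", err)
	}
	if err := syscall.Setuid(id); err != nil {
		log.Fatalf("setuid: %v", err)
	}
}

func executeProgram(name string, args []string) {
	// Replace the current process image with the requested program.
	if err := syscall.Exec(name, args, os.Environ()); err != nil {
		log.Fatalf("exec %s: %v", name, err)
	}
}

func main() {
	setupNetworking("") // empty gateway: skip the route in this demo
	changeUser("")      // empty user: keep the current uid/gid
	executeProgram("/bin/sh", []string{"/bin/sh", "-c", "echo hello from the container"})
}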

daemon

If the -d flag is passed, the daemon is started

// ./docker/docker.go
func daemon() error {
	// NewServer in commands.go
	service, err := docker.NewServer()
	if err != nil {
		return err
	}
	return rcli.ListenAndServe("tcp", "127.0.0.1:4242", service)
}

Starting the daemon means creating a server object and then serving TCP requests through that object. Creating the server is essentially creating a runtime object.

// commands.go
func NewServer() (*Server, error) {
	rand.Seed(time.Now().UTC().UnixNano())
	if runtime.GOARCH != "amd64" {
		log.Fatalf("The docker runtime currently only supports amd64 (not %s). This will change in the future. Aborting.", runtime.GOARCH)
	}
	runtime, err := NewRuntime()
	if err != nil {
		return nil, err
	}
	srv := &Server{
		runtime: runtime,
	}
	return srv, nil
}

When creating the runtime, docker first creates the corresponding directories, containers and graph, under /var/lib/docker, then creates the object that stores image tags, then creates the network manager on the bridge interface named lxcbr0, and finally loads Docker Hub’s authentication object AuthConfig. See the code comments for details.

func NewRuntime() (*Runtime, error) {
	return NewRuntimeFromDirectory("/var/lib/docker")
}

func NewRuntimeFromDirectory(root string) (*Runtime, error) {
	runtime_repo := path.Join(root, "containers")

	// Create the /var/lib/docker/containers directory
	if err := os.MkdirAll(runtime_repo, 0700); err != nil && !os.IsExist(err) {
		return nil, err
	}
	// Create the /var/lib/docker/graph directory and create the Graph object at the same time
	g, err := NewGraph(path.Join(root, "graph"))
	if err != nil {
		return nil, err
	}
	// Create the /var/lib/docker/repositories directory and create the TagStore object
	repositories, err := NewTagStore(path.Join(root, "repositories"), g)
	if err != nil {
		return nil, fmt.Errorf("Couldn't create Tag store: %s", err)
	}
	// Create the network manager on the bridge interface named lxcbr0
	netManager, err := newNetworkManager(networkBridgeIface)
	if err != nil {
		return nil, err
	}
	// Read the authentication file
	authConfig, err := auth.LoadConfig(root)
	if err != nil && authConfig == nil {
		// If the auth file does not exist, keep going
		return nil, err
	}
	// Create the runtime object
	runtime := &Runtime{
		root:           root,         // /var/lib/docker
		repository:     runtime_repo, // /var/lib/docker/containers
		containers:     list.New(),   // container/list doubly linked list
		networkManager: netManager,   // NetworkManager
		graph:          g,            // Graph
		repositories:   repositories, // TagStore
		authConfig:     authConfig,   // AuthConfig
	}
	// Read the /var/lib/docker/containers directory, i.e. the directories of all previously run containers,
	// and check whether the id in each configuration matches the loaded container id to detect changed container info.
	if err := runtime.restore(); err != nil {
		return nil, err
	}
	return runtime, nil
}

After the service is created, the TCP server is set up

// ./rcli/tcp.go
func ListenAndServe(proto, addr string, service Service) error {
	// Create the listener
	listener, err := net.Listen(proto, addr)
	if err != nil {
		return err
	}
	log.Printf("Listening for RCLI/%s on %s\n", proto, addr)
	defer listener.Close()
	for {
		// Accept a tcp connection
		if conn, err := listener.Accept(); err != nil {
			return err
		} else {
			go func() {
				if DEBUG_FLAG {
					CLIENT_SOCKET = conn
				}
				// Handle the request
				if err := Serve(conn, service); err != nil {
					log.Printf("Error: " + err.Error() + "\n")
					fmt.Fprintf(conn, "Error: "+err.Error()+"\n")
				}
				conn.Close()
			}()
		}
	}
	return nil
}

The request-handling flow can be seen in LocalCall() in /rcli/type.go. It extracts the arguments from the request and then calls call, which behaves differently depending on whether there are any arguments. If there are none, the runtime’s help method is executed. If there are, the subcommand after docker is taken from the arguments along with everything that follows it, the whole command is printed to the log, and reflection is then used to find the method that corresponds to that subcommand. Finally the arguments are passed into that method, it is executed, and the result is written back to conn, which is passed in here as an io.Writer.
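
The reflection-based dispatch can be sketched as follows. This is only an approximation of the rcli idea, not the original code: it assumes handlers follow a "Cmd" + capitalized-name convention and take an io.Writer plus string arguments.

// Sketch: map a command string to a method via reflection and run it.
package main

import (
	"fmt"
	"io"
	"os"
	"reflect"
	"strings"
)

type Service struct{}

// CmdVersion is an example handler; the real handlers take more arguments.
func (s *Service) CmdVersion(out io.Writer, args ...string) error {
	fmt.Fprintln(out, "docker version 0.1.0 (sketch)")
	return nil
}

func call(service interface{}, out io.Writer, name string, args ...string) error {
	// e.g. "version" -> "CmdVersion"
	methodName := "Cmd" + strings.ToUpper(name[:1]) + strings.ToLower(name[1:])
	method := reflect.ValueOf(service).MethodByName(methodName)
	if !method.IsValid() {
		return fmt.Errorf("unknown command: %s", name)
	}
	// First argument is the writer the result goes to, then the string args.
	in := []reflect.Value{reflect.ValueOf(out)}
	for _, a := range args {
		in = append(in, reflect.ValueOf(a))
	}
	if err, ok := method.Call(in)[0].Interface().(error); ok && err != nil {
		return err
	}
	return nil
}

func main() {
	if err := call(&Service{}, os.Stdout, "version"); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}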

runCommand

runCommand is the client mode; see the code comments for details

func runCommand(args []string) error {
	var oldState *term.State
	var err error
	// If stdin is a terminal and the NORAW environment variable is not set, switch the terminal to raw mode
	if term.IsTerminal(0) && os.Getenv("NORAW") == "" {
		oldState, err = term.MakeRaw(0)
		if err != nil {
			return err
		}
		defer term.Restore(0, oldState)
	}
	// FIXME: we want to use unix sockets here, but net.UnixConn doesn't expose
	// CloseWrite(), which we need to cleanly signal that stdin is closed without
	// closing the connection.
	// See http://code.google.com/p/go/issues/detail?id=3345
	// Connect to the server over TCP and pass the arguments
	if conn, err := rcli.Call("tcp", "127.0.0.1:4242", args...); err == nil {
		// Start a goroutine that copies the connection's output to os.Stdout
		receive_stdout := docker.Go(func() error {
			_, err := io.Copy(os.Stdout, conn)
			return err
		})
		// Start a goroutine that copies os.Stdin into the connection
		send_stdin := docker.Go(func() error {
			_, err := io.Copy(conn, os.Stdin)
			if err := conn.CloseWrite(); err != nil {
				log.Printf("Couldn't send EOF: " + err.Error())
			}
			return err
		})
		if err := <-receive_stdout; err != nil {
			return err
		}
		if !term.IsTerminal(0) {
			if err := <-send_stdin; err != nil {
				return err
			}
		}
	} else {
		// If the connection fails, create a local docker server
		service, err := docker.NewServer()
		if err != nil {
			return err
		}
		// Use the local docker server: rcli.LocalCall executes the command, passing stdin and stdout to it
		if err := rcli.LocalCall(service, os.Stdin, os.Stdout, args...); err != nil {
			return err
		}
	}
	if oldState != nil {
		term.Restore(0, oldState)
	}
	return nil
}

Summary of this chapter

From the source we can see that docker v0.1.0 is built entirely on top of lxc. In its initialization phase it sets up things such as the default route on the bridge and switches the user, and then the server and client come into play: the server is driven by the docker daemon, which loads the relevant container data into the runtime object, creates a service that listens for client requests, and calls the corresponding module to handle each request.

Hands-on implementation of minidocker

What is learned on paper always feels shallow; real understanding only comes from doing it yourself. The best way to understand docker is of course to write one. Following a tutorial, I implemented docker’s core virtualization features: namespaces, cgroups, the file system, and network configuration.

Experimental environment

Linux version 6.1.55-1-MANJARO + go1.21.2 linux/amd64

Namespace

cmd := exec.Command("/bin/zsh")
cmd.SysProcAttr = &syscall.SysProcAttr{
	Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWNET | syscall.CLONE_NEWIPC,
}
cmd.Env = os.Environ()
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
err := cmd.Run()
if err != nil {
	log.Fatal(err)
}

This starts a new shell and configures the process attributes through syscall.SysProcAttr so that it is created in new namespaces:

  • CLONE_NEWUTS: isolates system identifiers such as the hostname and domain name. Concretely: changing the hostname inside the container (hostname newname) does not affect the host’s hostname.
  • CLONE_NEWPID: isolates process IDs. Concretely: echo $$ inside the container shows a pid that starts from 1. However, ps -ef can still see the host’s processes:
    • Remounting proc inside the container (mount -t proc proc /proc) does make top stop showing host processes, but because the mount propagates, the host’s /proc ends up unmounted afterwards.
    • The fix is to first run mount --make-rprivate / and then mount -t proc proc /proc.
  • CLONE_NEWNS: a new mount namespace, isolating file system mount points.
  • CLONE_NEWNET: a new network namespace, isolating network devices. Concretely: route -n shows no IP or routing table, so the network has to be reconfigured.
  • CLONE_NEWIPC: a new IPC namespace, isolating inter-process communication resources. Concretely: a message queue created on the host with ipcmk -Q is not visible to ipcs inside the container.

Experimenting with these Linux commands shows that the various kinds of isolation are in place, which deepens the understanding of namespaces.

But one problem remains: the container still uses the host’s file system, and commands like pwd still show the host’s directories.

File system

Download the file system https://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ and mount it.

// "run" re-executes the current binary as "init" inside new namespaces; "init" sets up the rootfs and execs the user command
func main() {
	switch os.Args[1] {
	case "run":
		fmt.Println("run mode: run pid", os.Getpid(), "ppid", os.Getppid())
		initCmd, err := os.Readlink("/proc/self/exe") // Read the path of the current process's executable file
		if err != nil {
			fmt.Println("get init process error", err)
			return
		}

		os.Args[1] = "init"
		cmd := exec.Command(initCmd, os.Args[1:]...)

		cmd.SysProcAttr = &syscall.SysProcAttr{
			Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWNET | syscall.CLONE_NEWIPC,
		}
		cmd.Env = os.Environ()
		cmd.Stdin = os.Stdin
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		err = cmd.Run()
		if err != nil {
			log.Fatal(err)
		}
		return
	case "init":
		fmt.Println("init mode: run pid", os.Getpid(), "ppid", os.Getppid())
		pwd, err := os.Getwd()
		fmt.Println("pwd:", pwd)
		if err != nil {
			fmt.Println("pwd", err)
			return
		}
		path := pwd + "/ubuntu"
		syscall.Mount("", "/", "", syscall.MS_BIND|syscall.MS_REC, "")
		if err := syscall.Mount(path, path, "bind", syscall.MS_BIND|syscall.MS_REC, ""); err != nil {
			fmt.Println("Mount", err)
			return
		}
		if err := os.MkdirAll(path+"/.old", 0700); err != nil {
			fmt.Println("mkdir", err)
			return
		}
		// syscall.PivotRoot may report "invalid argument". You can first execute the unshare -m command and then delete the ubuntu/.old folder.
		// The reason is that systemd marks the root mount as shared, and pivot_root does not allow the parent mount point and the new mount point to be shared.
		// Reference: https://www.retainblog.top/2022/10/26/Use Golang to implement your own Docker (2)/

		err = syscall.PivotRoot(path, path+"/.old")
		if err != nil {
			fmt.Println("pivot root", err)
			return
		}
		// syscall.Chroot("./ubuntu-base-16.04.6-base-amd64") // Chroot switches the process's root file system; the root changes but the namespaces do not.
		syscall.Chdir("/")

		defaultMountFlags := syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV
		syscall.Mount("proc", "/proc", "proc", uintptr(defaultMountFlags), "")

		cmd := os.Args[2]
		err = syscall.Exec(cmd, os.Args[2:], os.Environ())
		if err != nil {
			fmt.Println("exec proc fail", err)
			return
		}
		fmt.Println("forever exec it")
		return
	}

	fmt.Println("hello world")
}

An init mode is added to configure the container. Also note that multiple container processes would otherwise share the same root file system directory; to keep each container’s modifications separate, a union file system is used so that every container gets its own writable layer. After a run, two container directories appear under /root/mnt and are destroyed automatically when the containers exit.
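
As a rough sketch of the per-container writable layer, the snippet below uses overlayfs rather than the aufs used by old docker and the tutorial; all paths, including /root/mnt, are illustrative and it must run as root.

// Sketch: build a merged, per-container view of a shared rootfs with overlayfs.
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

func mountContainerFS(id string) (string, error) {
	lower := "/root/ubuntu"           // shared read-only rootfs
	upper := "/root/writeLayer/" + id // per-container writable layer
	work := "/root/work/" + id        // overlayfs internal work directory
	mnt := "/root/mnt/" + id          // merged view the container will see

	for _, dir := range []string{upper, work, mnt} {
		if err := os.MkdirAll(dir, 0700); err != nil {
			return "", err
		}
	}
	data := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
	if err := syscall.Mount("overlay", mnt, "overlay", 0, data); err != nil {
		return "", err
	}
	return mnt, nil
}

func main() {
	mnt, err := mountContainerFS("container-1")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("container rootfs merged at", mnt)
	// When the container exits, the mount point is unmounted and the writable
	// layer removed, which is why the directories disappear after a run.
	if err := syscall.Unmount(mnt, 0); err != nil {
		log.Fatal(err)
	}
}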

Network configuration

By default the network inside the container cannot reach the Internet; connectivity is provided through a bridge. The Linux commands below show the configuration done by hand; minidocker implements the same thing by calling the corresponding system interfaces from code (a Go sketch of driving these steps follows the command listing).

# Allow the firewall to forward routed packets
iptables -A FORWARD -j ACCEPT
# Let the kernel forward packets by setting the value to 1
sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'


# Create the bridge
brctl addbr br0
# Bring the bridge up
sudo ip link set br0 up # after this, ifconfig shows the bridge, but it has no ip address yet
# Assign an ip address to the bridge
ip addr add 192.168.15.6/24 dev br0

# Create veth devices; veth devices always come in pairs
sudo ip link add veth-red type veth peer name veth-red-br
sudo ip link add veth-blue type veth peer name veth-blue-br

# Bring up the veth devices on the host side
sudo ip link set veth-red-br up
sudo ip link set veth-blue-br up # visible in ifconfig

# Move veth-red / veth-blue into the network namespace of each container process (start the process first to get its pid)
sudo ip link set veth-red netns <red container pid>
sudo ip link set veth-blue netns <blue container pid>

# Attach the host end of each veth pair to the bridge
sudo ip link set veth-red-br master br0
sudo ip link set veth-blue-br master br0

# Give the veth devices inside the containers an ip
sudo ip link set veth-red up
sudo ip addr add 192.168.15.5/24 dev veth-red # afterwards veth-red shows up in ifconfig, and route -n shows a route for the 192.168.15.0 network segment

sudo ip link set veth-blue up
sudo ip addr add 192.168.15.7/24 dev veth-blue

# Now the two containers can reach each other but not the outside network; traffic has to be forwarded to the eth0 card through the bridge.
# Add a default gateway route inside each container
ip route add default via 192.168.15.6 dev veth-red
ip route add default via 192.168.15.6 dev veth-blue

# Packets leaving the host also need source NAT so that replies can find their way back
# Firewall nat setting
iptables -t nat -A POSTROUTING -s 192.168.15.0/24 -j MASQUERADE
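
Here is a hedged Go sketch that drives the red-container half of the steps above by shelling out to ip and nsenter (a real implementation might use a netlink library instead). The device names, addresses and pid are illustrative, br0 is assumed to exist already, and it must run as root.

// Sketch: configure a veth pair for a container by running ip/nsenter commands.
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func run(args ...string) {
	if out, err := exec.Command(args[0], args[1:]...).CombinedOutput(); err != nil {
		log.Fatalf("%v failed: %v\n%s", args, err, out)
	}
}

func main() {
	pid := "12345" // pid of the container process whose network namespace we use

	// Create a veth pair and attach the host end to the bridge br0.
	run("ip", "link", "add", "veth-red", "type", "veth", "peer", "name", "veth-red-br")
	run("ip", "link", "set", "veth-red-br", "master", "br0")
	run("ip", "link", "set", "veth-red-br", "up")

	// Move the container end into the container's network namespace.
	run("ip", "link", "set", "veth-red", "netns", pid)

	// Configure the address, bring the link up and add the default route inside that namespace.
	run("nsenter", "-t", pid, "-n", "ip", "addr", "add", "192.168.15.5/24", "dev", "veth-red")
	run("nsenter", "-t", pid, "-n", "ip", "link", "set", "veth-red", "up")
	run("nsenter", "-t", pid, "-n", "ip", "route", "add", "default", "via", "192.168.15.6")

	fmt.Println("veth-red configured inside the container namespace")
}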

cgroups

Resource isolation was achieved earlier through namespaces, but that isolation cannot limit how much of the hardware a process uses; cgroups are what limit resource usage such as CPU and memory. The kernel already implements almost all of it, and we only need to call the relevant interfaces.

cd /sys/fs/cgroup/ # this directory contains cpu, memory, etc. -- the cgroup subsystems

# cpu.cfs_period_us is the length of one CPU scheduling period
# writing to memory.limit_in_bytes limits how much memory the programs in the group may use

In minidocker, cgroups are applied by writing to these resource-management files of the container’s control group through the system interface, as sketched below.
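
A minimal sketch of that idea, assuming the cgroup v1 layout referred to above (memory.limit_in_bytes and a tasks file under /sys/fs/cgroup/memory): the group name is made up, it must run as root, and on a cgroup v2 system the file names differ.

// Sketch: create a memory cgroup, set a limit, and move a pid into it.
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strconv"
)

func main() {
	pid := os.Getpid() // illustrative: limit the current process

	cg := "/sys/fs/cgroup/memory/minidocker-demo"
	if err := os.MkdirAll(cg, 0755); err != nil {
		log.Fatal(err)
	}
	// 100 MB memory limit for every task in the group.
	if err := os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"), []byte("104857600"), 0644); err != nil {
		log.Fatal(err)
	}
	// Writing the pid into tasks moves the process into the group, so the kernel enforces the limit.
	if err := os.WriteFile(filepath.Join(cg, "tasks"), []byte(strconv.Itoa(pid)), 0644); err != nil {
		log.Fatal(err)
	}
	fmt.Println("pid", pid, "is now limited by", cg)
}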

Exploring the docker source code again

Docker v0.1.0 basically implements the core functionality, but it still has many problems and missing features. For example, it relies on the aufs file system, which is not fully supported by the 2.6.32 Linux kernel. Later versions mainly focus on supplementing and optimizing its isolation, security, stability, and features.

For example:

  • Moving from LXC to its own libcontainer: in later versions, Docker moved from depending on LXC to its own container-runtime library libcontainer (whose code later became the core of runc), which improved security and portability.
  • Security enhancements: More security features are introduced, such as AppArmor, SELinux policy support, and later user namespaces, which increase the isolation and security of containers.
  • Network and storage drivers: Introducing pluggable network and storage drivers to support multiple network configurations and persistent storage options.
  • Orchestration and scaling: tools such as Docker Compose and Docker Swarm were introduced to support container orchestration and scaling.
  • Interface and API: The command line interface (CLI) has been improved and a REST API has been added to provide stronger support for automation and integration.
  • Openness and standardization: Docker began to participate in and promote open container standards, such as the Open Container Initiative (OCI).

Question

It can be found that docker is essentially implemented on top of the Linux namespace and cgroup interfaces, yet there is also Docker for Windows. How is that implemented?

It turns out the two structures are very similar. Like Linux, Windows also abstracts concepts analogous to cgroups and namespaces, and it provides a new abstraction layer, the Host Compute Service (HCS). Compared with the low-level implementation details, which may be refactored frequently, HCS aims to offer external consumers (such as the Docker engine) a more stable interface.

So, essentially, the problem is solved by adding abstraction.

Summary

Docker is a set of platform as a service(PaaS) products that use OS-level virtualization to deliver software in packages called containers.

Going back to Wikipedia’s definition of docker, it really is a good one. Containers are isolation. Docker is an indispensable product of the cloud-native era: moving to the cloud requires virtualization, and virtualization needs docker. This lightweight, scalable container technology effectively avoids reinventing the wheel, greatly eases the traditional pain of environment configuration, and makes software services plug-and-play.

Reference links

https://github.com/moby/moby

http://en.wikipedia.org/wiki/Docker_(software)

https://learn-docker-the-hard-way.readthedocs.io/zh-cn/latest/Part1/1_docker.go/

https://zhuanlan.zhihu.com/p/25773225

https://zhuanlan.zhihu.com/p/551753838

https://www.kancloud.cn/infoq/docker-source-code-analysis/80525

https://www.imooc.com/article/335433vo

https://www.yzktw.com.cn/post/1281202.html

https://insights.thoughtworks.cn/can-i-use-docker-on-windows/