Docker: How overlay2 works, and what to do when overlay2 takes up too much space

Recently, while studying how Docker is implemented, I came across the concept of a union file system, so let's start with that.

Union File System

Definition: a union file system (UnionFS) is a layered, lightweight, high-performance file system. It records each modification to the file system as a commit and stacks these commits layer by layer, and it can mount several directories into a single virtual file system.
There are two main points:

Different directories can be mounted under the same virtual file system:
This means that a mounted file system no longer shows the contents of just one directory, but of several.
Modifications to the file system are stacked layer by layer as commits:
This is a bit like how git works. Each commit is an increment, and each upper commit is built incrementally on top of the commits below it.
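
As a rough mental model (not how the kernel implements it), the two points above can be sketched in Python with collections.ChainMap: lookups fall through a stack of layers, and writes only touch the top one. The file names and contents are made up for illustration.

```python
from collections import ChainMap

# Two read-only "commits" (lower layers) and one writable top layer.
# File names and contents are invented for this sketch.
base = {"/etc/hostname": "base", "/bin/sh": "shell v1"}
patch = {"/bin/sh": "shell v2"}
top = {}

# ChainMap searches its maps left to right, so the leftmost layer wins.
view = ChainMap(top, patch, base)

# A write through the unified view lands in the top layer only.
view["/etc/hostname"] = "container"
```

After the write, base still holds its original hostname, while the view reports the new one.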

So what are the benefits of doing this?

We know that an image can be understood as a template for containers, and one image can be used to create many containers.
Those containers therefore all rely on the same image layers. If every container kept its own full copy, the storage requirements would be large. It would also make integration and release harder: whenever someone needed your changes, you would have to hand over an entire image. That would be like having to pull a complete copy of the code every time you wanted to fetch an update from origin in git.
So images are organized as incremental layers. Different Docker containers can share the same image layers and publish only their own changes on top.
Docker supports several union file systems, such as aufs and overlay2.
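
To make the saving concrete, here is a small back-of-the-envelope calculation. The layer sizes and the number of containers are hypothetical, chosen only for illustration.

```python
def storage_mb(image_layers_mb, container_changes_mb, shared=True):
    """Total disk usage for containers created from one image.

    image_layers_mb: sizes of the read-only image layers.
    container_changes_mb: size of each container's own writable layer.
    shared: if False, every container gets a private copy of the image.
    """
    image = sum(image_layers_mb)
    copies = 1 if shared else len(container_changes_mb)
    return image * copies + sum(container_changes_mb)

# Hypothetical numbers: a 3-layer image and 10 containers writing 5 MB each.
layers = [60, 10, 2]
changes = [5] * 10
with_sharing = storage_mb(layers, changes, shared=True)
without_sharing = storage_mb(layers, changes, shared=False)
```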

How overlay2 works

An overlay2 mount is made up of four directories: merged, lowerdir, upperdir, and workdir.
lowerdir corresponds to the underlying file system, that is, the contents of the lower “commits”. It is a read-only layer that can be shared by the file systems stacked on top of it. upperdir is the writable layer where changes are stored. workdir is a working directory that overlay2 uses internally to carry out operations such as copy-on-write, which we will see in practice below.
At run time, overlay2 union-mounts lowerdir and upperdir, with the help of workdir, onto the merged directory to give users a “unified view”.
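
The lookup order behind that unified view can be sketched in a few lines. This is a simplified model under stated assumptions, not the kernel's algorithm: it ignores deleted files and directory merging, and the function name resolve is made up for this example.

```python
import tempfile
from pathlib import Path

def resolve(name, upperdir, lowerdirs):
    """Return the path that backs `name` in the merged view, or None.

    upperdir is checked first, then lowerdirs from topmost to bottommost.
    Simplification: deletions and directory merging are ignored here.
    """
    for layer in [upperdir, *lowerdirs]:
        candidate = Path(layer) / name
        if candidate.exists():
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    lower1, lower2, upper = (Path(root, d) for d in ("lower1", "lower2", "upper"))
    for d in (lower1, lower2, upper):
        d.mkdir()
    (lower1 / "file1.txt").write_text("from lower1")
    (lower2 / "file1.txt").write_text("from lower2")  # shadowed by lower1
    (lower2 / "file2.txt").write_text("from lower2")
    (upper / "file3.txt").write_text("from upper")

    # Record which layer supplies each file in the merged view.
    backing = {
        name: resolve(name, upper, [lower1, lower2]).parent.name
        for name in ("file1.txt", "file2.txt", "file3.txt")
    }
    missing = resolve("nope.txt", upper, [lower1, lower2])
```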

The schematic diagram on the official Docker website actually describes the older overlay driver, but it still helps us understand how overlay2 works. Note that lowerdir in overlay2 can consist of many layers; the diagram draws only one, so imagine several stacked there. The difference between overlay and overlay2 is covered in a later section.

So what does overlay2 do when we want to delete or modify content in lowerdir? It is supposed to be a read-only layer, which means it cannot be modified, yet we may well need to change those files. Does that make lowerdir unsuitable for our needs?
Let’s see in practice how overlay2 solves these problems.

overlay2 in practice

The following is the description of overlay in man mount.

Mount options for overlay
       Since Linux 3.18 the overlay pseudo filesystem implements a union mount for other filesystems.

       An overlay filesystem combines two filesystems - an upper filesystem and a lower filesystem. When a name exists in both filesystems, the
       object in the upper filesystem is visible while the object in the lower filesystem is either hidden or, in the case of directories,
       merged with the upper object.

       The lower filesystem can be any filesystem supported by Linux and does not need to be writable. The lower filesystem can even be another
       overlayfs. The upper filesystem will normally be writable and if it is it must support the creation of trusted.* extended attributes,
       and must provide a valid d_type in readdir responses, so NFS is not suitable.

       A read-only overlay of two read-only filesystems may use any filesystem type. The options lowerdir and upperdir are combined into a
       merged directory by using:

              mount -t overlay overlay \
                -olowerdir=/lower,upperdir=/upper,workdir=/work /merged

       lowerdir=directory
              Any filesystem, does not need to be on a writable filesystem.

       upperdir=directory
              The upperdir is normally on a writable filesystem.

       workdir=directory
              The workdir needs to be an empty directory on the same filesystem as upperdir.
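
To show how the three options fit together, here is a small helper that only builds the argument list for such a mount and does not run it. The function name is made up for this sketch; the option names are the ones from the man page above.

```python
def overlay_mount_command(lowerdirs, upperdir, workdir, target):
    """Build the argv for an overlay mount (the command is not executed).

    lowerdirs is ordered from topmost to bottommost layer and is joined
    with ':' in that order, as the lowerdir= option expects.
    """
    options = ",".join([
        "lowerdir=" + ":".join(lowerdirs),
        "upperdir=" + upperdir,
        "workdir=" + workdir,
    ])
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]

cmd = overlay_mount_command(["lower1", "lower2"], "upper", "work", "merged")
```

Running the resulting command requires root privileges, so the sketch stops at building it.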

Following that example, we create the corresponding directories and files.

nigo@DESKTOP-95TV8LK ~/overlay2> tree
.
├── lower1
├── lower2
├── merged
├── upper
└── work

5 directories, 0 files
nigo@DESKTOP-95TV8LK ~/overlay2> echo 'I\'m file1, belong to lower1' > lower1/file1.txt
nigo@DESKTOP-95TV8LK ~/overlay2> echo 'I\'m file2, belong to lower2' > lower2/file2.txt
nigo@DESKTOP-95TV8LK ~/overlay2> echo 'I\'m file3, belong to upper' > upper/file3.txt

Now both lowerdir and upperdir have their respective files.

nigo@DESKTOP-95TV8LK ~/overlay2> tree
.
├── lower1
│ └── file1.txt
├── lower2
│ └── file2.txt
├── merged
├── upper
│ └── file3.txt
└── work

5 directories, 3 files

Next, we mount these directories. The first command confirms that no overlay is mounted yet; after the mount command, the overlay shows up in the list.

nigo@DESKTOP-95TV8LK ~/overlay2 [1]> mount | grep overlay

nigo@DESKTOP-95TV8LK ~/overlay2 [0|1]> sudo mount -t overlay overlay -olowerdir=lower1:lower2,upperdir=upper,workdir=work merged/
[sudo] password for nigo:

nigo@DESKTOP-95TV8LK ~/overlay2> mount | grep overlay
overlay on /home/nigo/overlay2/merged type overlay (rw,relatime,lowerdir=lower1:lower2,upperdir=upper,workdir=work)

The content we expected now appears in merged.

nigo@DESKTOP-95TV8LK ~/overlay2> sudo tree
.
├── lower1
│ └── file1.txt
├── lower2
│ └── file2.txt
├── merged
│ ├── file1.txt
│ ├── file2.txt
│ └── file3.txt
├── upper
│ └── file3.txt
└── work
    └── work

6 directories, 6 files

nigo@DESKTOP-95TV8LK ~/overlay2> cat merged/*
I'm file1, belong to lower1
I'm file2, belong to lower2
I'm file3, belong to upper

A change to a file that already lives in upperdir is simply written to upperdir, since upperdir is the readable and writable layer. But what about the lowerdir files mentioned above? Let’s verify.

Modify lowerdir files

Modify file1.txt, which comes from lower1, through the merged directory.

nigo@DESKTOP-95TV8LK ~/overlay2> echo 'I\'m file1, belong to lower1' > lower1/file1.txt

nigo@DESKTOP-95TV8LK ~/overlay2> echo 'file1 has been changed' > merged/file1.txt

nigo@DESKTOP-95TV8LK ~/overlay2> cat merged/*
file1 has been changed
I'm file2, belong to lower2
I'm file3, belong to upper

nigo@DESKTOP-95TV8LK ~/overlay2> cat lower1/file1.txt
I'm file1, belong to lower1

nigo@DESKTOP-95TV8LK ~/overlay2> sudo tree
.
├── lower1
│ └── file1.txt
├── lower2
│ └── file2.txt
├── merged
│ ├── file1.txt
│ ├── file2.txt
│ └── file3.txt
├── upper
│ ├── file1.txt
│ └── file3.txt
└── work
    └── work

6 directories, 7 files

We can see that file1.txt in merged was indeed modified, while the content in lowerdir is unchanged. Instead, a file1.txt was created in upperdir. This is copy-on-write, and it confirms that lowerdir is a read-only layer.
The concept is similar to what happens after fork on Linux: parent and child initially share the same memory pages, and the kernel copies a page only when one of the processes tries to modify it. Here, overlay2 copies file1.txt into upperdir and applies the change there.
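
The behaviour observed above can be imitated with ordinary directories. This is only a model of the idea, assuming whole-file overwrites and no subdirectories; the function name write_through_overlay is made up, and nothing is actually mounted.

```python
import shutil
import tempfile
from pathlib import Path

def write_through_overlay(name, data, upperdir, lowerdirs):
    """Model of copy-on-write: copy the file up, then modify the copy.

    Only upperdir is ever written; lowerdirs are treated as read-only.
    Simplified: overwrites the whole file and ignores directories.
    """
    target = Path(upperdir) / name
    if not target.exists():
        for layer in lowerdirs:
            source = Path(layer) / name
            if source.exists():
                shutil.copy2(source, target)  # the "copy-up" step
                break
    target.write_text(data)
    return target

with tempfile.TemporaryDirectory() as root:
    lower1, upper = Path(root, "lower1"), Path(root, "upper")
    lower1.mkdir()
    upper.mkdir()
    (lower1 / "file1.txt").write_text("I'm file1, belong to lower1\n")

    write_through_overlay("file1.txt", "file1 has been changed\n", upper, [lower1])

    lower_content = (lower1 / "file1.txt").read_text()
    upper_content = (upper / "file1.txt").read_text()
```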

Delete files in lowerdir

Try deleting file2.txt.

nigo@DESKTOP-95TV8LK ~/overlay2> rm merged/file2.txt
nigo@DESKTOP-95TV8LK ~/overlay2> sudo tree
.
├── lower1
│ └── file1.txt
├── lower2
│ └── file2.txt
├── merged
│ ├── file1.txt
│ └── file3.txt
├── upper
│ ├── file1.txt
│ ├── file2.txt
│ └── file3.txt
└── work
    └── work

6 directories, 7 files

nigo@DESKTOP-95TV8LK ~/overlay2> cat merged/*
file1 has been changed
I'm file3, belong to upper

nigo@DESKTOP-95TV8LK ~/overlay2> cat lower2/file2.txt
I'm file2, belong to lower2

nigo@DESKTOP-95TV8LK ~/overlay2> ll upper/file2.txt
c--------- 1 root root 0, 0 Apr 22 14:44 upper/file2.txt

Judging from the results, file2.txt was indeed “deleted”. In upperdir, however, a special character device file named file2.txt has appeared.

When overlay2 sees this special character device file during the union mount, it hides the entry with the same name in lowerdir.

Such a marker is called a whiteout (the word means temporary blindness, which is a vivid name for it). aufs, another UnionFS, also uses whiteout files, and most of the experiments you will find online are about aufs. Interested readers can look them up.
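
The ll output above shows a character device with device number 0, 0. A minimal sketch of how such entries could be detected from Python follows; the function names are made up for this example, and listing a real upperdir created by Docker usually needs root.

```python
import os
import stat

def is_whiteout(st_mode, st_rdev):
    """True if the stat fields describe a character device numbered 0,0."""
    return (
        stat.S_ISCHR(st_mode)
        and os.major(st_rdev) == 0
        and os.minor(st_rdev) == 0
    )

def whiteouts_in(upperdir):
    """List names in upperdir that hide entries of the lower layers."""
    names = []
    for entry in os.scandir(upperdir):
        st = entry.stat(follow_symlinks=False)
        if is_whiteout(st.st_mode, st.st_rdev):
            names.append(entry.name)
    return sorted(names)
```

On the experiment directory from this section, whiteouts_in("upper") would be expected to report file2.txt.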

overlay2 and Docker

Before running the experiment, make sure your Docker installation uses the overlay2 driver.
You can check with docker info, or look at the contents of /etc/docker/daemon.json.
For details, see: Use the OverlayFS storage driver | Docker Documentation

nigo@DESKTOP-95TV8LK ~> docker info | grep Storage
 Storage Driver: overlay2
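
If you prefer to check the configuration file, the relevant key in daemon.json is "storage-driver". The helper below is a small sketch that reads it from the file's text; the function name is made up, and a missing key simply means Docker chooses the driver on its own.

```python
import json

def configured_storage_driver(daemon_json_text):
    """Return the "storage-driver" value from daemon.json text, or None
    when the file is empty or the key is absent."""
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    return config.get("storage-driver")

sample = '{"storage-driver": "overlay2"}'
driver = configured_storage_driver(sample)
```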

Let’s take ubuntu as an example. Pulling the image downloads three image layers.

nigo@DESKTOP-95TV8LK ~> docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
a70d879fa598: Pull complete
c4394a92d1f8: Pull complete
10e6159c56c0: Pull complete
Digest: sha256:3c9c713e0979e9bd6061ed52ac1e9e1f246c9495aa063619d9d695fb8039aa1f
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest

We can find them in /var/lib/docker/overlay2. Because some directories are quite deep, we limit how far tree descends with the -L option.

root@DESKTOP-95TV8LK:/var/lib/docker/overlay2# tree -L 2
.
├── 809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04
│ ├── committed
│ ├── diff
│ └── link
├── ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
└── l
    ├── 2FMRPFC5X2PFHGEZII4KC47JF4 -> ../ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59/diff
    ├── EVRLLRGLJ5K5374Z6B32BREDLT -> ../809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04/diff
    └── UXEF4DBCSIKECRM2J2IPAD4WY5 -> ../adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/diff

12 directories, 7 files

Besides the layers, there is an extra directory named l, which holds symbolic links to each layer’s diff directory under much shorter names. According to the official Docker documentation, these short names are used to avoid hitting the page size limit on the arguments of the mount command.

Each layer directory also contains a link file, whose content is the shortened symlink name of that layer in the l directory.

root@DESKTOP-95TV8LK:/var/lib/docker/overlay2# cat adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/link
UXEF4DBCSIKECRM2J2IPAD4WY5
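
The relationship between the link files and the l directory can be checked with a short script. The sketch below runs on a throwaway directory that imitates the layout; the names layerA and SHORTA are placeholders, and short_names is a made-up helper.

```python
import tempfile
from pathlib import Path

def short_names(overlay_root):
    """Map each layer directory name to the short name in its link file."""
    mapping = {}
    for link_file in Path(overlay_root).glob("*/link"):
        mapping[link_file.parent.name] = link_file.read_text().strip()
    return mapping

# Demo on a temporary directory; "layerA" and "SHORTA" are placeholders.
with tempfile.TemporaryDirectory() as root:
    layer = Path(root, "layerA")
    (layer / "diff").mkdir(parents=True)
    (layer / "link").write_text("SHORTA")
    Path(root, "l").mkdir()
    Path(root, "l", "SHORTA").symlink_to("../layerA/diff")

    names = short_names(root)
    # Following the short symlink leads back to the layer's diff directory.
    resolved = Path(root, "l", names["layerA"]).resolve()
    resolved_layer = resolved.parent.name
    resolved_dir = resolved.name
```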

Run docker inspect on the ubuntu image and look at the GraphDriver section.

"GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59/diff:/var/lib/docker/overlay2/809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04/diff",
                "MergedDir": "/var/lib/docker/overlay2/adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/merged",
                "UpperDir": "/var/lib/docker/overlay2/adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/diff",
                "WorkDir": "/var/lib/docker/overlay2/adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/work"
            },
            "Name": "overlay2"
        }

From the output of inspect, we can see that the diff directory of each layer holds the content in which that layer differs from the layers below it. The diff directories of the lower layers serve as the read-only layers (lowerdir), and the diff directory of the top layer serves as the read-write layer (upperdir). Together with workdir, they are union-mounted onto merged.
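
To read the layer order out of that JSON, the LowerDir string can be split on colons. This is a small sketch with shortened placeholder paths; layer_stack is a made-up name, and the code tolerates a missing LowerDir, which is assumed to be the case for a single-layer image.

```python
def layer_stack(graph_driver_data):
    """Return layer diff directories from top (UpperDir) to bottom.

    graph_driver_data is the "Data" object under "GraphDriver" in the
    docker inspect output. LowerDir may be missing or empty.
    """
    lower = graph_driver_data.get("LowerDir") or ""
    return [graph_driver_data["UpperDir"]] + [p for p in lower.split(":") if p]

# Shortened placeholder paths, not real layer IDs.
data = {
    "LowerDir": "/o/ab09/diff:/o/809e/diff",
    "UpperDir": "/o/adfc/diff",
}
stack = layer_stack(data)
```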

The committed file, which we have not discussed, records commit-related information for the layer.

Finally, let’s create a container and see what effect running it and committing it has on this directory.

nigo@DESKTOP-95TV8LK ~> docker run -itd --name myubuntu ubuntu
d6adc07566d205f1554b2db9534c76713f830a7705e9f41ab30f9c0f4d118b1d

root@DESKTOP-95TV8LK:/var/lib/docker/overlay2# tree -L 2
.
├── 6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778
│ ├── diff
│ ├── link
│ ├── lower
│ ├── merged
│ └── work
├── 6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778-init
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── 809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04
│ ├── committed
│ ├── diff
│ └── link
├── ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
└── l
    ├── 2FMRPFC5X2PFHGEZII4KC47JF4 -> ../ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59/diff
    ├── EVRLLRGLJ5K5374Z6B32BREDLT -> ../809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04/diff
    ├── M32S6F25IVIE4YRIR2BYQX5S4H -> ../6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778-init/diff
    ├── RPNEL4XVFQMLG5Q4BF7BXUYY5C -> ../6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778/diff
    └── UXEF4DBCSIKECRM2J2IPAD4WY5 -> ../adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/diff

After running the container, two new layers appear. One of them is the init layer, a read-only layer that stores content related to the container’s environment. Because this environment can differ from machine to machine, Docker generates the environment-related configuration in the init layer each time a container is created from an image. The contents of the init layer are not included when we run docker commit.

Now write a new file in the container, run docker commit, and observe the result.

nigo@DESKTOP-95TV8LK ~> docker start myubuntu
myubuntu

nigo@DESKTOP-95TV8LK ~> docker exec -it myubuntu /bin/bash
root@d6adc07566d2:/# touch hello-overlay2.txt

root@d6adc07566d2:/# exit

root@DESKTOP-95TV8LK:/var/lib/docker/overlay2# tree -L 2
.
├── 0991cf894ea2ed9bb2b8313331ac9eb72c3678c26dc0152241e6228105df25f2
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── 6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── 6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778-init
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── 809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04
│ ├── committed
│ ├── diff
│ └── link
├── ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
├── adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278
│ ├── committed
│ ├── diff
│ ├── link
│ ├── lower
│ └── work
└── l
    ├── 2FMRPFC5X2PFHGEZII4KC47JF4 -> ../ab0963faec278aa7c9c40c79642774451b2a5ecd9142706d7b6165864d55ad59/diff
    ├── DD5NLJQIUJRFR45OMBJIF26CQ3 -> ../0991cf894ea2ed9bb2b8313331ac9eb72c3678c26dc0152241e6228105df25f2/diff
    ├── EVRLLRGLJ5K5374Z6B32BREDLT -> ../809e4cfaa089d57ba81faea4570d6689cf6fe9a424b982ba6859b094340eef04/diff
    ├── M32S6F25IVIE4YRIR2BYQX5S4H -> ../6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778-init/diff
    ├── RPNEL4XVFQMLG5Q4BF7BXUYY5C -> ../6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778/diff
    └── UXEF4DBCSIKECRM2J2IPAD4WY5 -> ../adfcb936bd0fac351f71721610abbec97b7309c1ae8323ebc6795c5c96ac0278/diff

24 directories, 15 files

Sure enough, a new image layer has been generated!

nigo@DESKTOP-95TV8LK ~> mount | grep overlay
overlay on /var/lib/docker/overlay2/6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/M32S6F25IVIE4YRIR2BYQX5S4H:/var/lib/docker/overlay2/l/UXEF4DBCSIKECRM2J2IPAD4WY5:/var/lib/docker/overlay2/l/2FMRPFC5X2PFHGEZII4KC47JF4:/var/lib/docker/overlay2/l/EVRLLRGLJ5K5374Z6B32BREDLT,upperdir=/var/lib/docker/overlay2/6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778/diff,workdir=/var/lib/docker/overlay2/6c287e2696d9a1593ae358045b511d95e6dc5f1bbe021a4f72a7892f3a8c5778/work)

The directories listed in lowerdir are the shortened symbolic links from the l directory.
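
Option strings like the one in the mount output are easier to read once split into their parts. The parser below is a minimal sketch that assumes paths contain no commas or colons; the function name is made up, and the sample string is the one from our earlier manual mount.

```python
def parse_overlay_options(options):
    """Split an overlay mount option string into a dict.

    lowerdir is returned as a list ordered from top to bottom.
    Flags without a value (such as rw) are stored as True.
    Assumes paths contain no commas or colons.
    """
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    if "lowerdir" in parsed:
        parsed["lowerdir"] = parsed["lowerdir"].split(":")
    return parsed

opts = parse_overlay_options(
    "rw,relatime,lowerdir=lower1:lower2,upperdir=upper,workdir=work"
)
```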

overlay and overlay2

Because of limited kernel support, I have not run experiments on this myself. The following conclusions come from other sources:

Differences:
With overlay, lowerdir has only one layer. The read-only layers share files through hard links, so each read-only layer holds a complete set of files rather than only its own increment.
With overlay2, each read-only layer is an independent entity, and the layers are union-mounted onto merged when the container starts.
Similarities:
Both have a work directory used for copy-on-write and similar operations, and both mount onto the merged directory at startup.
Both support page cache sharing, so containers created from the same image can use the same cached file, which reduces memory consumption.
For details, see: the difference between overlay and overlay2 – ElNinoT – 博客园
If you want to run the experiment, first change Docker’s storage driver in the same way as described above.
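
Since the overlay driver is said to rely on hard links, here is a quick illustration of what a hard link is: two directory entries that point at the same inode, so the data is stored only once. The directory and file names are hypothetical, and this does not reproduce what Docker itself does.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    layer1 = os.path.join(root, "layer1")
    layer2 = os.path.join(root, "layer2")
    os.mkdir(layer1)
    os.mkdir(layer2)

    original = os.path.join(layer1, "libfoo.so")  # hypothetical file name
    with open(original, "w") as f:
        f.write("shared data")

    # A hard link is a second directory entry for the same inode,
    # so the data is stored on disk only once.
    linked = os.path.join(layer2, "libfoo.so")
    os.link(original, linked)

    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    link_count = os.stat(original).st_nlink
```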

ref: https://blog.csdn.net/qq_45858169/article/details/115918469

overlay2 takes up too much space

To free disk space, run the docker system prune command. It removes stopped containers, unused networks, and dangling images (images without tags). Unused volumes are removed only if you also pass the --volumes option.
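
Before pruning, it can help to see which layers actually take up the space. The script below is a minimal sketch that assumes the directory layout shown earlier (one directory per layer with a diff subdirectory, plus l); the function names are made up. The demo runs on a throwaway directory, because reading the real /var/lib/docker/overlay2 needs root. It only reports sizes; actual cleanup should go through docker commands rather than deleting files there by hand.

```python
import os
import stat
import tempfile

def diff_size_bytes(layer_dir):
    """Sum the apparent sizes of regular files under <layer_dir>/diff."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(os.path.join(layer_dir, "diff")):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total

def largest_layers(overlay_root, count=5):
    """Return (size, layer_name) pairs for the biggest layers, largest first."""
    sizes = []
    for entry in os.scandir(overlay_root):
        # Skip the "l" directory, which only holds the short symlinks.
        if entry.is_dir(follow_symlinks=False) and entry.name != "l":
            sizes.append((diff_size_bytes(entry.path), entry.name))
    return sorted(sizes, reverse=True)[:count]

# Demo with two fake layers of 100 and 10 bytes.
with tempfile.TemporaryDirectory() as root:
    for layer, size in (("layer-a", 100), ("layer-b", 10)):
        os.makedirs(os.path.join(root, layer, "diff"))
        with open(os.path.join(root, layer, "diff", "data.bin"), "wb") as f:
            f.write(b"x" * size)
    os.mkdir(os.path.join(root, "l"))
    report = largest_layers(root)
```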