There are so many Dockerfiles, which one is the real best practice?

The syntax of a Dockerfile is very simple. However, speeding up image builds and reducing image size are not so intuitive and require accumulated practical experience. This article helps you quickly master the skills of writing a good Dockerfile.

Goals

  • Faster build speed

  • Smaller Docker image size

  • Fewer Docker image layers

  • Take advantage of image caching

  • Improve Dockerfile readability

  • Make Docker containers easier to use

Summary

  • Write the .dockerignore file

  • Containers only run a single application

  • Combine multiple RUN instructions into one

  • Do not use latest as the tag of the base image

  • Delete redundant files after each RUN instruction

  • Choose an appropriate base image (alpine version is best)

  • Set WORKDIR and CMD

  • Use ENTRYPOINT (optional)

  • Using exec in entrypoint script

  • Prefer COPY over ADD

  • Properly adjust the order of COPY and RUN

  • Set default environment variables, map ports and data volumes

  • Use LABEL to set image metadata

  • Add HEALTHCHECK

  • Multi-stage build

Here is an example of a bad Dockerfile:

FROM ubuntu
ADD . /app
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y nodejs ssh mysql
RUN cd /app && npm install
# this should start three processes, mysql and ssh
# in the background and node app in foreground
# isn't it beautifully terrible? <3
CMD mysql & sshd & npm start
Build the image:

docker build -t wtf .

Can you find all the errors in the Dockerfile above? No? Then let’s improve it step by step.

Optimization

1. Write .dockerignore file

.git/
node_modules/
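
A .dockerignore file works like .gitignore: paths listed in it are excluded from the build context, which speeds up docker build and keeps local artifacts out of the image. A slightly larger sketch (the extra entries below are illustrative, not from the original article):

# .dockerignore
.git/
node_modules/
*.log
.env
Dockerfile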

2. Containers only run a single application

Technically speaking, you can run multiple processes in a Docker container. You can run the database, front-end, back-end, ssh, and supervisor in the same Docker container. However, this will cause you great pain:

  • Very long build times (after modifying the frontend, the entire backend also needs to be rebuilt)

  • Very large image size

  • The logs of multiple applications are difficult to process (stdout cannot be used directly, otherwise the logs of multiple applications will be mixed together)

  • It is very wasteful of resources when scaling horizontally (different applications need to run different numbers of containers)

  • Zombie process problem – you need to choose the appropriate init process

Therefore, it is recommended that you build a separate Docker image for each application, and then use Docker Compose to run multiple Docker containers.
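
As a rough sketch of this split (the service names, images and ports below are assumptions for illustration, not part of the original article), a docker-compose.yml might look like:

# docker-compose.yml -- a minimal, hypothetical sketch
version: "3"
services:
  app:
    build: .
    ports:
      - "3000:3000"
    depends_on:
      - db
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example

Each service gets its own image and its own logs, and each can be scaled independently, for example with docker-compose up --scale app=3.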

Now, I remove some unnecessary installation packages from the Dockerfile. Additionally, SSH can be replaced with docker exec. An example is as follows:

FROM ubuntu
ADD . /app
RUN apt-get update
RUN apt-get upgrade -y
# we should remove ssh and mysql, and use
# a separate container for the database
RUN apt-get install -y nodejs # ssh mysql
RUN cd /app && npm install
CMD npm start

3. Combine multiple RUN instructions into one

Docker images are layered, and the following knowledge points are very important:

  • Each instruction in the Dockerfile creates a new image layer.

  • Image layers will be cached and reused

  • When the Dockerfile instructions are modified, the copied files change, or the variables specified when building the image are different, the corresponding image layer cache will become invalid.

  • Once the cache of one layer becomes invalid, the caches of all subsequent layers become invalid as well.

  • Image layers are immutable. If we add a file to a certain layer and then delete it in the next layer, the file will still be included in the image (it is just that the file will not be visible in the Docker container).

Docker images are similar to onions. They all have many layers. In order to modify the inner layer, you need to delete the outer layers. If you remember this, the rest will be easy to understand.
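
A minimal sketch of the point about deleted files (the file name and size are made up for illustration):

FROM ubuntu:16.04
# layer 1: adds ~100MB to the image
RUN dd if=/dev/zero of=/bigfile bs=1M count=100
# layer 2: the file disappears from the container's filesystem,
# but the 100MB layer below is still stored in the image
RUN rm /bigfile

This is also why cleanup commands only help when they run in the same RUN instruction that created the files.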

Now, we combine all RUN instructions into one. We also delete apt-get upgrade, because it makes the image build non-deterministic (we should rely on updates of the base image instead):

FROM ubuntu
ADD . /app
RUN apt-get update \
    && apt-get install -y nodejs \
    && cd /app \
    && npm install
CMD npm start

Remember, we should only merge instructions that change with the same frequency. If you put the Node.js installation and the npm module installation together, you will need to reinstall Node.js every time you modify the source code, which is obviously inappropriate. Therefore, the correct way to write it is this:

FROM ubuntu
RUN apt-get update && apt-get install -y nodejs
ADD . /app
RUN cd /app && npm install
CMD npm start

4. Do not use latest as the tag of the base image

When an image does not specify a tag, the latest tag is used by default, so the FROM ubuntu instruction is equivalent to FROM ubuntu:latest. When the base image is updated, the latest tag will point to a different image and the build may break. Unless you really want to always track the newest version of the base image, it is best to specify a concrete tag.

The example Dockerfile should use 16.04 as the tag:

FROM ubuntu:16.04  # it's that easy!
RUN apt-get update && apt-get install -y nodejs
ADD . /app
RUN cd /app && npm install
CMD npm start

5. Delete redundant files after each RUN command

Suppose we update the apt-get sources and then download, unpack and install some packages; the package lists are all saved in the /var/lib/apt/lists/ directory. However, these files are not required when running the application, and keeping them only makes the Docker image larger, so we had better remove them.

In the example Dockerfile, we can delete the files in the /var/lib/apt/lists/ directory (they are generated by apt-get update):

FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y nodejs \
    # added lines
    && rm -rf /var/lib/apt/lists/*
ADD . /app
RUN cd /app && npm install
CMD npm start
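
To verify the effect, you can inspect how much each instruction contributed to the image size (a quick check, not from the original article; the image name is illustrative):

docker build -t myapp .
docker history myapp   # shows the size added by each layer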

6. Choose the appropriate base image (alpine version is best)

In the example, we selected ubuntu as the base image. But we only need to run a node program, so is it necessary to use such a general-purpose base image? The node image should be a better choice:

FROM node
ADD . /app
# we don't need to use apt-get to
# install node anymore
RUN cd /app && npm install
CMD npm start

A better choice is the alpine version of the node image. Alpine is a minimal Linux distribution, only about 4MB, which makes it very suitable as a base image:

FROM node:7-alpine
ADD . /app
RUN cd /app && npm install
CMD npm start

apk is Alpine's package management tool. It is a little different from apt-get, but very easy to get started with. In addition, it has some very useful features, such as the --no-cache and --virtual options, which can help us reduce the size of the image.
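
As a hedged sketch of those apk options (the package names are only an illustration): --no-cache avoids storing the apk index in the image, and --virtual groups build-time packages so they can be removed again in one step:

RUN apk add --no-cache --virtual .build-deps build-base python \
    && npm install \
    && apk del .build-deps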
7. Set WORKDIR and CMD
The WORKDIR instruction sets the default directory in which the RUN / CMD / ENTRYPOINT instructions are run.

The CMD instruction sets the default command to be executed when creating a container. In addition, you should write the command in exec (array) form, where each element of the array is one word of the command (refer to the official documentation):
FROM node:7-alpine
WORKDIR /app
ADD . /app
RUN npm install
CMD ["npm", "start"]

8. Use ENTRYPOINT (optional)

The ENTRYPOINT instruction is optional, because it adds complexity. ENTRYPOINT is a script that is executed by default and receives the specified command as its arguments. It is commonly used to build executable Docker images. entrypoint.sh is as follows:

#!/usr/bin/env sh
# $0 is a script name,
# $1, $2, $3 etc are passed arguments
# $1 is our command
CMD=$1

case "$CMD" in
  "dev" )
    npm install
    export NODE_ENV=development
    exec npm run dev
    ;;

  "start" )
    # we can modify files here, using ENV variables passed in
    # "docker create" command. It can't be done during build process.
    echo "db: $DATABASE_ADDRESS" >> /app/config.yml
    export NODE_ENV=production
    exec npm start
    ;;

  * )
    # Run custom command. Thanks to this line we can still use
    # "docker run our_image /bin/bash" and it will work
    exec $CMD ${@:2}
    ;;
esac

Example Dockerfile:

FROM node:7-alpine
WORKDIR /app
ADD . /app
RUN npm install
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]

You can run the image using the following command:

# Run the development version
docker run our-app dev

# Run the production version
docker run our-app start

# Run bash
docker run -it our-app /bin/bash

9. Use exec in entrypoint script

In the previous entrypoint script, I used the exec command to run the node application. Without exec, we cannot shut down the container gracefully, because the SIGTERM signal is swallowed by the shell process running the script. A process started with exec replaces the script process, so all signals work normally.
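
A minimal illustration of the difference (the script content is hypothetical):

#!/bin/sh
# without exec: the shell stays as PID 1 and SIGTERM may never reach node
# npm start
# with exec: npm replaces the shell process and receives signals directly
exec npm start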

Here is an extended introduction to the stopping process of the docker container:

(1). For containers, an init system is not strictly necessary. When you stop a container with the command docker stop mycontainer, the docker CLI sends the TERM signal to the process with PID 1 inside mycontainer.

  • If PID 1 is the init process – then PID 1 will forward the TERM signal to the child process, then the child process will start to shut down, and finally the container will terminate.

  • If there is no init process – then the application process in the container (the application specified by ENTRYPOINT in the Dockerfile or CMD) is PID 1, and the application process is directly responsible for responding to the TERM signal.

    This is divided into two situations:

    • Application does not handle SIGTERM – If the application does not listen for the SIGTERM signal, or the application does not implement logic to handle the SIGTERM signal, the application will not stop and the container will not terminate.

    • The container takes a long time to stop – after running docker stop mycontainer, Docker waits 10 seconds. If the container has not terminated within those 10 seconds, Docker bypasses the container application and lets the kernel send SIGKILL directly; the kernel forcibly kills the application and the container terminates.

(2). If the process in the container does not receive the SIGTERM signal, it is most likely because the application process is not PID 1, PID 1 is a shell, and the application process is just a child process of the shell. The shell does not have the function of the init system, so it will not forward operating system signals to the child process. This is also a common reason why applications in the container do not receive the SIGTERM signal.
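
Incidentally, the 10-second grace period mentioned above can be adjusted per stop or per container (a usage note, not from the original article):

docker stop -t 30 mycontainer          # wait 30s before sending SIGKILL
docker run --stop-timeout 30 myimage   # set the default stop timeout for this container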

The root of the problem usually lies in the Dockerfile, for example:

FROM alpine:3.7
COPY popcorn.sh .
RUN chmod +x popcorn.sh
ENTRYPOINT ./popcorn.sh
CMD ["start"]

The ENTRYPOINT instruction uses shell mode, so Docker runs the application inside a shell, and the shell becomes PID 1.

The solutions are as follows:

Option 1: Use the ENTRYPOINT instruction in exec mode

Instead of using shell mode, use exec mode, for example:

FROM alpine:3.7
COPY popcorn.sh .
RUN chmod +x popcorn.sh
ENTRYPOINT ["./popcorn.sh"]

In this way, PID 1 is ./popcorn.sh, which is responsible for responding to all signals sent to the container. Whether ./popcorn.sh can actually catch system signals is another matter.

For example, assuming the above Dockerfile is used to build the image, the popcorn.sh script prints the date every second:

#!/bin/sh
while true
do
  date
  sleep 1
done

Build the image and create the container:

docker build -t truek8s/popcorn .
docker run -it --name corny --rm truek8s/popcorn

Open another terminal and execute the command to stop the container, and time:

time docker stop corny

Because popcorn.sh does not implement logic for catching and handling the SIGTERM signal, it takes about 10 seconds to stop the container. To solve this problem, add signal-handling code to the script so that it terminates the process when it catches SIGTERM:

#!/bin/sh
# catch the TERM signal and then exit
trap "exit" TERM
while true
do
  date
  sleep 1
done

Note: The following instruction is equivalent to the shell mode ENTRYPOINT instruction:

ENTRYPOINT ["/bin/sh", "./popcorn.sh"]
Option 2: Use the exec command directly

If you really want to use the ENTRYPOINT instruction in shell mode, that is also possible: just put exec in front of the startup command, for example:

FROM alpine:3.7
COPY popcorn.sh .
RUN chmod +x popcorn.sh
ENTRYPOINT exec ./popcorn.sh
Option 3: Use an init system
If the application in the container cannot handle the SIGTERM signal by default and you cannot modify its code, then options 1 and 2 will not work; you can only add an init system to the container. There are many kinds of init systems. Tini is recommended here: it is a lightweight init system dedicated to containers. Using it is also very simple:
  1. Install tini

  2. Set tini as the container's default application

  3. Use popcorn.sh as a parameter to tini
The specific Dockerfile is as follows:
FROM alpine:3.7
COPY popcorn.sh .
RUN chmod +x popcorn.sh
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "--", "./popcorn.sh"]

Now tini is PID 1, and it forwards any received system signals to the child process popcorn.sh.

10. Prefer COPY over ADD

The COPY instruction is very simple and is only used to copy files into the image. ADD is relatively complex and can also be used to download remote files and unpack compressed archives (refer to the official documentation). If you only need to copy files, prefer COPY:

FROM node:7-alpine
WORKDIR /app
COPY . /app
RUN npm install
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
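
For reference, a sketch of what ADD can do beyond COPY (the archive name and URL are hypothetical; remote ADD is generally discouraged in favor of RUN curl/wget):

# unpacks a local tar archive into the image
ADD vendor.tar.gz /app/vendor/
# downloads a remote file (not unpacked)
ADD https://example.com/config.txt /app/config.txt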

11. Reasonably adjust the order of COPY and RUN

We should put the parts that change least often at the front of the Dockerfile so that we can make full use of the image cache.

When building an image, Docker executes the instructions in the Dockerfile in order. For each instruction, Docker checks whether an existing image in the cache can be reused instead of creating a new one.

If you do not want to use the build cache, you can pass the option --no-cache=true to docker build. When using the image cache, you need to understand when the cache takes effect and when it is invalidated. The basic rules for the build cache are as follows:

  • If the referenced parent image is in the build cache, the next instruction is compared with all child images derived from that parent image. If a child image was built with the same instruction, the cache is hit; otherwise the cache is invalid.

  • In most cases, comparing the instruction in the Dockerfile with those of the child images is sufficient, but some instructions require further inspection.

  • For the ADD and COPY instructions, the contents of the files are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the files are not considered in the checksum. During the cache lookup, Docker compares the checksum against those of existing images; if anything in the files has changed, such as their contents or metadata, the cache is invalidated.

  • Except for the ADD and COPY instructions, cache checking does not look at the files in the container to determine a cache hit. For example, when processing a RUN apt-get -y update instruction, the files updated in the container are not examined; only the command string itself is checked.
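
For example, the following sequence illustrates the rules above (the image name is illustrative):

docker build -t myapp .              # first build, every layer is created
docker build -t myapp .              # rebuild, unchanged instructions hit the cache
docker build --no-cache -t myapp .   # force a full rebuild, ignoring the cache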

In the example, the source code changes frequently, so the npm modules would need to be reinstalled every time the image is built, which is obviously not what we want. So we can copy package.json first, then install the npm modules, and finally copy the rest of the source code. In this case, even if the source code changes, there is no need to reinstall the npm modules:

FROM node:7-alpine
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]

Similarly, for Python projects, we can copy requirements.txt first, then pip install -r requirements.txt, and finally COPY the code.

FROM python:3.6
# Create app directory
WORKDIR /app
# Install app dependencies
COPY src/requirements.txt ./
RUN pip install -r requirements.txt
# Package app source code
COPY src /app
EXPOSE 8080
CMD ["python", "server.py"]

12. Set default environment variables, map ports and data volumes

You will most likely need some environment variables when running a Docker container. Setting default environment variables in the Dockerfile is a good way to do this. In addition, we should declare the mapped ports and data volumes in the Dockerfile. An example is as follows:

FROM node:7-alpine
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV MEDIA_DIR=/media \
    NODE_ENV=production \
    APP_PORT=3000
VOLUME $MEDIA_DIR
EXPOSE $APP_PORT
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]

The environment variables specified by the ENV directive (https://docs.docker.com/engine/reference/builder/#env) can be used in the container. If you only need variables while building the image, you can use the ARG directive (https://docs.docker.com/engine/reference/builder/#arg).
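
A minimal sketch of the ARG/ENV difference (the variable names are illustrative):

FROM node:7-alpine
# available only while building; pass with: docker build --build-arg APP_VERSION=1.2.3 .
ARG APP_VERSION=dev
# available at build time and inside running containers
ENV NODE_ENV=production
RUN echo "building version $APP_VERSION"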

13. Use LABEL to set image metadata

Using the LABEL instruction, you can set metadata for the image, such as the image creator or a description. The old Dockerfile syntax used the MAINTAINER instruction to specify the image creator, but it has been deprecated. Sometimes external programs need the image metadata; for example, nvidia-docker needs the com.nvidia.volumes.needed label. An example is as follows:

FROM node:7-alpine
LABEL maintainer "[email protected]"
...
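
Labels can be read back from an image with docker inspect (the image name here is illustrative):

docker inspect --format '{{ index .Config.Labels "maintainer" }}' our-app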

14. Add HEALTHCHECK

When running a container, you can specify the --restart always option; in this case, when the container crashes, the Docker daemon restarts it. This option is useful for long-running containers. However, what if the container is running but unavailable (stuck in an infinite loop, misconfigured)? The HEALTHCHECK instruction lets Docker periodically check the health of the container. We just need to specify a command that returns 0 if everything is OK and 1 otherwise. If you are interested in HEALTHCHECK, you can refer to this blog. An example is as follows:

FROM node:7-alpine
LABEL maintainer "[email protected]"
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV MEDIA_DIR=/media \
    NODE_ENV=production \
    APP_PORT=3000
VOLUME $MEDIA_DIR
EXPOSE $APP_PORT
HEALTHCHECK CMD curl --fail http://localhost:$APP_PORT || exit 1
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]

When a request fails, the curl --fail command returns a non-zero status.
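
HEALTHCHECK also accepts timing options; a hedged variant of the instruction above (the values are arbitrary examples):

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl --fail http://localhost:$APP_PORT || exit 1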

15. Multi-stage builds

Reference: https://docs.docker.com/develop/develop-images/multistage-build/

Before Docker supported multi-stage builds, we usually used one of the following two methods when building Docker images:

Method A. Write the whole build process in the same Dockerfile, including compiling, testing and packaging the project and its dependent libraries. This can lead to the following problems:

  • The Dockerfile becomes particularly bloated

  • The image has particularly many layers

  • There is a risk of source code leakage

Method B. Compile and test the project and its dependent libraries outside Docker in advance, then copy the artifacts into the build directory and build the image.

Method B is slightly more elegant than method A and avoids its risks, but it still requires us to write two or more Dockerfiles, or some scripts to automatically glue the two stages together. For example, if several projects are related and depend on each other, we have to maintain multiple Dockerfiles or write more complex scripts, resulting in high maintenance costs later on.

To solve the above problems, Docker v17.05 began to support multi-stage builds. With multi-stage builds we can easily solve the problems mentioned above, and we only need to write one Dockerfile.

You can use multiple FROM statements in a Dockerfile. Each FROM instruction can use a different base image and indicates the start of a new build phase. You can easily copy files from one stage to another and keep the content you need in the final image.

By default, build stages are not named; we can refer to them by their index, with the first FROM instruction starting at 0. We can also name a build stage with AS.

Case 1

FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]

After building with docker build, the end result is an image of the same size as before, but with significantly less complexity. You do not need to create any intermediate images, nor do you need to extract any compilation results to the local system.

How does it work? The key lies in the COPY --from=0 instruction. The second FROM instruction in the Dockerfile starts a new build stage with alpine:latest as the base image, and COPY --from=0 copies only the build artifact of the previous stage into this stage. The Go SDK and any intermediate layers produced in the previous build stage are discarded rather than saved in the final image.

A Python application can also be built using multiple stages; see the second example in Case 5 below.

Case 2

By default, build stages are unnamed, and you can refer to them by an integer index, starting at 0 for the first FROM instruction. For easier management, you can also name build stages by adding AS NAME to the FROM instruction. The following example references a specific build stage by naming it and using the name in the COPY instruction.

The advantage of this is that even if you later reorder the instructions in the Dockerfile, the COPY instruction can still find the corresponding build stage.

FROM golang:1.7.3 as builder
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]

Case 3

Stop at a specific build phase

When building an image, you do not necessarily need to build every stage in the Dockerfile; you can specify a target stage to build. For example, to build only the stage named builder:

$ docker build --target builder -t alexellis2/href-counter:latest .

This function is suitable for the following scenarios:

  • Debug specific build phases.

  • During the Debug stage, enable all debugging modes or debugging tools, while keeping the Production stage as streamlined as possible.

  • During the Testing phase, your application uses test data, but during the Production phase, it uses production data.

Case 4

Use external image as build phase

When using a multi-stage build, you are not limited to copying from stages created earlier in the same Dockerfile. You can also copy from a separate image with the COPY --from instruction, using a local image name, a tag available locally or on a Docker registry, or an image ID.

COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
Case 5

Treat the previous stage as a new stage

When using the FROM instruction, you can continue from where a previous stage left off by referencing it. This also makes it convenient for different roles in a team to provide base images level by level in a pipeline-like way, and to quickly reuse base images built by other team members. For example:

FROM alpine:latest as builder
RUN apk --no-cache add build-base

FROM builder as build1
COPY source1.cpp source.cpp
RUN g++ -o /binary source.cpp

FROM builder as build2
COPY source2.cpp source.cpp
RUN g++ -o /binary source.cpp

The following example builds a Python application using multiple stages:

# ---- Basic python image ----
FROM python:3.6 AS base
# Create app directory
WORKDIR /app

# ---- Dependencies ----
FROM base AS dependencies
COPY gunicorn_app/requirements.txt ./
# Install app dependencies
RUN pip install -r requirements.txt

# ---- Copy the files and build ----
FROM dependencies AS build
WORKDIR /app
COPY . /app
# Build or compile when needed

# ---- Release using Alpine ----
FROM python:3.6-alpine3.7 AS release
# Create app directory
WORKDIR /app
COPY --from=dependencies /app/requirements.txt ./
COPY --from=dependencies /root/.cache /root/.cache
# Install app dependencies
RUN pip install -r requirements.txt
COPY --from=build /app/ ./
CMD ["gunicorn", "--config", "./gunicorn_app/conf/gunicorn_config.py", "gunicorn_app:app"]

Source: This article is reproduced from the public account "Operation and Development Story".