This series of tutorials introduces the common toolchain for building and compiling modern C/C++ projects: GCC, Make, and CMake. GCC is a compiler for C/C++, Make is an incremental build (batch processing) tool, and CMake is a generator of Make scripts. In the build of a modern C/C++ project, they relate to each other as follows.
                 cmake              make          gcc
CMakeLists.txt -------> Makefile -------> Cmds -------> Binary
Developers write a CMakeLists.txt file to configure the project's CMake parameters. Running the cmake command then automatically generates the build script, a Makefile, for the Make tool of the corresponding platform. CMake also supports generating configuration files for other build tools, such as Xcode's xxxx.xcodeproj, Visual Studio's xxxx.sln, Ninja's xxxx.ninja, and so on. At present, most open source C/C++ projects support using CMake to generate a Makefile and then calling the make command to build automatically with the Make tool.
A Makefile can be regarded as a series of shell commands attached to file dependencies. It implements incremental processing based on file modification timestamps. The rule is roughly as follows: if the timestamp of the generated target file is earlier than the timestamp of a file it depends on, execute the corresponding command and regenerate the target file. This implies that the Make tool is useful not only for compilation but also for other incremental file generation tasks. When Make is used to compile a C/C++ project, the shell commands generally call gcc, so that compiling and linking the C/C++ source code happens automatically and incrementally.
Introduction to GCC
In the early days, GCC was short for GNU C Compiler, the C language compiler of the GNU project. After years of expansion and iteration, GCC gradually came to support more and more languages, such as C, C++, Objective-C, Fortran, Java, Ada, and Go. The name GCC was therefore redefined as GNU Compiler Collection. In this article, we only introduce the use of GCC to compile C/C++ projects.
It is worth noting that Apple used to use GCC as its official compiler. However, because the GCC development community always gave low priority to Apple's needs, many of Apple's important requirements were basically not considered. As a result, the wealthy Apple abandoned GCC in a rage and developed a new compiler, Clang, based on LLVM, to support C, C++, Objective-C and other languages. The default gcc command on macOS therefore actually calls clang. If you want to use GCC on macOS, you need to install it yourself, for example with Homebrew, a commonly used package manager on macOS (brew install gcc). Fortunately, clang is basically a replica of gcc in terms of usage (invocation, parameters, etc.). So although this article talks about GCC, the actual examples given all use Clang. For the content covered here, the difference between the two is small and should not cause many problems.
Compilation process
When using gcc to compile a C/C++ program, the compilation process consists of four main steps: preprocessing, compilation, assembly, and linking. Taking the C source file b.c as input, directly calling gcc b.c executes the whole process below and generates the corresponding a.out. Note that the default output name of gcc is fixed as a.out. In the GCC toolchain, assembly is done by the tool as, and linking is done by the tool ld.
       -E            -S            -c
b.c ------> b.i ------> b.s ------> b.o ------> a.out
       gcc           gcc           as            ld
Passing one of the following options to gcc makes the compilation process stop at the corresponding point:
- -E (prEprocessing): stop after the preprocessing step, that is, after processing the # directives in the C/C++ source code, including macro expansion and #include header file inclusion. This option does not write a file by default (the result goes to standard output); you can use the -o option to write it to a file, conventionally with the suffix *.i.
- -S (aSsembly): stop after the compilation step. An assembly file is generated, but binary machine code is not. The default output file suffix is *.s.
- -c (compilation): stop after the assembly step, in which the tool as generates binary machine code from the assembly code, without linking. The default output file suffix is *.o (object).
- Calling gcc without any of the above options executes the whole process, through the linking step. Linking is actually carried out by the linker ld, which combines the binary files generated from the source code, the library files, and the program's startup code into a complete binary executable.
In particular, the option -o (output) specifies the name of the output file. For example, gcc b.c -o b.bin generates the executable file b.bin instead of the default a.out.
The above options can be applied starting from any stage of the compilation process, for example:
> gcc -E b.c -o b.i
> ls
b.c b.i
> gcc -S b.i
b.c b.i b.s
> gcc -c b.s
b.c b.i b.o b.s
> gcc b.o
b.c b.i b.o a.out b.s
Package Management
We will introduce the main compilation parameters of GCC later. Before that, the core point of this article, and the one the author most hopes to convey to readers, is “how to use GCC compilation parameters from the perspective of package management”. This section therefore first introduces how packages are managed in C/C++ projects.
A real C/C++ project generally has more than one source file, and in most cases uses third-party libraries. C/C++ has no official package manager comparable to Python's pip, Java's maven, or Node.js's npm, so third-party libraries for a C/C++ project are generally installed with the package manager that comes with the system, such as apt-get under Ubuntu or brew (Homebrew) under macOS. For third-party libraries not included in the system package manager, we generally compile and install them ourselves, or compile them together with the project as sub-projects.
A third-party library mainly consists of two parts: a) header files and b) library files. Header files are generally a series of files named xxx.h (header), which expose the API (function signatures) provided by the third-party library. Library files generally include static library files and dynamic library files, which hold the binary implementation of the library's functions. A static library file is named libxxx.a (archive) (libxxx.lib under Windows, library). A dynamic library file is named libxxx.so (shared object) (libxxx.dll under Windows, dynamic link library; libxxx.dylib under macOS, dynamic library). The header files of the system's own libraries and of third-party libraries installed by the system package manager are generally under /usr/include or /usr/local/include. The library files are generally in the /lib, /usr/lib and /usr/local/lib directories.
Because of the above, the GCC toolchain is not responsible for managing third-party libraries, so it cannot determine which libraries a C/C++ project needs, nor precise information about them such as location and version. Using GCC alone therefore cannot completely and automatically solve the third-party dependency problem of a C/C++ project. In languages such as Python and Java, an import xxx statement is enough to import the corresponding package, and the language's package manager resolves the third-party dependencies automatically. In C/C++, after writing an #include "xxx" statement, we still need to manually add compilation parameters such as -I, -l and -L to pass the relevant information about the third-party library to the gcc compiler. Among them, -I passes the “directory where the header files are located”, -l passes the “name of the library” to link, and -L passes the “directory where the library files are located”. These three parameters are particularly important, and I hope readers will keep them in mind.
Compile parameters
-I parameter
Looking back at the GCC compilation process introduced earlier, the #include directive is processed in the preprocessing stage, which substitutes the contents of the included header files into the source code. Generally speaking, during preprocessing gcc automatically searches for the corresponding header files in the current project directory and the /usr/include directory.
However, for header files of third-party libraries located in other directories, gcc cannot find the required header files automatically and reports an error like xxx.h: file not found. We need to use the -I parameter to specify the location of the third-party library's header files. For example, under macOS, installing llvm with the Homebrew package manager also installs the third-party libraries contained in the LLVM project, and the corresponding header files are located in the /usr/local/opt/llvm/include directory.
When we use a library provided by LLVM, we can use -I/usr/local/opt/llvm/include (or -I /usr/local/opt/llvm/include, with a space) to specify the location of the header files. gcc then additionally searches for header files in the directory given by the -I parameter. The -I parameter can be repeated to specify several additional header file directories. It generally takes an absolute path, but a relative path also works. For example, if the header file is in the current directory, you can use -I . to specify it.
It should be noted that in C/C++ source code, the xxxx.h in an #include "xxxx.h" statement can carry a path. We can even refer to header files by absolute path. For example, given the header file /usr/local/opt/llvm/include/llvm/Pass.h, we can refer to it directly with #include "/usr/local/opt/llvm/include/llvm/Pass.h".
However, this practice is not recommended in C/C++ projects. The recommended method is to refer to the header file with a relative path combined with the parameter -I include_dir. In the example above, we would write #include "llvm/Pass.h" in the source code and pass the directory containing the llvm headers to gcc with the parameter -I /usr/local/opt/llvm/include. This makes it possible to manage third-party library versions flexibly, and it is convenient for several people collaborating on different machines, which is much better than including header files by absolute path.
All in all, when gcc performs preprocessing, it joins each header file search directory (such as the directories passed in with the -I parameter, and the default /usr/include, /usr/local/include and other directories) with the xxxx.h of the #include "xxxx.h" statement in the source code. If one of the resulting paths points to an actual header file, that header file is included.
-l parameter
During the link phase of the GCC compilation process, the standard library, such as libc.a, is linked by default, but third-party libraries must be added manually. If the following error is reported during compilation:
Undefined symbols for architecture x86_64: xxx...xxx
ld: symbol(s) not found for architecture x86_64
it is usually because the third-party library that needs to be linked was not specified correctly.
When using gcc, you generally use the -l parameter to specify the library to link. For example, suppose we use the math library (namely #include <math.h>); compiling then reports the Undefined error above. In this case, we can use the -lm (or -l m) parameter to specify that the math library needs to be linked.
Note that some gcc compilers automatically link the math library as part of the standard library. With such a compiler, the Undefined error above appears only if we add the -nostdlib parameter, which stops gcc from linking the standard library automatically.
At first glance the -lm parameter may look a little odd. So how is the -l parameter used? It must be followed by the library name (such as m), not the library file name (such as libm.so). There is, however, a very direct connection between the two. Taking the math library as an example, the library file name is libm.so and the library name is m: the library name is obtained by removing the prefix lib and the suffix .so from the library file name. As another example, for the library file libLLVMCore.a included in LLVM, the corresponding library name is LLVMCore, and the parameter to link it is -lLLVMCore.
-L parameter
Library files located in /lib, /usr/lib, /usr/local/lib and similar directories, such as libm.so, can be linked directly once the -l parameter is given. But if the library file is not in one of these directories, using only the -l parameter still produces an error at link time: ld: library not found for -lxxx. This means that the linker ld cannot find libxxx.so or libxxx.a in its current library search path.
In this case, we need the -L parameter to tell gcc where the library files to be linked are located. The -L parameter is followed by the directory containing the library files. For example, under macOS, when llvm is installed with the Homebrew package manager, its library files are located in the /usr/local/opt/llvm/lib directory. If we need to use the library LLVMCore, that is, to link the library file libLLVMCore.a, then in addition to adding the -lLLVMCore parameter we also need the parameter -L/usr/local/opt/llvm/lib to tell gcc the directory where the library file is located.
Other compilation parameters
In addition to the above parameters, gcc has some other important parameters, which are briefly introduced here.
A. Static link parameters
When talking about library files, we mentioned static library files (libxxx.a) and dynamic library files (libxxx.so) without explaining the difference between the two. It can be understood simply as follows. When gcc links a static library file, it copies the parts that are used into the generated binary program, which makes the generated file relatively large; when it links a dynamic library file, nothing is copied, so the generated binary is relatively small. The disadvantage of linking dynamic library files is that, when the program is run on another machine, the corresponding dynamic library files must be correctly installed there. Programs generated by linking static library files have no such requirement.
When linking, gcc prefers dynamic library files by default. Static library files are used only if no dynamic library file exists. To link statically, add the -static parameter when compiling to force the use of static library files. For example, the /usr/local/opt/llvm/lib directory contains both libunwind.so and libunwind.a. To make gcc use the static library file libunwind.a when linking, we can add the -static parameter and use the following command: gcc hello.o -static -L/usr/local/opt/llvm/lib -lunwind.
B. Optimization parameters
Compilation optimization is also an important function of the compiler. Appropriate optimization can greatly speed up program execution. gcc provides four optimization levels, namely -O0, -O1, -O2, -O3. In general, the higher the number, the more optimization strategies are included. In addition, gcc provides a special -Os parameter.
- The -O0 parameter means that no optimization strategy is used; it is the default optimization level of gcc. Because nothing is optimized, the compiled machine code corresponds closely to the program source code, and a nearly one-to-one relationship can be established between the two. -O0 is therefore well suited to debugging, and is usually combined with the parameter -g (generate debug information). The -g parameter adds information for debugging to the generated binary file during compilation, such as the symbol table and the program source code.
- -O1 tries to apply optimization strategies that do not affect compilation speed, in order to reduce the size of the generated binary file and improve the speed of program execution.
- -O2 uses all optimizations in -O1, plus some optimizations that slow down compilation, to improve the execution speed of the program.
- -O3 applies more optimization strategies on top of -O2. These additional optimizations further slow down compilation and increase the size of the resulting binary, but further increase program execution speed.
- -Os optimizes in the opposite direction to -O3. On top of -O2, it applies additional optimization strategies to make the generated binary file as small as possible.
If you are interested in the optimization strategy enabled under each optimization parameter, or want to know other optimization parameters, you can refer to [1].
C. Macro related parameters
Sometimes, to keep a C/C++ project portable across platforms, or to be able to choose flexibly among several similar libraries at compile time, the source code needs to use conditional compilation. Conditional compilation means using directives such as #ifdef M, #else, #endif (or #ifndef M, #else, #endif, or #if, #elif, #else, #endif) to control, through macro definitions, which code gets compiled.
In C/C++, the #define M statement defines the macro M in the source code. But conditional compilation generally needs macro definitions to be passed in from outside, for example from the compiler. Therefore, gcc provides the macro definition parameter -D and the macro undefinition parameter -U. When compiling with gcc, macros can be manipulated in the following ways:
- -Dmacro defines the macro macro with the default value 1, which is equivalent to writing #define macro 1 in the program source code.
- -Dmacro=def defines the macro macro as def, which is equivalent to writing #define macro def in the program source code.
- -Umacro cancels the definition of the macro macro, which is equivalent to writing #undef macro in the program source code.
- -undef tells gcc not to predefine any system-specific or GCC-specific macros; the standard predefined macros remain defined.
D. Other
In addition, there are some other parameters that are also important, such as:
- The -std parameter specifies the C/C++ standard used for compilation. For example, -std=c++11 means to use the C++11 standard, and -std=c99 means to use the C99 standard. In particular, -ansi means to use the ANSI C standard, which is generally equivalent to -std=c90.
- The -Werror parameter asks gcc to treat every warning it generates as an error.
- -Wall asks gcc to display most of the commonly useful warning messages.
- -w asks gcc not to display warning messages.
- The -Wl parameter tells gcc to pass the comma-separated arguments that follow it to the linker ld.
- The -v parameter displays additional output during gcc compilation.
If you want to know other parameters of gcc, you can view them through gcc --help or man gcc, or you can directly refer to the GCC manual [1].
Automatic generation of compile parameters (pkg-config)
Generally speaking, writing the compile and link parameters of a third-party library by hand is troublesome. We need to find where the library's header files and library files are installed, know which other libraries it needs to link against, know which compilation parameters it requires, and so on. None of this helps to integrate third-party libraries quickly. Many modern third-party libraries therefore provide their own tool for generating compilation parameters, generally named xxx-config. For example, llvm provides the llvm-config tool. After installing llvm with the system package manager, or compiling and installing it yourself, you can call the llvm-config command directly. Let's take llvm 10.0 as an example.
- Executing llvm-config --cxxflags gives -I/usr/local/Cellar/llvm/11.0.0/include -std=c++14 -stdlib=libc++ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS. These are the compilation parameters required to compile against the libraries provided by llvm 10.0. They show that the header file directory of llvm 10.0 is /usr/local/Cellar/llvm/11.0.0/include, that the C++14 standard and the libc++ standard library are required, and that some macros needed at compile time are defined.
- Executing llvm-config --ldflags gives -L/usr/local/Cellar/llvm/11.0.0/lib -Wl,-search_paths_first -Wl,-headerpad_max_install_names. These are the link parameters required to link the third-party libraries provided by llvm 10.0. They tell the compiler that the libraries are located in /usr/local/Cellar/llvm/11.0.0/lib, and pass some other parameters to the linker ld.
- Executing llvm-config --libs gives -lLLVMXRay -lLLVMWindowsManifest ... -lLLVMDemangle. These are all the libraries that llvm 10.0 can link against. Generally we do not link all of them. Instead, a command such as llvm-config --libs core gives -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle, the libraries that need to be linked to use the core module.
- Executing llvm-config --system-libs gives -lm -lz -lcurses -lxml2. These are the system libraries required by llvm 10.0.
Generally speaking, we combine the parameters of the above commands, for example calling llvm-config --cxxflags --ldflags --system-libs --libs core to get all the compile parameters we need.
In addition to the xxx-config tools that come with third-party libraries, many modern third-party libraries can use the tool pkg-config to generate compilation parameters. We can use the pkg-config --list-all command to view all the third-party libraries it supports. The general usage of pkg-config is a command of the form pkg-config pkg-name --libs --cflags. For example, if we want to use the gmp library, we can execute pkg-config gmp --libs --cflags and get the following output: -I/usr/local/Cellar/gmp/6.2.1/include -L/usr/local/Cellar/gmp/6.2.1/lib -lgmp.
We can copy this output and paste it into the gcc command, or we can embed the shell command directly, as in gcc a.c `pkg-config gmp --libs --cflags`, so that the compilation parameters of the third-party library are passed to gcc automatically.
References
[1] Using the GNU Compiler Collection (GCC), 3.11 Options That Control Optimization, https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[2] GCC 10.1 Manuals, https://gcc.gnu.org/onlinedocs/10.1.0/