GCC/Make/CMake: GCC

This series of tutorials introduces the toolchain commonly used to build modern C/C++ projects: GCC, Make, and CMake. GCC is a compiler for the C/C++ languages, Make is an incremental batch-processing (build) tool, and CMake is a tool that generates Make scripts. In a modern C/C++ build, they relate to each other as follows:

               cmake           make       gcc
CMakeLists.txt -----> Makefile ----> Cmds ---> Binary

Developers write a CMakeLists.txt file that configures the project's CMake settings. Running the cmake command then generates the build script for the platform's Make tool, the Makefile. CMake can also generate project files for other build tools, such as Xcode's xxxx.xcodeproj, Visual Studio's xxxx.sln, and Ninja's build.ninja. Today, most open-source C/C++ projects support using CMake to generate a Makefile and then running make to build the project automatically. A Makefile can be viewed as a set of shell commands, each attached to the files it depends on, and Make implements incremental processing based on file modification timestamps. Roughly, the rule is: if a target file does not exist yet, or is older than a file it depends on, run the corresponding commands to regenerate the target. This implies that Make is not limited to compilation; it can drive any incremental file-generation task. When Make builds a C/C++ project, its shell commands generally call gcc, so that compiling and linking the C/C++ sources happen automatically and incrementally.
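The timestamp rule can be sketched in C. This is a minimal illustration, assuming POSIX stat() and a single target with a single dependency; needs_rebuild is a hypothetical name, not part of any Make implementation, and real Make tools may compare finer-grained timestamps and check every prerequisite of a target.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX stat() */
#include <sys/stat.h>

/* Hypothetical helper modelling Make's rebuild decision for one target and
 * one dependency. Returns 1 if the target must be (re)built, 0 if it is up
 * to date, and -1 if the dependency itself does not exist. */
int needs_rebuild(const char *target, const char *dep) {
    struct stat t, d;
    if (stat(dep, &d) != 0)
        return -1;                  /* nothing to build from */
    if (stat(target, &t) != 0)
        return 1;                   /* target missing: build it */
    return t.st_mtime < d.st_mtime; /* target older than dependency: rebuild */
}
```

A target with several dependencies would be rebuilt if this check returned 1 for any one of them; equal timestamps count as up to date.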

Introduction to GCC

GCC originally stood for GNU C Compiler, the C compiler of the GNU project. Over years of expansion it gained support for more and more languages, such as C, C++, Objective-C, Fortran, Java (since removed), Ada, and Go, so the name was redefined as GNU Compiler Collection. This article covers only using GCC to compile C/C++ projects.

It is worth noting that Apple long used GCC as its official compiler. However, the GCC development community consistently gave Apple's needs low priority, and many of Apple's important requirements were never addressed. As a result, Apple, with ample resources of its own, decided to abandon GCC and developed Clang, an LLVM-based compiler for C, C++, Objective-C, and other languages. Consequently, the default gcc command on macOS today actually invokes clang. To use real GCC on macOS, you need to install it yourself, for example with Homebrew, the commonly used macOS package manager (brew install gcc). Fortunately, clang's usage (invocation, options, and so on) closely mirrors gcc's. So although this article talks about GCC, all of the examples were actually run with Clang. For the topics covered here, the difference between the two is small and should not cause problems.

Compilation process

When gcc compiles a C/C++ program, the process has four main steps: preprocessing, compilation, assembly, and linking. Taking the C source file b.c as input, running gcc b.c executes the whole pipeline below and produces the executable a.out. Note that gcc's default output name is always a.out. In the GCC toolchain, assembly is performed by the tool as and linking by the tool ld.

      -E          -S          -c
b.c ------> b.i ------> b.s ------> b.o ------> a.out
      gcc         gcc         as          ld

The following options make gcc stop after the corresponding stage:

-E (prEprocessing): stop after preprocessing, i.e., after processing the # directives in the C/C++ source, such as macro expansion and #include header inclusion. By default the result is written to standard output; use -o to save it to a file, conventionally with the suffix .i.
-S (aSsembly): stop after the compilation step, producing an assembly file but no binary machine code. The default output suffix is .s.
-c (compilation): stop after the assembly step, in which the tool as turns the assembly code into binary machine code; nothing is linked yet. The default output suffix is .o (object).
Without any of these options, gcc runs the entire process through the linking step. Linking is performed by the linker ld, which combines the object code generated from the source, the library files, and the program's startup code into a complete binary executable.
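As a small illustration of the first stage, here is what a hypothetical b.c might contain (the names SQUARE and square_area are made up for this example). Preprocessing is purely textual: gcc -E removes the #define line and expands SQUARE(side) in place, and only that expanded text reaches the compilation step (-S).

```c
/* Hypothetical contents of b.c. */
#define SQUARE(x) ((x) * (x))

/* In b.i (the output of gcc -E) the #define line is gone and this body
 * reads: return ((side) * (side)); */
int square_area(int side) {
    return SQUARE(side);
}
```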

In particular, the -o (output) option specifies the name of the output file. For example, gcc b.c -o b.bin generates the executable b.bin instead of the default a.out.

These options can be applied starting from any intermediate file in the pipeline, for example:

> gcc -E b.c -o b.i
> ls
b.c b.i
> gcc -S b.i
> ls
b.c b.i b.s
> gcc -c b.s
> ls
b.c b.i b.o b.s
> gcc b.o
> ls
a.out b.c b.i b.o b.s

Package Management

The main GCC compile options are covered later. Before that, the core topic I want to introduce, and the one I most hope readers take away, is how to use GCC compile options from the perspective of package management. This section therefore first describes how C/C++ projects manage packages.

A real C/C++ project usually has more than one source file and, in most cases, uses third-party libraries. C/C++ has no official package manager comparable to Python's pip, Java's Maven, or Node.js's npm. So to use a third-party library in a C/C++ project, we usually install it with the system's package manager, such as apt-get on Ubuntu or brew (Homebrew) on macOS. Libraries that the system package manager does not provide are generally compiled and installed by hand, or built together with the project as sub-projects.

A third-party library consists of two main parts: a) header files and b) library files. Header files are usually named xxx.h (header) and expose the library's API (function signatures). Library files come as static library files and dynamic library files, which contain the binary implementation of the library's functions. Static library files are named libxxx.a (archive) (libxxx.lib, library, on Windows). Dynamic library files are named libxxx.so (shared object) (libxxx.dll, dynamic-link library, on Windows; libxxx.dylib, dynamic library, on macOS). The header files of system libraries, and of third-party libraries installed by the system package manager, are usually under /usr/include or /usr/local/include, and their library files under /lib, /usr/lib, and /usr/local/lib.

For these reasons, the GCC toolchain does not manage third-party libraries: it cannot determine which libraries a C/C++ project needs, nor precise information about them such as their location and version. GCC alone therefore cannot fully and automatically resolve a C/C++ project's third-party dependencies. In languages such as Python and Java, an import xxx statement is enough, and the language's package manager resolves third-party dependencies automatically. In C/C++, after writing #include "xxx", we must manually add compile options such as -I, -l, and -L to pass information about the required third-party libraries to gcc: -I passes the directory containing the header files, -l passes the name of the library to link, and -L passes the directory containing the library files. These three options are particularly important, and I hope readers will keep them in mind.

Compile parameters

`-I` parameter

Recall from the compilation process above that #include directives are handled in the preprocessing stage, which replaces each directive with the contents of the included header file. During preprocessing, gcc automatically searches for headers in the directory of the including source file (for #include "...") and in system directories such as /usr/include.

However, gcc cannot find the headers of third-party libraries located in other directories, and reports an error like xxx.h: file not found. We then need the -I option to specify where the library's headers are. For example, on macOS, installing llvm with Homebrew also installs the libraries of the LLVM project, whose headers are in the /usr/local/opt/llvm/include directory.

When using LLVM's libraries, we can pass -I/usr/local/opt/llvm/include (or -I /usr/local/opt/llvm/include, with a space) to specify the header location, and gcc will additionally search that directory. -I can be repeated to add several header directories. It usually takes an absolute path, but a relative path also works; for example, if the header is in the current directory, -I . adds it.

Note that in C/C++ source, the xxxx.h in an #include "xxxx.h" statement can contain a path, even an absolute one. For example, the header /usr/local/opt/llvm/include/llvm/Pass.h could be included directly as #include "/usr/local/opt/llvm/include/llvm/Pass.h".

However, this practice is not recommended in C/C++ projects. The recommended approach is a relative path combined with the -I include_dir option. In the example above, the source would use #include "llvm/Pass.h", and the directory containing LLVM's headers would be passed to gcc as -I /usr/local/opt/llvm/include. This keeps third-party library versions easy to manage and makes collaboration across different machines convenient, which is much better than including headers by absolute path.

In short, during preprocessing gcc joins each header search directory (the directories passed with -I, plus defaults such as /usr/include and /usr/local/include) with the xxxx.h from the #include "xxxx.h" statement in the source. The first combination that names an existing file is the header that gets included.
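The joining rule can be modelled by a small C function. find_header is hypothetical, not a GCC API: it tries each search directory in order and returns the first dir/header path that exists, which is the first-match-wins behavior described above. It is a simplification; real GCC also distinguishes quoted from angle-bracket includes and, for #include "...", searches the including file's own directory first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical model of header lookup: join each search directory with the
 * name from #include "..." and return the first path that exists.
 * Returns 1 and fills `out` on success, 0 if no directory contains it. */
int find_header(const char *const dirs[], size_t ndirs, const char *header,
                char *out, size_t outlen) {
    assert(out != NULL && outlen > 0);
    for (size_t i = 0; i < ndirs; i++) {
        size_t len = strlen(dirs[i]);
        /* avoid a doubled slash when the directory already ends in '/' */
        const char *sep = (len > 0 && dirs[i][len - 1] == '/') ? "" : "/";
        int n = snprintf(out, outlen, "%s%s%s", dirs[i], sep, header);
        if (n < 0 || (size_t)n >= outlen)
            continue;               /* combined path does not fit: skip it */
        struct stat st;
        if (stat(out, &st) == 0)
            return 1;               /* first existing combination wins */
    }
    out[0] = '\0';
    return 0;
}
```

With directories such as /usr/local/opt/llvm/include and /usr/include and the name llvm/Pass.h, this would return the Homebrew path, assuming LLVM is installed there.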

`-l` parameter

In the linking stage, gcc links standard libraries such as the C library (libc) by default, but third-party libraries must be added manually. An error like the following during compilation:

Undefined symbols for architecture x86_64: xxx...xxx
ld: symbol(s) not found for architecture x86_64

is usually caused by not correctly specifying a third-party library that needs to be linked. (With GNU ld on Linux, the equivalent message is undefined reference to xxx.)

With gcc, the -l option specifies a library to link. For example, if we use the math library (i.e., #include <math.h>), compilation reports the Undefined error above; adding -lm (or -l m) tells gcc to link the math library.

Note that some toolchains link the math library automatically as part of the standard libraries (on macOS, for example, the math functions live in the system library libSystem), in which case the Undefined error above does not appear. To see it on such a toolchain, add -nostdlib, which stops gcc from linking the standard libraries and startup files automatically.

The -lm option may look a little strange at first. How does -l work? It takes the library name (such as m), not the library file name (such as libm.so), but the two are related in an intuitive way. For the math library, the file name is libm.so and the library name is m: the library name is the file name with the lib prefix and the file extension (.so or .a) removed. Likewise, the library file libLLVMCore.a shipped with LLVM has the library name LLVMCore, and the option to link it is -lLLVMCore.
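The naming rule is mechanical enough to write down. The hypothetical helper lib_name_from_file below strips the lib prefix and a .so, .a, or .dylib suffix; it only illustrates the convention. It rejects versioned names such as libm.so.6, which is consistent with -lm looking for libm.so rather than the versioned file.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: derive the -l name from a library file name,
 * e.g. "libm.so" -> "m", "libLLVMCore.a" -> "LLVMCore".
 * Returns 1 on success, 0 if the prefix or a known suffix is missing. */
int lib_name_from_file(const char *file, char *out, size_t outlen) {
    static const char *const suffixes[] = {".so", ".a", ".dylib"};
    assert(out != NULL && outlen > 0);
    if (strncmp(file, "lib", 3) != 0)
        return 0;                         /* no "lib" prefix */
    const char *stem = file + 3;
    size_t stem_len = strlen(stem);
    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t sl = strlen(suffixes[i]);
        /* require a non-empty name before the suffix */
        if (stem_len > sl && strcmp(stem + stem_len - sl, suffixes[i]) == 0) {
            size_t name_len = stem_len - sl;
            if (name_len + 1 > outlen)
                return 0;                 /* output buffer too small */
            memcpy(out, stem, name_len);
            out[name_len] = '\0';
            return 1;
        }
    }
    return 0;                             /* unrecognized extension */
}
```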

`-L` parameter

Library files in /lib, /usr/lib, /usr/local/lib, and similar directories, such as libm.so, can be linked with -l alone. If a library file lives elsewhere, -l alone still fails at link time with ld: library not found for -lxxx (GNU ld reports cannot find -lxxx). This means the linker ld could not find libxxx.so or libxxx.a in its current library search path.

In that case, we use -L to tell gcc where the library file is; -L is followed by the directory containing it. For example, on macOS, llvm installed with Homebrew keeps its library files in the /usr/local/opt/llvm/lib directory. To use the LLVMCore library, i.e., to link the library file libLLVMCore.a, we add -lLLVMCore and also -L/usr/local/opt/llvm/lib to tell gcc which directory holds the file.
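Here is a simplified model of how -l and -L combine. find_library is hypothetical, not a linker API. The -L directories are tried in the order given, and within each directory the shared library libNAME.so is preferred over the archive libNAME.a; setting static_only considers only .a files, roughly what gcc's -static does. Real linkers also search their built-in default directories after the -L ones, and on macOS the shared suffix is .dylib.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical model of resolving -lNAME against -L directories.
 * Returns 1 and writes the chosen path to `out`, or 0 if nothing matches. */
int find_library(const char *const dirs[], size_t ndirs, const char *name,
                 int static_only, char *out, size_t outlen) {
    static const char *const exts[] = {".so", ".a"};  /* dynamic first */
    assert(out != NULL && outlen > 0);
    for (size_t i = 0; i < ndirs; i++) {               /* -L order */
        for (size_t j = static_only ? 1 : 0; j < 2; j++) {
            int n = snprintf(out, outlen, "%s/lib%s%s", dirs[i], name, exts[j]);
            struct stat st;
            if (n > 0 && (size_t)n < outlen && stat(out, &st) == 0)
                return 1;                              /* first hit wins */
        }
    }
    out[0] = '\0';
    return 0;  /* the real linker would now report the library as not found */
}
```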

Other compilation parameters

Beyond the options above, gcc has several other important options, briefly introduced here.

A. Static link parameters

When discussing library files, we mentioned static library files (libxxx.a) and dynamic library files (libxxx.so) without explaining the difference. Put simply: when gcc links a static library, it copies the parts of the library that the program uses into the generated binary, so the output is relatively large; when it links a dynamic library, nothing is copied, so the binary is relatively small. The drawback of dynamic linking is that running the program on another machine requires the corresponding dynamic library files to be installed correctly there. Programs linked against static libraries have no such requirement.

When linking, gcc prefers dynamic library files by default and uses a static library file only when no dynamic one exists. To force static linking, add the -static option when compiling. For example, the /usr/local/opt/llvm/lib directory contains both libunwind.so and libunwind.a; to make gcc link the static libunwind.a, add -static, as in gcc hello.o -static -L/usr/local/opt/llvm/lib -lunwind. Note that -static makes the whole executable static, including the C library, and macOS does not support fully static executables, so this form is best tried on Linux.

B. Optimization parameters

Compilation optimization is also an important function of the compiler. Proper compilation optimization can greatly accelerate the execution efficiency of the program. gcc provides 4 levels of optimization parameters, namely -O0, -O1, -O2, -O3. In general, the higher the number, the more compilation optimization strategies are included. In addition, gcc also provides a special -Os parameter.

-O0 disables optimization and is gcc's default. Because no optimization is applied, the generated machine code corresponds closely to the source, and the two can largely be mapped one to one. -O0 is therefore well suited to debugging and is usually combined with -g (generate debug information). -g adds debugging information to the binary at compile time, such as symbol information and the mapping from machine instructions back to source files and lines.

-O1 applies optimizations that do not significantly slow down compilation, reducing the size of the generated binary and improving execution speed.

-O2 applies all -O1 optimizations plus some that slow down compilation further, to improve the execution speed of the program.

-O3 adds more optimizations on top of -O2. They further slow down compilation and increase the size of the resulting binary, but further improve execution speed.

-Os optimizes in the opposite direction to -O3: starting from -O2, it drops optimizations that tend to increase code size and adds others, to make the generated binary as small as possible.
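GCC and Clang make the chosen optimization mode visible to the source through predefined macros: __OPTIMIZE__ is defined in every optimizing compilation, and __OPTIMIZE_SIZE__ is additionally defined when optimizing for size (-Os). The hypothetical helper optimization_mode below uses them; note that these macros cannot tell -O1, -O2, and -O3 apart.

```c
/* Hypothetical helper: report which kind of optimization this translation
 * unit was compiled with, using GCC/Clang predefined macros. */
const char *optimization_mode(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "size (-Os)";
#elif defined(__OPTIMIZE__)
    return "speed (-O1, -O2 or -O3)";
#else
    return "none (-O0)";
#endif
}
```

Compiling the same file with gcc -O0, gcc -O2, and gcc -Os and calling the function shows a different branch kept each time.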

If you are interested in the optimization strategy enabled under each optimization parameter, or want to know other optimization parameters, you can refer to [1].

C. Macro related parameters

Sometimes, to keep a C/C++ project portable across platforms, or to choose flexibly among several similar libraries at build time, the source uses conditional compilation: directives such as #ifdef M, #else, #endif (or #ifndef M, #else, #endif, and #if, #elif, #else, #endif) select the code to compile based on macro definitions.

In C/C++, the #define M statement defines the macro M in the source. Conditional compilation, however, usually needs macros defined from outside the source, typically on the compiler command line. gcc therefore provides -D to define a macro and -U to cancel a definition. When compiling with gcc, the corresponding macro operations are:

-Dmacro defines macro with the default value 1, equivalent to #define macro 1 in the source.

-Dmacro=def defines macro as def, equivalent to #define macro def in the source.

-Umacro cancels the definition of macro, equivalent to #undef macro in the source.

-undef stops gcc from predefining any system-specific or GCC-specific macros; the standard predefined macros remain defined.
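A minimal sketch of conditional compilation controlled from the command line follows. DEBUG and LOG_LEVEL are hypothetical macro names chosen for illustration. Building with gcc -DDEBUG -DLOG_LEVEL=3 selects the debug branch and overrides the default level, while a plain build (or one with -UDEBUG) keeps the release branch and the default level.

```c
/* Default used when the command line does not pass -DLOG_LEVEL=... */
#ifndef LOG_LEVEL
#define LOG_LEVEL 1
#endif

/* Which branch survives is decided by the preprocessor, before compilation. */
const char *build_mode(void) {
#ifdef DEBUG
    return "debug";    /* compiled with -DDEBUG (DEBUG is then 1) */
#else
    return "release";  /* DEBUG not defined, or cancelled with -UDEBUG */
#endif
}

int log_level(void) {
    return LOG_LEVEL;  /* 1 unless overridden, e.g. with -DLOG_LEVEL=3 */
}

int verbose_logging_enabled(void) {
#if LOG_LEVEL >= 2
    return 1;
#else
    return 0;
#endif
}
```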

D. Other

In addition, there are some other parameters that are also important, such as:

The -std option specifies the C/C++ standard used for compilation. For example, -std=c++11 selects the C++11 standard, and -std=c99 selects C99. In particular, -ansi selects ANSI C, which is equivalent to -std=c90 when compiling C (and -std=c++98 when compiling C++).

The -Werror option makes gcc treat every warning as an error.

-Wall enables a broad set of commonly useful warnings (despite its name, not every warning; -Wextra enables more).

-w suppresses all warning messages.

-Wl,option passes option through to the linker ld; if option contains commas, it is split into several linker arguments at the commas, as in -Wl,-search_paths_first.

The -v option prints extra information during compilation, including the commands gcc runs for each stage.
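The effect of -std is visible in the code through the predefined macro __STDC_VERSION__ (for C; C++ code uses __cplusplus instead). The hypothetical helper c_standard_version below returns it, or 0 when it is not defined, which is the case under -ansi or -std=c90.

```c
/* Hypothetical helper: __STDC_VERSION__ follows -std, e.g. 199901L for
 * -std=c99, 201112L for -std=c11, 201710L for -std=c17. It is not defined
 * at all under -ansi / -std=c90, so 0 is returned in that case. */
long c_standard_version(void) {
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```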

To learn about other gcc options, see gcc --help or man gcc, or refer directly to the GCC manual [2].

Generating compile parameters automatically (pkg-config)

Writing the compile and link options for a third-party library by hand is tedious: we must find where its headers and library files are installed, know which other libraries it needs to link, know which compile options it requires, and so on, which hinders quick integration. Many modern third-party libraries therefore ship a tool that generates these options, usually named xxx-config. For example, LLVM provides llvm-config; after installing llvm with the system package manager or building it yourself, you can run llvm-config directly. The examples below use llvm 11.0.

Running llvm-config --cxxflags prints -I/usr/local/Cellar/llvm/11.0.0/include -std=c++14 -stdlib=libc++ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS. These are the compile options needed to build against LLVM's libraries: LLVM 11.0's headers are in /usr/local/Cellar/llvm/11.0.0/include, code must use the C++14 standard and the libc++ standard library, and several macros must be defined at compile time.

Running llvm-config --ldflags prints -L/usr/local/Cellar/llvm/11.0.0/lib -Wl,-search_paths_first -Wl,-headerpad_max_install_names. These are the link options needed for LLVM's libraries: they tell the compiler that the library files are in /usr/local/Cellar/llvm/11.0.0/lib and pass some additional options to the linker ld.

Running llvm-config --libs prints -lLLVMXRay -lLLVMWindowsManifest ... -lLLVMDemangle, i.e., every library LLVM 11.0 can link against. We usually do not link all of them. Instead, llvm-config --libs core prints -lLLVMCore -lLLVMRemarks -lLLVMBitstreamReader -lLLVMBinaryFormat -lLLVMSupport -lLLVMDemangle, the libraries needed to use the core component.

Running llvm-config --system-libs prints -lm -lz -lcurses -lxml2, the system libraries LLVM 11.0 requires.

In practice these options are combined, for example llvm-config --cxxflags --ldflags --system-libs --libs core, to obtain all the options we need.

Besides the xxx-config tools shipped with individual libraries, many modern third-party libraries support the generic tool pkg-config for generating compile options. The pkg-config --list-all command lists every library it supports. The usual form is pkg-config pkg-name --libs --cflags. For example, to use the gmp library, run pkg-config gmp --libs --cflags, which prints -I/usr/local/Cellar/gmp/6.2.1/include -L/usr/local/Cellar/gmp/6.2.1/lib -lgmp.

We can copy this output into the gcc command, or embed the pkg-config call in the command with shell command substitution, as in gcc a.c `pkg-config gmp --libs --cflags`, to pass the library's compile options to gcc.
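To see how the output of pkg-config or llvm-config maps onto the three key options, here is a hypothetical C helper (count_flags is illustrative only) that splits a flag string on spaces and tallies -I, -L, and -l tokens. It assumes the attached form (-I/path rather than -I /path), which is what both tools print in the examples above.

```c
/* Tally of the option kinds found in a flag string. */
typedef struct {
    int include_dirs; /* -I... */
    int lib_dirs;     /* -L... */
    int libs;         /* -l... */
    int other;        /* anything else, e.g. -Wl,... or -std=... */
} FlagCounts;

/* Hypothetical helper: classify space-separated compiler/linker flags. */
FlagCounts count_flags(const char *flags) {
    FlagCounts c = {0, 0, 0, 0};
    const char *p = flags;
    while (*p) {
        while (*p == ' ')
            p++;                          /* skip separators */
        if (!*p)
            break;
        const char *tok = p;
        while (*p && *p != ' ')
            p++;                          /* advance to the end of the token */
        if (tok[0] == '-' && tok[1] == 'I')
            c.include_dirs++;
        else if (tok[0] == '-' && tok[1] == 'L')
            c.lib_dirs++;
        else if (tok[0] == '-' && tok[1] == 'l')
            c.libs++;
        else
            c.other++;
    }
    return c;
}
```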

References

[1] Using the GNU Compiler Collection (GCC), 3.11 Options That Control Optimization, https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

[2] GCC 10.1 Manuals, https://gcc.gnu.org/onlinedocs/10.1.0/
