LLVM: cc1, llc, and lld

cc1

cc1 is a core part of the LLVM toolchain, specifically within the Clang compiler framework. Although developers may not interact directly with it in their daily programming work, understanding how it works is helpful for a deep understanding of the entire compilation process.

Definition: cc1 is the actual compiler frontend for Clang. It is not a separate executable; it is the clang binary run in frontend mode as clang -cc1. When we use the clang command-line tool to compile C/C++ code, the clang driver invokes cc1 internally to do most of the work.
Main Responsibilities:
- Lexical analysis: Breaking the source code into tokens.
- Syntax analysis: Organizing these tokens into a syntax tree.
- Semantic analysis: Determine the meaning of the syntax tree and perform preliminary error checking.
- Generate intermediate representation: Generate LLVM IR from syntax tree.
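Each of these stages can be observed from the clang driver. The following is a minimal sketch, assuming clang is on your PATH; example.c and the scratch directory are hypothetical names used only for illustration. The -Xclang option forwards the flag that follows it to cc1 unchanged.

```shell
# Scratch directory and a tiny source file (hypothetical, for illustration only).
workdir=$(mktemp -d)
cat > "$workdir/example.c" <<'EOF'
int add(int a, int b) { return a + b; }
EOF

if command -v clang >/dev/null 2>&1; then
  # Lexical analysis: dump the token stream (cc1 writes it to stderr).
  clang -fsyntax-only -Xclang -dump-tokens "$workdir/example.c" 2> "$workdir/tokens.txt"
  # Syntax and semantic analysis: dump the AST without generating code.
  clang -fsyntax-only -Xclang -ast-dump "$workdir/example.c" > "$workdir/ast.txt"
  # IR generation: stop after emitting textual LLVM IR.
  clang -S -emit-llvm "$workdir/example.c" -o "$workdir/example.ll"

  head -n 5 "$workdir/tokens.txt"
  grep -m1 FunctionDecl "$workdir/ast.txt"
  grep define "$workdir/example.ll"
else
  echo "clang not found; skipping" >&2
fi
```

Comparing the three output files side by side shows how the same function looks as tokens, as a syntax tree node, and as an IR definition.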
Why it exists:
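- The main reason for separating the clang driver from cc1 is modularity. The driver decides which tools to run (the cc1 frontend, the assembler, the linker) and with which arguments, while cc1 only compiles. Unlike GCC, which has a separate cc1plus for C++, Clang uses the same cc1 frontend for C and C++.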
- It also provides other tools, such as compiler plugins or advanced tools, with direct access to various stages of the compilation process.
How to use: Typically, developers do not need to call cc1 directly because the clang driver does this automatically. However, knowing that it exists and how it is invoked helps when debugging and studying the compilation process. For example, the clang -### command prints the commands the driver would run, including the cc1 invocation, without executing them.
Example:
If you run the following command:
```
clang -### example.c
```
You'll see how clang calls cc1 and the arguments passed to it. The commands are only printed (to standard error), not executed.
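To pick the cc1 invocation out of that output, it can help to save it to a file first. This is a small sketch, assuming clang is on your PATH; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$workdir/example.c"

if command -v clang >/dev/null 2>&1; then
  # -### prints the planned sub-commands to stderr and runs none of them.
  clang -### "$workdir/example.c" 2> "$workdir/plan.txt"
  # The compile step is the command line that contains the quoted "-cc1" argument.
  grep -e '"-cc1"' "$workdir/plan.txt" | cut -c1-200
else
  echo "clang not found; skipping" >&2
fi
```

The remaining lines in plan.txt typically include the linker command, which makes the division of labor between the driver, cc1, and the linker visible.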
Note:
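- Although cc1 is a central part of the compilation process, it does not link. Linking is done by a separate tool, such as lld or the system's default linker. Assembly is usually handled inside cc1 by LLVM's integrated assembler when an object file is requested; standalone assembly files go through clang -cc1as.
- cc1plus is the name of GCC's C++ compiler proper. It is not part of Clang, which compiles C++ with the same clang -cc1 frontend.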

Overall, cc1 is the core frontend in the Clang compiler: it processes source code and generates LLVM IR. Studying it is a practical way into both compilation principles and the LLVM framework.

llc

llc is a component in the LLVM toolchain. Its main function is to convert LLVM intermediate representation (LLVM IR) into target machine code. LLVM IR is a low-level, largely architecture-independent intermediate representation, usually stored in a .ll file (text) or a .bc file (bitcode). llc converts this intermediate representation into assembly or object code for any target whose backend is built into your LLVM installation.

Main functions: Convert LLVM’s intermediate representation (IR) into assembly code or machine code for a specific target platform.
Usage scenarios:
- When we have an LLVM IR file and want to generate assembly or machine code for a specific architecture.
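- In a manual, step-by-step pipeline, llc runs right after clang has converted the C/C++ source code to LLVM IR. (In a normal clang invocation, the same code generator runs inside the compiler process rather than as a separate llc step.)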

Basic usage:

llc input.ll # will generate a platform-dependent assembly file, such as input.s

Common options:
- -march=: Specify the target architecture, such as x86, arm, aarch64, etc.
- -filetype=: Specify the output file type. For example, asm represents the assembly file (default), and obj represents the object file.
- -o : Specify the name of the output file.
Example:
If we have an LLVM IR file named input.ll and want to convert it to assembly code for the ARM architecture:
```
llc -march=arm input.ll -o output-arm.s
```
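The options above can be tried on a hand-written IR file. This is a sketch under a few assumptions: llc is on your PATH (some distributions install it under a versioned name such as llc-17), the IR function is a made-up example, and the AArch64 step only runs if that backend is registered.

```shell
workdir=$(mktemp -d)
# Hand-written IR (hypothetical example): add two 32-bit integers.
cat > "$workdir/input.ll" <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

if command -v llc >/dev/null 2>&1; then
  # Default: assembly for the host target.
  llc "$workdir/input.ll" -o "$workdir/input.s"
  # Same IR, emitted as a relocatable object file instead.
  llc -filetype=obj "$workdir/input.ll" -o "$workdir/input.o"
  # Cross-generate only when the AArch64 backend was built in.
  if llc --version | grep -qw aarch64; then
    llc -march=aarch64 "$workdir/input.ll" -o "$workdir/input-aarch64.s"
  fi
  ls "$workdir"
else
  echo "llc not found; skipping" >&2
fi
```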
Note:
- Before using llc for a given architecture, make sure the backend for that architecture was built into your LLVM installation; otherwise llc reports that the target is not supported. Running llc --version lists the registered targets.
- Usually, for developers, llc is just one step in the complete compilation and linking process. In order to generate executable binaries, additional tools and steps are required, such as assemblers and linkers.
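The extra steps mentioned in the last note can be sketched as follows, going from IR to a running program. It assumes both llc and clang are on your PATH and that a C runtime is installed for linking; main.ll is a hypothetical file. The -relocation-model=pic option is used because many systems link position-independent executables by default.

```shell
workdir=$(mktemp -d)
cat > "$workdir/main.ll" <<'EOF'
define i32 @main() {
entry:
  ret i32 42
}
EOF

if command -v llc >/dev/null 2>&1 && command -v clang >/dev/null 2>&1; then
  # Code generation: IR -> object file.
  llc -filetype=obj -relocation-model=pic "$workdir/main.ll" -o "$workdir/main.o"
  # Linking: the clang driver calls the linker and adds the C runtime startup code.
  clang "$workdir/main.o" -o "$workdir/main"
  # Run it; the exit status is the value returned from main.
  status=0
  "$workdir/main" || status=$?
  echo "exit status: $status"
else
  echo "llc or clang not found; skipping" >&2
fi
```

Using the clang driver for the link step is a convenience: it supplies the startup objects and library paths that a bare linker invocation would need spelled out.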

In general, llc is one of the core components of LLVM, allowing developers to generate target machine code from LLVM IR. Through it, LLVM provides a platform-independent compilation strategy, making optimization and code generation possible for multiple target platforms.

LLVM also has an assembler and disassembler for LLVM IR itself, which is different from a machine code assembler in the usual sense. These tools convert between the textual form of LLVM's intermediate representation (IR) and its binary bitcode form.

LLVM IR: LLVM IR is an intermediate representation of LLVM and can be considered a low-level but still readable programming language. It comes in two forms:
- Text form (also known as LLVM assembly language). Usually saved as a .ll file.
- Binary form (also known as LLVM bitcode). Usually saved as a .bc file.
Task:
- Assembler: Converts textual LLVM IR to binary LLVM bitcode.
- Disassembler: Converts binary LLVM bitcode to textual LLVM IR.
Use:
This conversion can be performed using the llvm-as and llvm-dis tools.
- Use llvm-as to assemble:
```
llvm-as input.ll -o output.bc
```
- Use llvm-dis to disassemble:
```
llvm-dis input.bc -o output.ll
```
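A round trip through both tools shows that the two forms carry the same content. This is a minimal sketch, assuming llvm-as and llvm-dis are on your PATH; the IR function and file names are hypothetical.

```shell
workdir=$(mktemp -d)
cat > "$workdir/input.ll" <<'EOF'
define i32 @answer() {
entry:
  ret i32 42
}
EOF

if command -v llvm-as >/dev/null 2>&1 && command -v llvm-dis >/dev/null 2>&1; then
  llvm-as "$workdir/input.ll" -o "$workdir/input.bc"        # text -> bitcode
  llvm-dis "$workdir/input.bc" -o "$workdir/roundtrip.ll"   # bitcode -> text
  # Bitcode files start with the magic bytes 'B' 'C' 0xC0 0xDE.
  head -c 4 "$workdir/input.bc" | od -An -tx1
  # The function definition survives the round trip.
  grep 'define' "$workdir/roundtrip.ll"
else
  echo "llvm-as or llvm-dis not found; skipping" >&2
fi
```

The regenerated text is not guaranteed to be byte-for-byte identical to the input, since llvm-dis adds header lines of its own, but the function bodies match.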
Note:
- “Assembly” and “disassembly” here are relative to LLVM IR, not to machine code assembly.
- The LLVM project does have a machine code assembler, but it is a separate component: the MC layer, exposed through the llvm-mc tool and Clang's integrated assembler. When LLVM needs an object file, it usually emits machine code directly through this layer rather than writing textual assembly and running an external assembler. It is still possible to generate target assembly code from LLVM IR using the llc tool.
- The textual representation of LLVM IR is often used for debugging, analysis, or teaching purposes because it provides a human-readable intermediate compiled representation.

Overall, LLVM provides tools to assemble and disassemble its intermediate representation, the LLVM IR. This allows developers to convert between text and binary formats, making it easier to understand, debug, and optimize their code.

lld

lld is the linker of the LLVM project. Linking is the final stage of the build process: it combines the object files produced by the compiler (usually .o or .obj files) into a single executable or shared library. Static libraries, by contrast, are archives of object files created by an archiver such as llvm-ar, not by the linker.

Definition: lld is the official linker of the LLVM project, designed to provide high-performance and modular linking for a variety of platforms. It aims to provide speeds that are comparable to or better than other system linkers and to work seamlessly with other LLVM tools.
Features:
- Cross-platform: lld supports multiple object file formats and the platforms that use them, including ELF (Linux), Mach-O (macOS), and COFF (Windows).
- Performance: lld is designed to be fast. It is often benchmarked against other linkers such as GNU ld or gold and links faster than them in many cases.
- Simplicity: Compared to other linkers, lld's code base is relatively small and modular, making it easy to maintain and extend.
How to use:
If you have the full LLVM suite installed, you can usually use lld in the following ways:
```
clang -fuse-ld=lld your_source_file.c
```
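Or invoke the linker directly through the driver name for your target format (the generic lld binary refuses to run without one). For example, on ELF systems:
```
ld.lld [options] input_files
```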
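Splitting compilation and linking into two commands makes it clearer where lld comes in. The sketch below assumes an ELF platform such as Linux with clang and ld.lld (lld's ELF driver) on your PATH; hello.c is a hypothetical file.

```shell
workdir=$(mktemp -d)
cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("linked with lld"); return 0; }
EOF

if command -v clang >/dev/null 2>&1 && command -v ld.lld >/dev/null 2>&1; then
  # Step 1: compile only, producing an object file.
  clang -c "$workdir/hello.c" -o "$workdir/hello.o"
  # Step 2: link through the driver, asking it to use lld instead of the default linker.
  clang -fuse-ld=lld "$workdir/hello.o" -o "$workdir/hello"
  "$workdir/hello"
else
  echo "clang or ld.lld not found; skipping" >&2
fi
```

Linking through the driver is usually easier than calling the linker by hand, because the driver supplies the startup objects, library search paths, and default libraries.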
Subproject:
lld itself is modular and contains several subprojects, each dedicated to providing support for a specific target platform:
- ELF: Provides support for ELF-based systems such as Linux.
- COFF: Provides support for Windows platforms.
- Mach-O: Provides support for Apple platforms such as macOS and iOS.
- Wasm: Provides support for WebAssembly.
- MinGW: Provides support for MinGW through a driver that translates GNU ld-style options for the COFF linker.
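On the command line, these ports are selected by the name the linker is invoked under: ld.lld for ELF, ld64.lld for Mach-O, lld-link for COFF, and wasm-ld for WebAssembly. The following sketch only checks which of these drivers are present on your PATH; which ones exist depends on how your LLVM packages were built.

```shell
found=0
for driver in ld.lld ld64.lld lld-link wasm-ld; do
  if command -v "$driver" >/dev/null 2>&1; then
    echo "$driver: $(command -v "$driver")"
    found=$((found + 1))
  else
    echo "$driver: not installed"
  fi
done
echo "lld drivers found: $found"
```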
Why choose lld:
- Seamless integration with LLVM: If you are already using the LLVM tool chain (such as clang), lld provides the ability to seamlessly integrate with these tools.
- Open source and active development: As part of the LLVM project, lld is actively developed and receives support from the community.
- Performance: For projects that need to be built quickly, lld may provide faster link times than other linkers.
Note:
- Although lld has proven reliable in many scenarios, there may be some compatibility or feature differences with other linkers depending on the specific use case and platform.

In summary, lld is the linker component of the LLVM project, providing a high-performance, modular solution for linking object files into executables or libraries.