Understanding Clang Modules and Module Map Syntax

Clang Modules appeared around 2013 to address the disadvantages of header file inclusion in traditional C-based programming languages. They are also a technology that modern Apple platform development relies on. Understanding Clang Modules helps us organize code structure, understand the Xcode compilation process, optimize compilation speed, and locate compilation errors.

Disadvantages of traditional header file inclusion

Traditional header file inclusion has the following main problems:

Compilation performance issues

With traditional header inclusion, the preprocessor copies the contents of each header file in place of the #include directive. Many headers in turn include the same other headers, such as low-level dependencies and system libraries, so the same content ends up duplicated across different source files. In other words, for M source files that each include N header files, the work is on the order of M x N: the compiler parses the same text again for every source file that includes it, doing a lot of repetitive work and slowing down compilation.

For example, the system's Foundation framework pulls in more than 800 other header files, and the size of the whole framework is more than 9MB. As the most basic framework, Foundation.h is included by almost every source file. With traditional header inclusion, the contents of Foundation.h and every header it includes are lexed and semantically analyzed again and again, slowing down compilation.

Fragility

Textual inclusion is fragile because the content pasted in by #include can be affected by earlier preprocessing directives. For example, if a header file uses a symbol XXX and the including file contains a macro definition such as #define XXX "other text" before the #include, every XXX in the header is replaced with "other text", resulting in a compilation error.

#define XXX "other text"
#include "XXX.h"
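The damage does not have to be a compilation error. As a minimal sketch (the header contents are pasted inline so the example fits in one file, and the names BUFFER_SIZE and default_buffer_size are made up for illustration), a macro defined by the including file can also silently change what a header means:

```c
#include <stddef.h>

/* Defined by the including source file, before the "header" is pulled in. */
#define BUFFER_SIZE 16

/* ---- begin: text that a hypothetical "buffer.h" would paste here ---- */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 4096   /* the default the header author intended */
#endif

/* Returns the buffer size this translation unit ended up with. */
static size_t default_buffer_size(void) {
    return BUFFER_SIZE;
}
/* ---- end of "buffer.h" ---- */
```

Here default_buffer_size() returns 16 rather than the 4096 the header author intended, because the earlier #define leaked into the header text.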

Some problems can only be solved by convention

Traditional header inclusion has no built-in way to prevent a header from being included more than once, so everyone relies on a convention, the include guard, to avoid repeated inclusion.

#ifndef __XXXX_H__
#define __XXXX_H__
// header file content
#endif
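To see the guard at work, here is a small sketch that simulates including the same header twice by pasting its text twice. The header point.h and the names point and point_sum are hypothetical. The guard macro is spelled POINT_H because identifiers that begin with two underscores are reserved for the implementation in C.

```c
/* First "inclusion" of a hypothetical point.h (contents pasted inline). */
#ifndef POINT_H
#define POINT_H
typedef struct { int x; int y; } point;
#endif

/* Second "inclusion": POINT_H is already defined, so the body is skipped.
   Without the guard, the second typedef would be a redefinition error. */
#ifndef POINT_H
#define POINT_H
typedef struct { int x; int y; } point;
#endif

/* Uses the single surviving definition of point. */
static int point_sum(point p) {
    return p.x + p.y;
}
```

Every header has to repeat this pattern by hand, which is exactly the kind of convention that modules make unnecessary.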

Although modern development tools can generate these guards automatically, they are still boilerplate that has to be maintained by hand.

In addition, to avoid macro name collisions between libraries, everyone gives macros very long names with prefixes and suffixes.

Confusing for tools

In C-based languages, the boundaries of a software library are not clearly defined. For example, it is difficult to tell which language a header file belongs to, because C, C++, Objective-C and other languages all use the .h extension. It is also hard to tell which library a header file belongs to. This makes it difficult to build tools on top of these libraries.

What problems do Clang Modules solve?

Semantic import

Clang Modules replace the textual import of traditional header inclusion with a more robust and efficient semantic import. When the compiler sees a module import, it loads a binary file that describes all of the module's APIs, and the importing code can use these APIs directly.

Compile performance improvement

Clang Modules improve compilation performance. Each module only needs to be compiled once, after which a binary representation of the module (a .pcm file, or precompiled module, explained below) is generated and cached on disk. The next time the module is imported, the compiler does not compile it again but reads the cached binary representation directly. The M x N problem of textual inclusion becomes roughly an M + N problem.

Context free

Clang Modules solve the fragility problem. Each module is an independent entity that is compiled in isolation and is context-independent. When a module is imported, preprocessing directives in the importing context are ignored, so macros defined before the import have no effect on the module.

Because each module is self-contained and isolated from the others, the conventions described above are no longer needed: the problems they work around no longer occur.

Make a module yourself

To get an intuitive understanding, we can make a module ourselves. Use Xcode to create a new iOS app project for testing, then create a new group named Frameworks under the project root directory.

In a terminal, change into the Frameworks folder and create a new Dog.framework folder. The name is arbitrary; Dog is just an example.

mkdir Dog.framework

Then go back to Xcode, right-click the Frameworks group, and select Add Files to… to add Dog.framework to it. At this point the build fails with the error Framework not found Dog. Next, let’s see how to make a module that Xcode can correctly identify and compile.

Create a new Dog.swift file inside Dog.framework and add the following:

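// Dog.swift
import Foundation

public class Dog: NSObject {
    // Swift-only API: not marked @objc, so it is not exported to Objective-C.
    public func bark() {
        print("bark")
    }

    // Must be public as well as @objc to appear in the generated Dog-Swift.h.
    @objc public func objcBark() {
        print("objc bark")
    }
}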

Next, let’s generate an interface file for this framework. Execute the following commands on the command line:

swiftc -module-name Dog -c Dog.swift -target arm64-apple-ios16.2-simulator -sdk /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator16.2.sdk -emit-module -emit-objc-header -emit-objc-header-path Dog-Swift.h

swiftc is the compiler for the Swift language, and it also uses Clang under the hood. The parameters are explained one by one below:

  • -module-name Dog The name of the module. Users import the module with import followed by this name.
  • -c Dog.swift -c compiles the input to an object file; Dog.swift is the source file to compile.
  • -target arm64-apple-ios16.2-simulator Specifies the target triple: the architecture (arm64), the platform and minimum OS version (iOS 16.2), and the environment (simulator).
  • -sdk /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator16.2.sdk Specifies the SDK to compile against, here the iOS 16.2 simulator SDK.
  • -emit-module Generates a .swiftdoc file and a .swiftmodule file.
  • -emit-objc-header Generates an Objective-C header containing only the public symbols exposed to Objective-C, for example those marked @objc.
  • -emit-objc-header-path Specifies the path of the Objective-C header file. Here we follow the Xcode convention and name it "<module name>-Swift.h".

Although the required files have been generated, they are not yet laid out in the framework directory structure that Xcode expects, so Xcode cannot read them. We can learn the correct structure by inspecting a framework created by Xcode.

Create a Headers folder inside the Dog.framework folder and move Dog-Swift.h into it. Then create a Modules folder in the Dog.framework folder, create a Dog.swiftmodule folder inside Modules, and move Dog.swiftdoc and Dog.swiftmodule into the Dog.swiftmodule folder. Finally, rename these two files to arm64.swiftdoc and arm64.swiftmodule.

The current directory structure of Dog.framework is:

Dog.framework/
|---- Dog                          (static library binary, created in the next step)
|---- Headers
|     |---- Dog-Swift.h
|---- Modules
      |---- Dog.swiftmodule
            |---- arm64.swiftdoc
            |---- arm64.swiftmodule

The interface now exists, but without a binary library file the project still cannot be built. Next, let’s generate the binary library file.

Execute the following command:

swiftc -module-name Dog -parse-as-library -c Dog.swift -target arm64-apple-ios16.2-simulator -sdk /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator16.2.sdk -emit-object

Most parameters are the same as in the previous command, so only the new ones are explained here.

  • -parse-as-library Tells the compiler to treat the file as library code rather than as a script with top-level code, so no main entry point is generated.
  • -emit-object Emits an object file (the long form of -c).

After this command is executed, the object file Dog.o will be generated, and then we need to archive the object file as a library.

libtool -static Dog.o -arch_only arm64 -o Dog

To keep things simple, we create a static library rather than a dynamic library, hence -static. The static library binary Dog now appears in Dog.framework. If we import Dog in ViewController and build the project, the build succeeds. This shows that the current directory structure lets Xcode find the files the module requires.

Module map

Next, let’s try to use Objective-C to call the Dog module.

Create an Objective-C class named OCObject in the same project, let Xcode create the bridging header automatically, and add the following code:

// OCObject.h
#import <Foundation/Foundation.h>

@interface OCObject : NSObject

- (void)doSomething;

@end

// OCObject.m
#import "OCObject.h"
#import <Dog/Dog-Swift.h>

@implementation OCObject

// Creates a Dog and calls the method that Dog exposes to Objective-C.
- (void)doSomething {
    Dog *dog = [[Dog alloc] init];
    [dog objcBark];
}

@end

At this point, objc bark is printed when doSomething is called. Now replace the #import with the standard module import syntax @import Dog;. The build fails with the error “Module ‘Dog’ not found”.

This is because the framework is missing an important file, the module map, so Xcode cannot find the module. #import still works because it is backward compatible: if the framework supports modules, the compiler imports the module; if not, it searches the header search paths for the header, just like #include, and pastes its text in place.

A module map describes how the logical structure of the headers in a framework maps onto modules. If you look at a framework created with Xcode, you will find a module.modulemap file under the Modules folder with content like the following:

framework module ObserveModuleStructure {
  umbrella header "ObserveModuleStructure.h"

  export *
  module * { export * }
}

module ObserveModuleStructure.Swift {
  header "ObserveModuleStructure-Swift.h"
  requires objc
}

The syntax is explained below with reference to the Clang documentation:

  • framework module XXXX defines a module with framework semantics.
  • umbrella header "XXXX.h" declares XXXX.h as the umbrella header of the module. The umbrella header collects all public headers of the module in one place, which is convenient for users to import.
  • export * re-exports every module that this module imports, so their APIs are visible to anyone who imports this module.
  • module * { export * } is an inferred submodule declaration: the * creates a submodule for each header covered by the umbrella header.

Following this syntax, write your own module map file at Dog.framework/Modules/module.modulemap:

// Dog.framework/Modules/module.modulemap
framework module Dog {
    umbrella header "Dog.h"
    export *
    module * { export * }
}

module Dog.Swift {
    header "Dog-Swift.h"
    requires objc
}

The build still fails, because the umbrella header is missing. Create an empty Dog.h file in Dog.framework/Headers/. The project now builds and prints objc bark.

Module Map Language Grammar

Officially, this syntax is called the Module Map Language.

According to Clang’s documentation, the module map language is not guaranteed to remain stable between major versions of Clang, so in everyday development it is best to let Xcode generate module maps automatically.

Module declaration

[framework] module module-id [extern_c] [system] {
    module-member
}

framework

framework means that this module is a Darwin-style framework. Darwin-style frameworks are found mainly on macOS and iOS. All of a framework's contents live in a Name.framework folder, where Name is the name of the framework. The folder is laid out as follows:

Name.framework/
    Modules/module.modulemap    the framework's module map
    Headers/                    the framework's header files
    PrivateHeaders/             the framework's private header files
    Frameworks/                 other embedded frameworks
    Resources/                  additional resources
    Name                        symbolic link to the shared library
system

system specifies that this module is a system module. It is written as the attribute [system] after the module name. When a system module is compiled, all of its header files are treated as system headers, so certain warnings are suppressed. This is equivalent to putting #pragma GCC system_header in each header file.

extern_c

extern_c indicates that the C code in the module can be used from C++. It is written as the attribute [extern_c] after the module name. When the module is compiled for use from C++, all of its header files are wrapped in an implicit extern "C" block.
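For comparison, this is the boilerplate a C header usually carries by hand so that it can be included from C++, which is what the implicit extern "C" block saves you from writing. It is a minimal sketch; the function dog_leg_count is made up for illustration.

```c
/* Manual equivalent of the implicit block: only C++ compilers see extern "C". */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical C API declared with C linkage. */
int dog_leg_count(void);

#ifdef __cplusplus
}
#endif

/* Definition, which would normally live in a separate .c file. */
int dog_leg_count(void) {
    return 4;
}
```

When compiled as C the guards expand to nothing; when compiled as C++ the declaration gets C linkage, so both languages agree on the symbol name.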

Module body

The module body contains common declarations such as headers and requires and submodule declarations, for example:

framework module Dog {
    umbrella header "Dog.h"
    requires objc
    module * { export * }
}

header

header maps a header file to the module it appears in. umbrella header specifies an umbrella header that covers all of the module's headers.

requires

The requires declaration specifies conditions that a compilation unit must meet in order to import this module. Conditions can refer to the language, the platform, the compilation environment, and target-specific features. For example, requires cplusplus11 means the module can only be used in an environment that supports C++11, and requires objc means the module can only be used in an environment that supports Objective-C.

module

module declares a submodule inside the module. Written as module *, it means every header file in the module is treated as its own submodule.

Submodule declaration

Modules declared nested within the module body of the main module are submodules. For example, to declare a submodule A in the MyLib module, the writing method is as follows:

module MyLib {
    module A {
        header "A.h"
        export *
    }
}

explicit

The explicit modifier applies to submodules. The contents of an explicit submodule are only available if the submodule is named in the import, as in import modulename.submodulename, or if it has been re-exported by another imported module.

export

export specifies which modules' APIs are re-exported and become part of the API of the module that contains the export declaration.

export_as

export_as states that the API of the current module is exported through another, specified module.

module MyFrameworkCore {
    export_as MyFramework
}

In the above example, the API in MyFrameworkCore will be exported through MyFramework.

The module map language contains other declarations as well, such as use, config_macros, link, and conflict. They rarely appear in iOS development, so I won’t explain them one by one here. If you are interested, see the official Clang documentation.

Clang Module’s caching mechanism

By reading a modulemap file, Clang can compile the module it describes into a precompiled module, a file with the .pcm extension.

clang -cc1 -emit-obj use.c -fmodules -fimplicit-module-maps -fmodules-cache-path=prebuilt -fdisable-module-hash

In the command above, -fimplicit-module-maps lets the compiler locate modulemap files by following a set of search rules, and -fmodules-cache-path tells the compiler where to cache precompiled modules. Clang compiles each module according to the information in the modulemap and puts the generated .pcm file in the prebuilt directory.

The .pcm file stores the module's information in a format the compiler can read and parse quickly. Later, when the compiler needs this module while compiling other code, it reads the module information from the .pcm file instead of recompiling the module.

Using Clang Modules in Xcode

Frameworks and libraries created with Xcode support Clang Modules by default: in Build Settings, Defines Module is set to YES. A very old library may not have it enabled, in which case just set Defines Module to YES manually.

When Defines Module is YES, Xcode treats the target as defining its own module and installs a module map with the product. On the importing side, the Enable Modules (C and Objective-C) build setting makes Xcode pass module-related parameters such as -fmodules to clang, which enables module support.
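A quick way to check whether a particular file is really being compiled with module support is Clang's __has_feature(modules) test. The sketch below wraps it in a helper function; the name modules_enabled is made up for illustration, and the fallback macro is only there so the file also compiles with compilers that lack __has_feature.

```c
/* __has_feature is a Clang extension; fall back to 0 on other compilers. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Returns 1 if this file was compiled with Clang modules enabled, else 0. */
static int modules_enabled(void) {
#if __has_feature(modules)
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file with and without -fmodules and calling modules_enabled() shows whether the flag actually reached the compiler.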

Conclusion

Development tools often hide low-level details from us. Understanding these details helps us grasp the underlying principles and analyze and solve difficult problems. Clang is an important tool on Apple platforms and is well worth studying. Thank you for reading. If there is something wrong with the article, or if you have your own opinions, please leave a comment for discussion.

References

  • clang.llvm.org/docs/Module…
  • clang.llvm.org/docs/PCHInt…
  • bignerdranch.com/blog/it-loo…
  • nachbaur.com/2019/03/11/…
  • samsymons.com/blog/unders…