In-depth analysis of Linux kernel module loading (Part 1)

About the author

The author runs the WeChat public account “Embedded Linux Development”, which focuses on kernel, driver, and system software development for embedded Linux and covers both fundamentals and hands-on project experience.

insmod entry function

This article uses busybox 1.34.1 and Linux kernel 4.14.294.

insmod_main() is the entry point of the insmod command. It takes the file name of the module to load from the command-line arguments, stores it in the local pointer variable filename, and then calls bb_init_module() to do the rest of the work.

int insmod_main(int argc UNUSED_PARAM, char **argv)
{
    char *filename;
    int rc;

    /* Compat note:
     * 2.6 style insmod has no options and required filename
     * (not module name - .ko can't be omitted).
     * 2.4 style insmod can take module name without .o
     * and performs module search in default directories
     * or in $MODPATH.
     */

    IF_FEATURE_2_4_MODULES(
        getopt32(argv, INSMOD_OPTS INSMOD_ARGS);
        argv += optind - 1;
    );

    filename = *++argv;
    if (!filename)
        bb_show_usage();

    rc = bb_init_module(filename, parse_cmdline_module_options(argv, /*quote_spaces:*/ 0));
    if (rc)
        bb_error_msg("can't insert '%s': %s", filename, moderror(rc));

    return rc;
}

Module parameter parsing function

parse_cmdline_module_options() collects the parameters passed to the module on the command line. Its while loop walks the remaining arguments one by one and appends each of them to a heap buffer pointed to by options; when quote_spaces is set, a value that contains spaces is wrapped in double quotes. The function returns the start address of that buffer.

char* FAST_FUNC parse_cmdline_module_options(char **argv, int quote_spaces)
{
    char *options;
    int optlen;

    options = xzalloc(1);
    optlen = 0;
    while (*++argv) {
        const char *fmt;
        const char *var;
        const char *val;

        var = *argv;
        options = xrealloc(options, optlen + 2 + strlen(var) + 2);
        fmt = "%.*s%s ";
        val = strchrnul(var, '=');
        if (quote_spaces) {
            /*
             * modprobe (module-init-tools version 3.11.1) compat:
             * quote only value:
             * var="val with spaces", not "var=val with spaces"
             * (note: var *name* is not checked for spaces!)
             */
            if (*val) { /* has var=val format. skip '=' */
                val++;
                if (strchr(val, ' '))
                    fmt = "%.*s\"%s\" ";
            }
        }
        optlen += sprintf(options + optlen, fmt, (int)(val - var), var, val);
    }
    /* Remove trailing space. Disabled */
    /* if (optlen != 0) options[optlen-1] = '\0'; */
    return options;
}
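
To make the joining and quoting rule concrete, here is a minimal standalone sketch that uses only the standard C library. The name join_module_options() is hypothetical and is not part of busybox; it takes a NULL-terminated array of arguments and assumes every element is a module parameter, whereas the busybox function skips the first element of argv.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for parse_cmdline_module_options(): joins the
 * NULL-terminated args array into one string, one trailing space per item.
 * With quote_spaces set, a value containing spaces is double-quoted. */
static char *join_module_options(const char *const *args, int quote_spaces)
{
    char *options = calloc(1, 1);
    size_t optlen = 0;

    if (!options)
        return NULL;
    for (; *args; args++) {
        const char *var = *args;
        const char *eq = strchr(var, '=');
        /* Like strchrnul(): point at '=' or at the terminating NUL. */
        const char *val = eq ? eq : var + strlen(var);
        const char *fmt = "%.*s%s ";
        /* Worst case adds two quotes, one space and the NUL. */
        char *grown = realloc(options, optlen + strlen(var) + 4);

        if (!grown) {
            free(options);
            return NULL;
        }
        options = grown;
        if (quote_spaces && *val) {
            val++;                      /* skip '=' so only the value is quoted */
            if (strchr(val, ' '))
                fmt = "%.*s\"%s\" ";
        }
        optlen += (size_t)sprintf(options + optlen, fmt, (int)(val - var), var, val);
    }
    return options;
}
```

Called with the arguments debug=1, "name=my dev" and verbose, the sketch keeps the first and last items unchanged and quotes only the value of the second one when quote_spaces is non-zero. The caller owns the returned buffer and releases it with free().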

Map module file

bb_init_module() first substitutes an empty string if no module parameters were passed in. If the finit_module system call is available, it tries that first, handing the kernel an open file descriptor for the module file. If that path is unavailable or fails, it calls try_to_mmap_module() to map the module file into memory. That function takes two arguments: the module file name (filename) and image_size, which is an in/out parameter that carries the maximum acceptable size on entry and the actual module size on return. If mapping fails, the file is read with xmalloc_open_zipped_read_close() instead, which also handles compressed modules. Finally the function calls init_module(). init_module() is a system call whose kernel-side function is sys_init_module(), and this is where execution enters kernel space. Its arguments are the start address of the module image (image), the size of the image (image_size), and the start address of the parameter string (options).

int FAST_FUNC bb_init_module(const char *filename, const char *options)
{
    size_t image_size;
    char *image;
    int rc;
    bool mmaped;

    if (!options)
        options = "";

//TODO: audit bb_init_module_24 to match error code convention
#if ENABLE_FEATURE_2_4_MODULES
    if (get_linux_version_code() < KERNEL_VERSION(2,6,0))
        return bb_init_module_24(filename, options);
#endif

    /*
     * First we try finit_module if available. Some kernels are configured
     * to only allow loading of modules off of secure storage (like a read-
     * only rootfs) which needs the finit_module call. If it fails, we fall
     * back to normal module loading to support compressed modules.
     */
#ifdef __NR_finit_module
    {
        int fd = open(filename, O_RDONLY | O_CLOEXEC);
        if (fd >= 0) {
            rc = finit_module(fd, options, 0) != 0;
            close(fd);
            if (rc == 0)
                return rc;
        }
    }
#endif

    image_size = INT_MAX - 4095;
    mmaped = 0;
    image = try_to_mmap_module(filename, &image_size);
    if (image) {
        mmaped = 1;
    } else {
        errno = ENOMEM; /* may be changed by e.g. open errors below */
        image = xmalloc_open_zipped_read_close(filename, &image_size);
        if (!image)
            return -errno;
    }

    errno = 0;
    init_module(image, image_size, options);
    rc = errno;
    if (mmaped)
        munmap(image, image_size);
    else
        free(image);
    return rc;
}

try_to_mmap_module() opens the module file to get a file descriptor fd and then calls fstat() to obtain the file's metadata. If the file size st_size does not exceed the maximum passed in through image_size_p, it calls mmap_read() to map the file contents into memory read-only. It then checks for the ELF magic number with *(uint32_t*)image != SWAP_BE32(0x7f454C46). If the signature is missing, the mapping is released and NULL is returned, so the caller can fall back to the reader that handles compressed modules. On success, the size is written back through image_size_p and the start address image is returned. In short, try_to_mmap_module() gives us the address of the module file contents in memory.

void* FAST_FUNC try_to_mmap_module(const char *filename, size_t *image_size_p)
{
    /* We have user reports of failure to load 3MB module
     * on a 16MB RAM machine. Apparently even a transient
     * memory spike to 6MB during module load
     * is too big for that system. */
    void *image;
    struct stat st;
    int fd;

    fd = xopen(filename, O_RDONLY);
    fstat(fd, &st);
    image = NULL;
    /* st.st_size is off_t, we can't just pass it to mmap */
    if (st.st_size <= *image_size_p) {
        size_t image_size = st.st_size;
        image = mmap_read(fd, image_size);
        if (image == MAP_FAILED) {
            image = NULL;
        } else if (*(uint32_t*)image != SWAP_BE32(0x7f454C46)) {
            /* No ELF signature. Compressed module? */
            munmap(image, image_size);
            image = NULL;
        } else {
            /* Success. Report the size */
            *image_size_p = image_size;
        }
    }
    close(fd);
    return image;
}
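
The same map-then-verify idea can be sketched with plain POSIX calls. The helper names has_elf_magic() and map_elf_image() are hypothetical and are not busybox functions. Unlike the busybox code, which compares a 32-bit word against a byte-swapped constant, this sketch compares the four magic bytes with memcmp(), which does not depend on the byte order of the machine. It also returns NULL when the file cannot be opened, whereas xopen() in busybox exits the program.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if buf starts with the ELF magic bytes 0x7f 'E' 'L' 'F'. */
static int has_elf_magic(const void *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    return len >= sizeof(magic) && memcmp(buf, magic, sizeof(magic)) == 0;
}

/* Hypothetical helper: map path read-only if it is non-empty, no larger
 * than max_size, and starts with the ELF magic. Returns NULL otherwise. */
static void *map_elf_image(const char *path, size_t max_size, size_t *size_out)
{
    struct stat st;
    void *image = NULL;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) == 0 && st.st_size > 0 &&
        (unsigned long long)st.st_size <= (unsigned long long)max_size) {
        size_t size = (size_t)st.st_size;

        image = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (image == MAP_FAILED) {
            image = NULL;
        } else if (!has_elf_magic(image, size)) {
            munmap(image, size);    /* no ELF signature, maybe compressed */
            image = NULL;
        } else {
            *size_out = size;       /* success: report the mapped size */
        }
    }
    close(fd);                      /* the mapping stays valid after close() */
    return image;
}
```

A caller releases a successful mapping with munmap(image, size), as bb_init_module() does once init_module() has returned.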

From init_module() onward, the call chain continues in the Linux kernel source code.

init_module() is actually a macro in busybox that invokes syscall() with the system call number __NR_init_module. The kernel function bound to that number is sys_init_module(), and the mapping between the two is defined in include/uapi/asm-generic/unistd.h in the Linux kernel source. I will write a separate article on how Linux system calls are implemented, including how to add a system call that the kernel does not already have.

#define init_module(mod, len, opts) syscall(__NR_init_module, mod, len, opts)
#define __NR_init_module 105
__SYSCALL(__NR_init_module, sys_init_module)
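
As a rough illustration of what the macro boils down to, the sketch below wraps the raw system call in a small function. The name load_image() is hypothetical. The sketch uses the symbolic constant __NR_init_module from <sys/syscall.h> instead of a literal number, because the number differs between architectures (105 comes from the generic table, while x86-64 uses a different value).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: hand an in-memory module image to the kernel.
 * Returns 0 on success or a positive errno value on failure. */
static int load_image(const void *image, unsigned long len, const char *opts)
{
    /* The kernel expects a valid string, so replace NULL with "". */
    if (syscall(__NR_init_module, image, len, opts ? opts : "") != 0)
        return errno;
    return 0;
}
```

Without the privilege to load modules the call fails, and an image shorter than an ELF header is rejected by the size check in copy_module_from_user(), so calling the helper with an empty image is a harmless way to see an error code come back.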

sys_init_module() is generated by expanding the SYSCALL_DEFINE3 macro, which is defined in include/linux/syscalls.h:

#define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
#define SYSCALL_DEFINEx(x, sname, ...) \
    SYSCALL_METADATA(sname, x, __VA_ARGS__) \
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
#define __SYSCALL_DEFINEx(x, name, ...) \
    asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
        __attribute__((alias(__stringify(SyS##name)))); \
    static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
    asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
    asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
    { \
        long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \
        __MAP(x,__SC_TEST,__VA_ARGS__); \
        __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \
        return ret; \
    } \
    static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__))
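
The expansion is easier to follow in a stripped-down user-space imitation. In the sketch below every name is hypothetical (DEMO_SYSCALL_DEFINE3, demo_sys_*, demo_body_*), and the metadata, alias, type-test and protection parts of the kernel macro are left out. What it keeps is the core trick: an outer function that receives every argument as long, casts each one to its declared type, and calls an inner function whose body is the block written after the macro.

```c
/* Simplified, hypothetical imitation of the wrapper pattern. It is NOT
 * the kernel definition: no metadata, no alias, no argument checks. */
#define DEMO_SYSCALL_DEFINE3(name, t1, a1, t2, a2, t3, a3)        \
    static long demo_body_##name(t1 a1, t2 a2, t3 a3);            \
    static long demo_sys_##name(long a1, long a2, long a3)        \
    {                                                             \
        /* Arguments arrive as long and are cast to real types. */ \
        return demo_body_##name((t1)a1, (t2)a2, (t3)a3);          \
    }                                                             \
    static long demo_body_##name(t1 a1, t2 a2, t3 a3)

/* The braces that follow the macro become the body of demo_body_add3(). */
DEMO_SYSCALL_DEFINE3(add3, int, a, int, b, int, c)
{
    return (long)a + b + c;
}
```

Here demo_sys_add3() plays the role of SyS##name and demo_body_add3() plays the role of SYSC##name. In the kernel macro, sys##name is additionally declared as an alias of SyS##name, which is why the entry point ends up being called sys_init_module().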

The SYSCALL_DEFINE3(init_module, ...) definition is in kernel/module.c. It first calls may_init_module() to check whether the caller is permitted to load modules. It then calls copy_module_from_user() to copy the module image from user-space memory into kernel-space memory, which is analyzed below, and finally calls load_module().

SYSCALL_DEFINE3(init_module, void __user *, umod,
        unsigned long, len, const char __user *, uargs)
{
    int err;
    struct load_info info = { };

    err = may_init_module();
    if (err)
        return err;

    pr_debug("init_module: umod=%p, len=%lu, uargs=%p\n",
           umod, len, uargs);

    err = copy_module_from_user(umod, len, &info);
    if (err)
        return err;

    return load_module(&info, uargs, 0);
}

copy_module_from_user() first stores the module size len in the load_info member info->len and rejects any image smaller than an ELF header. After calling the security_kernel_read_file() hook, it calls __vmalloc() to allocate info->len bytes in kernel space and stores the start address of that memory in info->hdr. Finally it calls copy_chunked_from_user(), which is a wrapper around copy_from_user(), to copy the module image from user-space memory into the kernel memory that info->hdr points to.

static int copy_module_from_user(const void __user *umod, unsigned long len,
                  struct load_info *info)
{
    int err;

    info->len = len;
    if (info->len < sizeof(*(info->hdr)))
        return -ENOEXEC;

    err = security_kernel_read_file(NULL, READING_MODULE);
    if (err)
        return err;

    /* Suck in entire file: we'll want most of it. */
    info->hdr = __vmalloc(info->len,
            GFP_KERNEL | __GFP_NOWARN, PAGE_KERNEL);
    if (!info->hdr)
        return -ENOMEM;

    if (copy_chunked_from_user(info->hdr, umod, info->len) != 0) {
        vfree(info->hdr);
        return -EFAULT;
    }

    return 0;
}
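
The body of copy_chunked_from_user() is not listed in this article, so the following is only a user-space sketch of the general idea of copying a large buffer in bounded pieces. The function name copy_chunked() and the chunk size DEMO_CHUNK_SIZE are assumptions made for illustration, and memcpy() stands in for copy_from_user().

```c
#include <stddef.h>
#include <string.h>

#define DEMO_CHUNK_SIZE 4096u   /* assumed chunk size, for illustration only */

/* Copies len bytes from src to dst in pieces of at most DEMO_CHUNK_SIZE
 * bytes and returns the number of pieces that were needed. */
static size_t copy_chunked(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t chunks = 0;

    while (len > 0) {
        size_t n = len < DEMO_CHUNK_SIZE ? len : DEMO_CHUNK_SIZE;

        memcpy(d, s, n);        /* the kernel would call copy_from_user() */
        d += n;
        s += n;
        len -= n;
        chunks++;
    }
    return chunks;
}
```

Copying in pieces bounds the amount of work done in any single copy call, which matters when a module image is several megabytes in size.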

At this point the module file has been copied from user space into kernel space.

Module loading

load_module() is complex and space here is limited, so the loading process itself will be analyzed in “In-depth Analysis of Linux Kernel Module Loading (Part 2)”. The function is listed here for reference.

static int load_module(struct load_info *info, const char __user *uargs,
               int flags)
{
    struct module *mod;
    long err;
    char *after_dashes;

    err = module_sig_check(info, flags);
    if (err)
        goto free_copy;

    err = elf_header_check(info);
    if (err)
        goto free_copy;

    /* Figure out module layout, and allocate all the memory. */
    mod = layout_and_allocate(info, flags);
    if (IS_ERR(mod)) {
        err = PTR_ERR(mod);
        goto free_copy;
    }

    audit_log_kern_module(mod->name);

    /* Reserve our place in the list. */
    err = add_unformed_module(mod);
    if (err)
        goto free_module;

#ifdef CONFIG_MODULE_SIG
    mod->sig_ok = info->sig_ok;
    if (!mod->sig_ok) {
        pr_notice_once("%s: module verification failed: signature "
                   "and/or required key missing - tainting "
                   "kernel\n", mod->name);
        add_taint_module(mod, TAINT_UNSIGNED_MODULE, LOCKDEP_STILL_OK);
    }
#endif

    /* To avoid stressing percpu allocator, do this once we're unique. */
    err = percpu_modalloc(mod, info);
    if (err)
        goto unlink_mod;

    /* Now module is in final location, initialize linked lists, etc. */
    err = module_unload_init(mod);
    if (err)
        goto unlink_mod;

    init_param_lock(mod);

    /* Now we've got everything in the final locations, we can
     * find optional sections. */
    err = find_module_sections(mod, info);
    if (err)
        goto free_unload;

    err = check_module_license_and_versions(mod);
    if (err)
        goto free_unload;

    /* Set up MODINFO_ATTR fields */
    setup_modinfo(mod, info);

    /* Fix up syms, so that st_value is a pointer to location. */
    err = simplify_symbols(mod, info);
    if (err < 0)
        goto free_modinfo;

    err = apply_relocations(mod, info);
    if (err < 0)
        goto free_modinfo;

    err = post_relocation(mod, info);
    if (err < 0)
        goto free_modinfo;

    flush_module_icache(mod);

    /* Now copy in args */
    mod->args = strndup_user(uargs, ~0UL >> 1);
    if (IS_ERR(mod->args)) {
        err = PTR_ERR(mod->args);
        goto free_arch_cleanup;
    }

    dynamic_debug_setup(mod, info->debug, info->num_debug);

    /* Ftrace init must be called in the MODULE_STATE_UNFORMED state */
    ftrace_module_init(mod);

    /* Finally it's fully formed, ready to start executing. */
    err = complete_formation(mod, info);
    if (err)
        goto ddebug_cleanup;

    err = prepare_coming_module(mod);
    if (err)
        goto bug_cleanup;

    /* Module is ready to execute: parsing args may do that. */
    after_dashes = parse_args(mod->name, mod->args, mod->kp, mod->num_kp,
                  -32768, 32767, mod,
                  unknown_module_param_cb);
    if (IS_ERR(after_dashes)) {
        err = PTR_ERR(after_dashes);
        goto coming_cleanup;
    } else if (after_dashes) {
        pr_warn("%s: parameters '%s' after `--' ignored\n",
               mod->name, after_dashes);
    }

    /* Link in to sysfs. */
    err = mod_sysfs_setup(mod, info, mod->kp, mod->num_kp);
    if (err < 0)
        goto coming_cleanup;

    if (is_livepatch_module(mod)) {
        err = copy_module_elf(mod, info);
        if (err < 0)
            goto sysfs_cleanup;
    }

    /* Get rid of temporary copy. */
    free_copy(info);

    /* Done! */
    trace_module_load(mod);

    return do_init_module(mod);

 sysfs_cleanup:
    mod_sysfs_teardown(mod);
 coming_cleanup:
    mod->state = MODULE_STATE_GOING;
    destroy_params(mod->kp, mod->num_kp);
    blocking_notifier_call_chain(&module_notify_list,
                     MODULE_STATE_GOING, mod);
    klp_module_going(mod);
 bug_cleanup:
    mod->state = MODULE_STATE_GOING;
    /* module_bug_cleanup needs module_mutex protection */
    mutex_lock(&module_mutex);
    module_bug_cleanup(mod);
    mutex_unlock(&module_mutex);

    /* we can't deallocate the module until we clear memory protection */
    module_disable_ro(mod);
    module_disable_nx(mod);

 ddebug_cleanup:
    dynamic_debug_remove(mod, info->debug);
    synchronize_sched();
    kfree(mod->args);
 free_arch_cleanup:
    module_arch_cleanup(mod);
 free_modinfo:
    free_modinfo(mod);
 free_unload:
    module_unload_free(mod);
 unlink_mod:
    mutex_lock(&module_mutex);
    /* Unlink carefully: kallsyms could be walking list. */
    list_del_rcu(&mod->list);
    mod_tree_remove(mod);
    wake_up_all(&module_wq);
    /* Wait for RCU-sched synchronizing before releasing mod->list. */
    synchronize_sched();
    mutex_unlock(&module_mutex);
 free_module:
    /*
     * Ftrace needs to clean up what it initialized.
     * This does nothing if ftrace_module_init() wasn't called,
     * but it must be called outside of module_mutex.
     */
    ftrace_release_mod(mod);
    /* Free lock-classes; relies on the preceding sync_rcu() */
    lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);

    module_deallocate(mod, info);
 free_copy:
    free_copy(info);
    return err;
}

To be continued...

Summary

This article traced how a module file is copied from user space into kernel space, starting in the busybox source code and ending in the Linux kernel source code. Follow “Embedded Linux Development” for more articles on embedded Linux development.