Linux – Environment variables – General commands and built-in commands

Recommended book: “Understanding the Linux Kernel”.

Registers

Code must be loaded into memory before the CPU can execute it. So when a function returns a value, how does the caller actually receive it, and how do we get the return value of a program?

In fact, it is very simple: it relies on the registers in the CPU. The CPU has many registers, and which ones are used for a given operation depends on the compiler and the platform.

When a function returns a value, a mov instruction first stores the return value in a register (eax here is only an example; different compilers may use different registers):

return 10;  ->  mov eax, 10

Then, if a variable outside the function receives the return value, another mov instruction copies the data from the register into that variable:

int c = add(a, b);
mov eax -> c

Therefore, when a function needs to return a lot of data, for example a very large structure object, we generally do not return it by value: a register is small, so the value would not fit in one and would have to be spread over several registers or copied through memory. Instead, we usually allocate space on the heap, because heap space is not destroyed when the function's stack frame is destroyed. The function then returns not the contents of that heap space but only its starting address.

How does the system know which line of code the current process is running on?

The CPU also uses a register to record where execution currently is.

There is a register in the CPU called the program counter. Many textbooks call it the PC pointer; on 32-bit x86 it is eip. Put simply, it holds the address of the instruction that follows the one the current process is executing.

For example, if the CPU is currently executing line 50 of the code, the program counter holds the address of line 51.

This is how the sequential, conditional, and loop statements of a high-level language work: each kind of statement updates the program counter in a different way.

A sequential statement simply advances to the next instruction. A loop executes each statement in its body and, after the last one, jumps back to the first statement of the body to run it again.

There are many registers in the CPU, such as:

General-purpose registers: eax, ebx, ecx, edx… These can be used by whatever code needs them.
Stack frame registers: ebp, esp; instruction pointer: eip
Status register: status… can be used to implement process scheduling algorithms.
…

So what role does the register play?

First of all, registers can temporarily hold data. While the computer is running, some important data must be kept inside the CPU. Because registers sit inside the CPU, the data in them is as close to the CPU as it can be, so it is accessed far more efficiently than data in memory.

Therefore, to improve efficiency, the CPU keeps frequently used data (the hot data of the process) in registers. In other words, the registers hold data related to the running process, and the CPU can read and modify this data at any time.

Moreover, this data is temporary data of the process: the context of the process.

When a process is about to leave the CPU, its context data must be saved and taken away with it. The purpose is that when the process is later switched back onto the CPU, it can be restored to its previous running state, that is, to the exact step it had reached, so the CPU knows how to continue running it. (This is called saving the process context.)

If the temporary data of a process were kept only in the CPU, then when a new process came in and its data was simply loaded into the CPU, wouldn't the old process's data in the CPU be overwritten?

Therefore, a process switch does two things (together they are called the process switching operation):

save context
restore context

Restoring the process context can be understood as putting all the data the process saved back into the CPU. The CPU then continues executing the process from that context data.

Consequently, a process that takes a long time to execute cannot stay on the CPU the whole time: when its current time slice is used up, a process switch must be performed. Such a long-running process is therefore bound to be switched many times. It may be interrupted at any point while running, and on every switch its context is packaged up, saved, taken away, and later restored.

Where is the context of the process stored?

A register cannot hold much data, and the registers are needed by whichever process is currently running. So, following the example above, the context data of a switched-out process cannot remain in the CPU's registers.

So, where is the context of the process stored? In the PCB object of the process?

You might think of defining a structure dedicated to storing the context data of the current process and putting that structure into the PCB.

In fact, this approach is not correct because it is too slow.

The CPU has its own hardware method when saving the context of a process:

x86 protected mode – detailed explanation of global descriptor table GDT_gdt global descriptor table function-CSDN Blog

Lesson 14 Use of local segment descriptors-CSDN Blog

And so on; these articles concern the hardware support for the operating system.

Environment variables

Introduction to environment variables

Environment variables are a set of variables provided by the system in the form name=value (key-value pairs). Different environment variables serve different purposes, and they usually have global properties.

Environment variables generally refer to parameters that specify the environment in which the operating system and its programs run.

PATH environment variable

When configuring the Java environment, we may have encountered the problem of configuring environment variables.

When we write C/C++ code and link it, we usually do not know where the dynamic and static libraries we link against are located, yet linking still succeeds and an executable is produced. The reason is that default search paths and related environment variables help the toolchain find them.

Similarly, for a header file included with #include<> at the top of a file, the compiler finds it by searching a set of header directories, which environment variables can also influence.

In Linux you may also wonder about the following. The system's commands are really just programs. To execute a command, bash parses the command line and then runs that command's program.

But why can we run a command by typing only its name, while a program we wrote ourselves has to be run with an absolute or relative path?

In fact, most of these system commands live in the /usr/bin directory, which is on the system's default command search path. So no absolute or relative path is needed for the system to find the file.

The executable we wrote ourselves, however, sits in the working directory, which is not searched, so the path must be given.

The reason is that when Linux searches for a command it consults an environment variable, PATH. This variable is configured automatically and contains a list of paths:

It uses “:” as the separator between paths:

Each of these paths is a directory in which the system looks for the command's executable file when the command is run. That is why no path has to be typed when running such a command.

We can use the env command to view all environment variables in the system:

You can also call the getenv() function to obtain the value of the environment variable whose name is passed in as a string:

For example, a C program can print the contents of the PATH environment variable:

Output PATH contents:

Because the value comes from the environment of the process that calls getenv(), different users get different results. For example, the result differs when the program is run after logging in as root:

In this way, code can determine which user is currently running it (by checking the returned string) and provide different functionality to different users:

Besides getenv(), you can use putenv(), which takes a string of the form name=value and either adds that variable to the environment or replaces its existing value:

C language putenv() function: used to change or add the content of environment variables – C Language Network (dotcpp.com)

Add and delete paths in PATH

PATH=/xxx/xxx

This form overwrites the whole of PATH with the single path /xxx/xxx, so the original paths are lost.

PATH=$PATH:/xxx/xxx

This form appends a path to the existing PATH.

As shown in the following example, a path is appended:

Now the text executable in the directory we just added can be run without a path:

Note: PATH is a memory-level environment variable created when the shell starts, so changes made this way are lost when you exit the shell or shut down the computer. The default value comes from configuration files in the system: each time the shell starts, PATH is set up again from those files.

Since the above was run in Xshell, there is no need to restart the Linux operating system; restarting Xshell is enough.

HOME environment variable

If you log in as the root user, pwd and echo $HOME show the following path:

If you log in as an ordinary user, you will see the following path:

In other words, right after logging in, the root user is in the /root directory by default, while an ordinary user is in their own home directory by default.

When you log in, the system identifies which user you are and fills in the HOME environment variable accordingly. Once the login succeeds, the command line interpreter (bash) that is started for you places you in that user's HOME path by default.

SHELL environment variable

This environment variable stores which shell we are currently using, as the path to its executable program:

How environment variables are organized

The environment variables are organized as a char* [] array of pointers, each pointing to one string, with a NULL pointer marking the end; the array is reached through the environ pointer:

Command line parameters

The main function has parameters, as shown below:

int main(int argc, char* argv[])
{

    return 0;
}

argv is an array of strings, and argc, the parameter before it, is the number of strings stored in argv:

We find that argv stores the command and the options we used to run the executable; on the command line the options are separated by spaces:

The main() function is not the first function to be called: it is itself called by a startup function (written Startup() here), which is how it can receive the command we typed. Essentially, the command we enter is just a string. For example, ./text -a -b is the string “./text -a -b”, and the bash command line interpreter splits (parses) it at the spaces.

The options that follow the command can then be accessed by subscript through the argv array in main, as in the example above.

When we use commands, the many options appear to select different functions. In essence, the program compares the strings passed in and performs a different function for each one:

Example output:

Therefore, to let users supply different parameters on the command line, main can accept them as arguments. This is what provides command line option support for commands, tools, and other software.

env parameter in main (understand the global properties of environment variables)

The main function is given two core vector tables: the command line parameter table and the environment variable table.

In addition to the two parameters above, main() can take a third parameter, the env string array:

int main(int argc, char* argv[], char* env[])
{

    return 0;
}

The env array has the same structure as argv: an array of string pointers terminated by a NULL pointer, in which each string holds one environment variable in name=value form.

First, we again use a loop to print the environment variable table:

Output:

The output is the same as that of the env command run from the shell: it prints the system's environment variables.

So, with these two core vector tables in mind: when we write an executable and run it, the program is not simply loaded into memory and started. Rather, some function must call main() and pass the two core vector tables to it.

The environment variable information printed above matches the env command. So how does the text program get it?

Remember that text is a child process created by the bash command line interpreter. The environment variables printed above were not created when text started running; they were created when we opened the shell, that is, the terminal. So they come from bash.

This gives us a conclusion: a child process inherits all the environment variables of its parent process.

The processes we run from the command line are all child processes of bash. When bash itself starts, it reads environment variable information from the system's configuration files and builds its own environment variable table in its own context. Whenever it creates a child process, bash hands this table to the child, which inherits all the environment variables given by the parent.

This means that from the moment bash is running, the commands we type in the terminal and the processes we start are child processes created by bash, and they all see these environment variables.

Because bash is the command line interpreter, it parses each command and then creates a child process for it. Whatever the relationship between these processes, parent and child or siblings, they all receive the environment variable information held in the bash process context.

This is why environment variables have global properties.

So if you want to establish some new rule, you can do it with an environment variable: every process started afterwards from that shell will follow the rule the variable specifies.

If you do not want to add parameters to main() but still want all the environment variables, you can use the global variable environ provided by the C library, used here together with the unistd.h header:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    // environ is defined by the C library; declare it before use
    extern char **environ;
    int i = 0;

    // The table ends with a NULL pointer, which stops the loop
    for (; environ[i]; i++) {
        printf("%s\n", environ[i]);
    }

    return 0;
}

Create/cancel an environment variable

You can create an environment variable using the export command, as shown in the following example:

As shown above, env can also be used to check that the newly created environment variable exists.

The newly created environment variable has now been added to the bash context.

At this point, an executable like the following can print the environment variable we just added from its main() function:

Output:

We can use the unset command to delete an environment variable.

The MY_VALUE environment variable we created above no longer exists after being deleted using unset:

At this time, env cannot find the MY_VALUE environment variable.

Local variables (shell variables) and regular commands, built-in commands

If we type an all-capital variable name that does not yet exist directly in the terminal and assign it a value, do we create a new environment variable? The answer is no.

As shown above, filtering the output of env with grep shows no MY_VALUE entry. Does this mean the MY_VALUE variable does not exist?

MY_VALUE does exist. We can use echo to view its value:

In fact, MY_VALUE here is not an environment variable; it is a local variable.

You can use the set command to view all variables of the current process (including environment variables and local variables):

Local variables are not inherited by child processes and are valid only inside this bash.

For example, here are some self-defined local variables in bash:

Now there is a problem. As we said above, what we run on the command line are child processes of bash, and child processes cannot get bash's local variables.

However, when we printed the local variable above we used the echo command. Isn't echo also a child process created by bash? How can echo get bash's local variables?

In fact, not all commands require bash to create child processes.

Commands in Linux actually fall into two categories:

General commands: carried out by creating a child process. (Most of the commands we use are of this kind.)
Built-in commands: executed by bash itself without creating a child process, much like bash calling a function that it implements itself or that the system provides.

The echo command used above is a built-in command, so bash executes it itself without creating a child process. That is why it can directly read the local variables stored in the bash context.

The cd command is also a built-in command. cd changes directory, which in essence changes the working directory of bash itself. If cd ran as a child process, it could change only its own working directory, not that of its parent bash. So cd has to be a built-in: the working directory it changes is bash's own, and bash updates its record of the working directory accordingly.

There is a function, chdir(), that changes the working directory:

Whichever process calls this interface has its working directory changed to the path given by the string passed to chdir().

As in the example above, running ./text / changes the program's working directory to the new directory.
