Original code, ones' complement, two's complement? Integer promotion? IEEE 754? All in one article! How data is stored in memory (related to computer organization)

义艨ic：Personal homepage
Many of you have probably gone back to school or work by now. I wish everyone all the best in study and work~

Let’s take a look at this blog~

Table of contents

Foreword
Introduction to variable types
- Type classification
- Type size
Storage of integers in memory
- Original code, ones' complement, two's complement
- Character type
  - Integer promotion
    - The significance of integer promotion
    - Rules for integer promotion
Floating point variables
- IEEE 754
End

Foreword

We use all kinds of variables when learning a programming language. Have you ever wondered how these different variables are stored in memory, and how the computer handles each type? Let's take C as an example and look at the different variable types in detail.

Introduction to variable types

First, let us list the common variable types in C language:

char        //character type

short       //short integer type

int         //integer type

long        //long integer type

long long   //long long integer type

float       //single precision floating point number

double      //double precision floating point number

Type classification

The types above fall into two main families: the integer family and the floating point family.

Integer family

char
  unsigned char
  signed char
short
  unsigned short [int]
  signed short [int]
int
  unsigned int
  signed int
long
  unsigned long [int]
  signed long [int]

Note: [] marks a part that can be omitted. For example, in C, writing long is exactly the same as writing long int.

Floating point number family

float
double
long double

Some of you may be wondering why the character type is listed here: we have also classified it into the integer family. Why exactly? I will explain it next~

Of course, C also has constructed types, pointer types and the void type, which will not be covered here.

Type size

Now let's look at how many bytes each of the types above occupies:

#include <stdio.h>

int main(void)
{
    //sizeof yields a size_t, whose printf format is %zu
    printf("%zu\n", sizeof(char));        //1
    printf("%zu\n", sizeof(short));       //2
    printf("%zu\n", sizeof(int));         //4
    printf("%zu\n", sizeof(long));        //4
    printf("%zu\n", sizeof(long long));   //8
    printf("%zu\n", sizeof(float));       //4
    printf("%zu\n", sizeof(double));      //8
    printf("%zu\n", sizeof(long double)); //8
    return 0;
}

These are the sizes in the VS2022 environment, where they are the same for 32-bit and 64-bit builds. If you use another compiler, the sizes of long and long double may differ from mine; they may be 8 and 16 respectively (as with GCC on 64-bit Linux). This is because the C standard only requires long to be at least as large as int, and likewise long double to be at least as large as double, so different compilers may produce different results, but this does no harm.

Storage of integers in memory

Now that we know the integer types, let's see how they are stored in memory.

Before talking about this issue, let us first understand some new concepts.

Original code, ones' complement, two's complement

Computers have three binary representations for integers: the original code (sign-magnitude), the ones' complement (inverse code) and the two's complement (complement code).

So what exactly are these three representations?

First, all three representations consist of two parts: a sign bit and value bits.

The sign bit uses 0 for "positive" and 1 for "negative", while the value bits are written differently in each of the three representations.

Original code

The original code is obtained by writing the absolute value of the number in binary and setting the sign bit according to whether the number is positive or negative.

Ones' complement

For a negative number, the ones' complement is obtained by keeping the sign bit of the original code unchanged and inverting every other bit.

Two's complement

For a negative number, adding 1 to the ones' complement gives the two's complement.

Some books express these calculations as formulas, but the essence is what is described above, so there is no need to memorize lengthy formulas~

From these definitions we can draw the following conclusions:

The original code, ones' complement and two's complement of a positive number are the same.

For negative integers, the three representations differ.

For integers, what is actually stored in memory is the two's complement.

But why?

In computer systems, integer values are represented and stored in two's complement. The reason is that with two's complement, the sign bit and the value bits can be processed uniformly.

What does this mean? Let's look at an example:

int a = 10;
//00000000 00000000 00000000 00001010 original code
//00000000 00000000 00000000 00001010 ones' complement
//00000000 00000000 00000000 00001010 two's complement
int b = -10;
//10000000 00000000 00000000 00001010 original code
//11111111 11111111 11111111 11110101 ones' complement
//11111111 11111111 11111111 11110110 two's complement
int c = a + b;
//10000000 00000000 00000000 00010100 c if the original codes are added: -20, wrong
//00000000 00000000 00000000 00000000 c if the two's complements are added (carry out of the top bit discarded): 0, correct

As the code above shows, when addition and subtraction are done in two's complement, the sign bit can take part in the operation directly and the answer is still correct. With the original code this clearly does not work, which is one of the reasons two's complement is used for storage.

Two's complement also lets addition and subtraction be handled in the same way (the CPU only needs an adder). In addition, converting two's complement to the original code and converting the original code to two's complement use the same procedure, so no additional hardware circuit is needed.

What does this mean?

Here is some background: arithmetic inside the CPU is built around an adder, and subtraction, for example, is carried out as addition (a - b is computed as a + (-b), with -b in two's complement). At the same time, converting the original code into two's complement and converting two's complement back into the original code follow exactly the same process. For example:

int a = -10;
//10000000 00000000 00000000 00001010 original code
//11111111 11111111 11111111 11110101 ones' complement
//11111111 11111111 11111111 11110110 two's complement

Here I turned the original code into two's complement by inverting every bit except the sign bit and then adding 1. Applying the same steps to the two's complement (invert every bit except the sign bit, then add 1) gives back the original code. Try it yourself~

Because of this, the CPU needs no additional hardware circuit for the conversion, which saves resources.

Let’s verify the above statement ourselves in the compiler.

Note that because we are on a little-endian machine, the low-order byte is stored at the lowest address, so in the memory window the bytes appear in reverse order.

Character type

When classifying, we put character variables into the integer family. Now let me explain why.

First, a char actually stores the ASCII code value of the character, and that value is stored like any other integer, in two's complement. This means that if you write code like this:

 char c = 10;

Your compiler may not even issue a warning.

char occupies only one byte. In VS2022, plain char is signed by default (whether plain char is signed is up to the implementation), so its range is -128 to 127 and its maximum value is 127. This brings us to another concept: integer promotion.

Integer promotion

So what is integer promotion?

Integer arithmetic in C is always performed with at least the precision of the default integer type.

To achieve this precision, character and short operands in expressions are converted to ordinary integers before use, a conversion called integer promotion.

What does that mean? Simply put, when an expression uses an integer variable smaller than int (4 bytes here), it is first promoted to int and only then is the operation performed. This applies to both char and short.

The significance of integer promotion

Integer operations in an expression are executed by the CPU's integer arithmetic logic unit (ALU). The operand length of the ALU is generally the size of int, which is also the length of the CPU's general-purpose registers.

Therefore, even when two char values are added, they must first be converted to the CPU's standard integer operand length before the CPU performs the addition.

It is difficult for a general-purpose CPU to add two 8-bit bytes directly (although its instruction set may include byte-add instructions).

Therefore, various integer values in the expression that may have a length smaller than the length of int must be converted to int or unsigned int before they can be sent to the CPU to perform operations.

//Example
char a,b,c;
...
a = b + c;

The values of b and c are promoted to ordinary integers before addition is performed.

After the addition operation is completed, the result will be truncated and then stored in a.

Rules for integer promotion

Integer promotion fills the high bits according to the signedness of the variable's type.

//Integer promotion of a negative number
char c1 = -1;
The binary (two's complement) of variable c1 has only 8 bits:
11111111
Because char is signed char here,
the high bits are filled with the sign bit, which is 1, during integer promotion.
The result after promotion is:
11111111 11111111 11111111 11111111

//Integer promotion of a positive number
char c2 = 1;
The binary (two's complement) of variable c2 has only 8 bits:
00000001
Because char is signed char here,
the high bits are filled with the sign bit, which is 0, during integer promotion.
The result after promotion is:
00000000 00000000 00000000 00000001

//Integer promotion of unsigned types: the high bits are filled with 0

In short, for signed variables the high bits are filled with 0 for positive numbers and with 1 for negative numbers; for unsigned variables they are always filled with 0.

If you want to verify the above statement, you can try putting some negative numbers or numbers greater than 127 in char.

[Figure: Storage of integer variables]

Floating point variables

Now that we have covered how integer variables are stored in memory, let's talk about floating point types. Floating point values are stored in a completely different way from integers. If you don't believe it, try this:

int a = 10;
float b = 5.5f;
printf("%d\n", b); //format does not match the argument type: undefined behavior
printf("%f\n", a); //same mismatch in the other direction

You might expect 5 and 10.000000, but you will most likely get something quite different.

Of course, after you read this article, you will understand the reason~

IEEE 754

According to the international standard IEEE 754 (IEEE: Institute of Electrical and Electronics Engineers), any binary floating point number V can be expressed in the following form:

V = (-1)^S * M * 2^E

(-1)^S represents the sign. When S=0, V is positive; when S=1, V is negative.

M is the significand, greater than or equal to 1 and less than 2.

E is the exponent, so 2^E is the power of two that scales M.

For example:

5.0 in decimal, written in binary is 101.0, which is equivalent to 1.01×2^2.

Then, according to the format of V above, we can get S=0, M=1.01, E=2.

-5.0 in decimal, written in binary is -101.0, which is equivalent to -1.01×2^2. Then, S=1, M=1.01, E=2.

IEEE 754 regulations:

For a 32-bit floating point number, the highest 1 bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significant digit M.

For a 64-bit floating point number, the highest 1 bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significant digit M.

IEEE 754 also has some special provisions for the significant digit M and the exponent E.

As mentioned before, 1≤M<2, that is to say, M can be written in the form of 1.xxxxxx, where xxxxxx represents the decimal part.

IEEE 754 stipulates that since the first digit of M is always 1, it is not stored; only the fractional part xxxxxx is saved. For example, when saving 1.01, only 01 is stored, and the leading 1 is added back when the value is read.

The purpose is to gain one extra significant bit. For a 32-bit floating point number only 23 bits are left for M, but because the leading 1 is omitted, 24 significant bits are effectively stored.

As for the exponent E, the situation is more complicated.

First, E is stored as an unsigned integer (unsigned int).

This means that an 8-bit E ranges from 0 to 255 and an 11-bit E from 0 to 2047. However, the E in scientific notation can be negative, so IEEE 754 stipulates that an intermediate value (a bias) must be added to the real value of E before it is stored in memory: 127 for an 8-bit E and 1023 for an 11-bit E. For example, the E of 2^10 is 10, so when it is saved in a 32-bit floating point number it is stored as 10 + 127 = 137, which is 10001001.

Following this logic, you can work out for yourself why the test example above produced that result~

[Figure: Storage of floating-point variables]

End

Thanks for reading. If you liked it, please leave a like~
Looking forward to seeing you next time! If you liked it, please click Follow before leaving. More content is coming!