How data is stored in memory, and how integer promotion works with unsigned numbers

Table of Contents

Foreword

1. Storage of integer data in memory

1. char type

2. int type

2. Floating point number type

Detailed explanation of storage format

Access float variables as int*

Access int variables as float*

Summary

Foreword

In computer science and data processing, understanding how data is stored in memory is essential to understanding how a computer system works. The storage format determines a type’s range and precision and how much memory it uses. Integer promotion matters just as much, because whether a value is signed or unsigned changes how it is widened and therefore changes the result of a calculation. In this blog, we take a closer look at how data is stored in memory, focusing on integer types (such as char and int) and floating point types (such as float). We discuss how each type is represented in memory, explore how signedness affects integer promotion, compare the binary encodings of integer and floating point data, and verify the differences experimentally by accessing a variable through a pointer of a different type.

Hopefully this blog post will help readers gain a deeper understanding of how data is stored in memory and a clearer picture of how integer promotion works for signed and unsigned values.

1. Storage of integer data in memory

1. char type

First, let’s look at a piece of code that stores -1 in char, signed char, and unsigned char variables and prints the results:

#include <stdio.h>

void test1()
{
    char a = -1;
    signed char b = -1;
    unsigned char c = -1;
    printf("a=%d,b=%d,c=%d\n", a, b, c);
}

Output:

a=-1,b=-1,c=255

(Note: this result was obtained with VS2022 targeting x64.)

The C standard does not specify whether plain char is signed or unsigned; the choice is left to the implementation. Many compilers, including VS2022 on x64, treat it as signed, so here char behaves like signed char.

Why is c printed as 255 instead of -1? Signed and unsigned types cover different ranges with the same 8 bits, so the boundary values that can be stored are different.

The range of unsigned char is 0~255, while the range of signed char is -128~127. If you picture the 256 values of unsigned char arranged on a ring, assigning -1 to an unsigned char variable is equivalent to walking one step counterclockwise from 0, which lands on 255. In other words, the value is reduced modulo 256.
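The ring picture can be written out as explicit modular arithmetic. The following is only a minimal sketch; wrap_to_uchar is a hypothetical helper name that does not appear in the original code, and it assumes an 8-bit byte.

```c
#include <limits.h>

/* Hypothetical helper (not part of the original post): computes the value
   an int takes when it is stored in an unsigned char, using explicit
   modular arithmetic instead of relying on the implicit conversion. */
unsigned char wrap_to_uchar(int value)
{
    int modulus = UCHAR_MAX + 1;   /* 256 when a byte has 8 bits */
    int r = value % modulus;       /* in C the remainder can be negative */
    if (r < 0)
        r += modulus;              /* shift into the range 0..255 */
    return (unsigned char)r;
}
```

Calling wrap_to_uchar(-1) gives 255, the same value the compiler produces for unsigned char c = -1.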

Here’s another case:

#include <windows.h> // for Sleep()

void test7()
{
    unsigned char i = 0;
    for (i = 0; i <= 255; i++)
    {
        printf("hello world\n");
        Sleep(500);
    }
}

Result:

The program never terminates: it prints hello world forever. An unsigned char can never hold a value greater than 255, so when i is 255, i++ wraps around to 0 and the loop condition i <= 255 is always met.
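One possible way to visit every unsigned char value exactly once is to count with a wider type. This is a sketch rather than the author’s code; count_uchar_values is a hypothetical name.

```c
#include <limits.h>

/* Hypothetical helper: walks through every unsigned char value once
   and returns how many iterations ran. */
int count_uchar_values(void)
{
    int count = 0;
    int i;   /* int is wider than unsigned char, so i <= UCHAR_MAX can become false */
    for (i = 0; i <= UCHAR_MAX; i++) {
        unsigned char c = (unsigned char)i;   /* the value a loop body would use */
        (void)c;
        count++;
    }
    return count;
}
```

With an 8-bit byte the loop runs 256 times and then stops, because i reaches 256 without wrapping.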

Read on:

#include <string.h>

void test6()
{
    char a[1000];
    int i;
    for (i = 0; i < 1000; i++)
    {
        a[i] = -1 - i;
    }
    printf("%zu\n", strlen(a)); // strlen returns size_t, so print it with %zu
}

First try to work out what the code does, then predict the output. The result is 255.

Why is the result 255? We must first be clear about how strlen() works: it counts characters and stops only when it reaches the first ‘\0’. Looking back at the loop, a[i] starts at -1 and steps counterclockwise around the signed char ring: a[0] is -1, a[127] is -128, and on the next iteration -129 does not fit, so a[128] wraps to 127. The values then count down to a[255], which is 0, and the whole cycle repeats until the loop has run 1000 times. strlen() stops at the first zero, at index 255, so the length is 255.
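The position of the first zero byte can be checked with a small variation of the loop. The sketch below deliberately uses unsigned char, so that the wraparound is fully defined by the language; on a two’s complement machine the bytes written are the same as in test6. The name wrapped_fill_length is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills the array the same way as test6 and returns
   the length strlen() reports. */
size_t wrapped_fill_length(void)
{
    unsigned char a[1000];
    int i;
    for (i = 0; i < 1000; i++)
        a[i] = (unsigned char)(-1 - i);   /* 255, 254, ..., 1, 0, 255, ... */
    return strlen((const char *)a);       /* stops at the first zero byte */
}
```

The first zero byte is written when i is 255, so the function returns 255.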

Continue with a piece of code and analyze its results:

void test3()
{
    // char: -128~127
    char a = 128;
    printf("%d\n", a);
    printf("%u\n", a);
}

(Note: %u in the above code means outputting the value of a in unsigned decimal type)

Don’t assume the assignment fails because a char can hold at most 127. The value 128 is out of range, but the code still compiles and runs: only the low 8 bits of the value are stored.

Output:

-128
4294967168

Let’s see why %u prints such a large value:

// char: -128~127
char a = 128;
// 00000000000000000000000010000000  128 as an int (sign-magnitude, ones' complement and two's complement are identical)
// 10000000                          truncated to 8 bits when stored in a
// Integer promotion when a is passed to printf (a is signed and negative, so the high bits are filled with 1)
// 11111111111111111111111110000000  two's complement after promotion
// Printed with %d: the bits are read as a signed number
// 11111111111111111111111101111111  ones' complement
// 10000000000000000000000010000000  sign-magnitude (output: -128)
// Printed with %u: the same promoted bits are read as an unsigned number
// 11111111111111111111111110000000  (output: 4294967168)

Why is the value truncated, and why do %d and %u print different numbers? The answers are truncation on storage and integer promotion. char and short are integer types, so they are stored in memory the same way as int, only in fewer bytes: 1 byte (8 bits) and 2 bytes (16 bits), compared with 4 bytes (32 bits) for int on this platform. When the two’s complement of 128 is assigned to a char, only the low 8 bits are kept, that is, 10000000. When a char is passed to printf, it is first promoted to int. Signed and unsigned types are promoted by different rules, and %d and %u then read the promoted bits as signed and unsigned respectively, which is why the two outputs differ.

Keep the following points in mind about integer promotion:

  • The promoted value is in two’s complement form, not sign-magnitude.
  • For an unsigned type, or a signed type holding a non-negative value, the new high-order bits are filled with 0.
  • For a signed type holding a negative value, the new high-order bits are filled with 1 (sign extension).

2. int type

With what we have learned about the char type, this part should feel familiar. Let’s look at a piece of code:

void test4()
{
    int i = -20;
    // 10000000000000000000000000010100  sign-magnitude
    // 11111111111111111111111111101011  ones' complement
    // 11111111111111111111111111101100  two's complement
    unsigned int j = 10;
    // 00000000000000000000000000001010  sign-magnitude (ones' and two's complement are identical)
    printf("%d\n", i + j);
    // Sum of the two's complement values
    // 11111111111111111111111111110110  two's complement
    // 11111111111111111111111111110101  ones' complement
    // 10000000000000000000000000001010  sign-magnitude (printed as -10)
}

The comments spell out how an int and an unsigned int are added: both operands are added in two’s complement form, and the resulting bit pattern is converted back to sign-magnitude when it is printed with %d. Strictly speaking, i is converted to unsigned int before the addition, so i + j has type unsigned int, and %d reads those bits as a signed value. The output is -10.

Next look at a piece of code:

void test5()
{
    unsigned int i;
    for (i = 9; i >= 0; i--)
    {
        Sleep(500); // Delay the output so the results are easier to observe
        printf("%u\n", i);
    }
}

Result:

The loop prints 9 down to 0 and then keeps going with very large values. Why does i become so large after it reaches 0 and is decremented once more? Because i is an unsigned integer, it can never be negative, so 0 - 1 wraps around to the largest unsigned int value (4294967295 for a 32-bit unsigned int) and the condition i >= 0 is always true. Just like the ring of unsigned char values described above, the values of (signed) int and unsigned int can also be pictured as rings.

Seen this way, the result of the code above is easier to understand.
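A countdown that does terminate with an unsigned counter has to test the value before the decrement takes effect. The following sketch shows one common idiom; countdown_sum is a hypothetical name, and it sums the visited values only so that the result can be checked.

```c
/* Hypothetical helper: visits start, start-1, ..., 0 with an unsigned
   counter and returns the sum of the visited values. */
unsigned int countdown_sum(unsigned int start)
{
    unsigned int total = 0;
    unsigned int i;
    /* i-- > 0 compares the old value with 0 and then decrements,
       so the body sees start, ..., 1, 0 and the loop ends afterwards.
       (start must be smaller than UINT_MAX, or start + 1 wraps to 0.) */
    for (i = start + 1; i-- > 0; )
        total += i;
    return total;
}
```

countdown_sum(9) visits 9 down to 0 and returns 45.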

2. Floating point number type

First, try to predict the output of the following code:

int n = 9;
float* pFloat = (float*)&n;
printf("The value of n is: %d\n", n);
printf("The value of *pFloat is: %f\n", *pFloat);
*pFloat = 9.0;
printf("The value of n is: %d\n", n);
printf("The value of *pFloat is: %f\n", *pFloat);

Result:

Lines 1 and 4 of the output are what we expect (9 and 9.000000), but lines 2 and 3 are surprising: line 2 prints 0.000000 and line 3 prints 1091567616. The reason is that integers and floating point numbers are stored in memory in different formats. An integer is stored as a plain binary number in which each bit has a fixed weight. A floating point number also has to represent a fractional part, and since float and int both occupy only 4 bytes, float uses a scheme similar to scientific notation. The format is as follows:

According to the international standard IEEE 754 (IEEE stands for Institute of Electrical and Electronics Engineers), any binary floating point number V can be expressed in the following form:

  • V = (-1)^S * M * 2^E
  • (-1)^S is the sign. When S=0, V is a positive number; when S=1, V is a negative number.
  • M is the significand, greater than or equal to 1 and less than 2.
  • 2^E is the exponent part.

For example:
5.0 in decimal is 101.0 in binary, which is equivalent to 1.01×2^2.
Then, according to the format of V above, we can get S=0, M=1.01, E=2.
-5.0 in decimal is -101.0 written in binary, which is equivalent to -1.01×2^2. Then, S=1, M=1.01, E=2.

Therefore, the specific storage process is to convert decimal to binary, and then rewrite it into scientific notation expressed as a power of 2.

Detailed explanation of storage format

IEEE 754 stipulates:

1. For a 32-bit floating point number, the highest 1 bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits hold the significand M

2. For a 64-bit floating point number, the highest 1 bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits hold the significand M

The resulting bit layouts are:

32-bit floating point number: S (1 bit) | E (8 bits) | M (23 bits)

64-bit floating point number: S (1 bit) | E (11 bits) | M (52 bits)

Scientific notation allows the exponent to be negative, but the E field is stored as an unsigned number. The stored value E is therefore the actual exponent E' plus a bias: 127 for a 32-bit floating point number, or 1023 for a 64-bit floating point number:

32-bit: E = E' + 127

64-bit: E = E' + 1023

For example, when we initialize float a = 5.5;, the binary code actually stored is as follows:

float a = 5.5;
// 5.5 in binary is 101.1
// (-1)^0 * 1.011 * 2^2
// S = 0 (1 bit)
// M = 1.011 (23 bits); only the fractional part .011 is stored
// E' = 2 (8 bits); stored E = 2 + 127 = 129
// The storage order is S -> E -> M
// 0 10000001 01100000000000000000000
// 0100 0000 1011 0000 0000 0000 0000 0000  each group of 4 bits becomes one hexadecimal digit
// 40 b0 00 00

Note that only the fractional part of M is stored. Because M always lies in [1,2), its leading 1 is implied, so the 23 bits of the M field in a 32-bit floating point number (or the 52 bits in a 64-bit floating point number) hold only the digits after the binary point.

With the storage format understood, open the memory window in the debugger and check whether the value stored for variable a matches our prediction:

You can see that the value in the green window is 00 00 b0 40, which matches the prediction in the comments of the code above. The byte order is reversed only because this is a little-endian machine: the low-order byte is stored at the lowest address and the high-order byte at the highest address.
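The three fields of a 32-bit float can be pulled apart with shifts and masks. The sketch below assumes that float is a 32-bit IEEE 754 value; the FloatFields type and the split_float function are hypothetical names introduced only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type holding the three stored fields of a 32-bit float. */
typedef struct {
    uint32_t sign;       /* S: 1 bit */
    uint32_t exponent;   /* E: 8 bits, stored with the +127 bias */
    uint32_t fraction;   /* M: 23 bits, the digits after the binary point */
} FloatFields;

/* Assumes float is a 32-bit IEEE 754 value. */
FloatFields split_float(float value)
{
    uint32_t bits;
    FloatFields f;
    memcpy(&bits, &value, sizeof bits);   /* copy the raw bits */
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For 5.5f this gives S = 0, E = 129 and a fraction field of 0x300000, which is the bit pattern 011 followed by twenty zeros.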
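The byte order can also be checked in code rather than in a memory window. This is a sketch under the assumption that float occupies 4 bytes; is_little_endian and float_byte_at are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the lowest address holds the low-order byte. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* read the byte at the lowest address */
    return first == 1;
}

/* Returns the byte of value stored at the given offset from its address.
   The caller must keep index below sizeof(float). */
unsigned char float_byte_at(float value, size_t index)
{
    unsigned char bytes[sizeof(float)];
    memcpy(bytes, &value, sizeof bytes);
    return bytes[index];
}
```

On a little-endian machine, the bytes of 5.5f at offsets 0 to 3 are 00 00 b0 40; on a big-endian machine the order would be 40 b0 00 00.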

Access float variables as int*

If we read the bits of the float variable a as an int, we get a very large value. The following code reads and prints it:

float a = 5.5;
// 0 10000001 01100000000000000000000
// 0100 0000 1011 0000 0000 0000 0000 0000  each group of 4 bits becomes one hexadecimal digit
// 40 b0 00 00
int* n = (int*)&a;
printf("%d\n", *n);

Result: 1085276160

Debugging confirms that when an int pointer points at a float variable and is dereferenced, the bits of the float are read directly as an integer, which is why the output is 1085276160.

Converting this result with a calculator gives 0x40b00000, which matches the hexadecimal value in the memory window.
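The pointer casts used in this post work on the compiler shown, but reading an object through a pointer of an unrelated type is not guaranteed by the C standard. A more portable way to look at the same bits is to copy them with memcpy. The helper names below are hypothetical, and the sketch assumes that float and int32_t are both 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the bits of a float as a 32-bit integer. */
int32_t float_bits_as_int(float value)
{
    int32_t bits;
    memcpy(&bits, &value, sizeof bits);    /* copy the bytes, no conversion */
    return bits;
}

/* Hypothetical helper: the reverse direction. */
float int_bits_as_float(int32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

float_bits_as_int(5.5f) returns 1085276160, the same number the int* experiment printed.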

Access int variables as float*

Execute the following code:

int num = 999999999;
float* f_p = (float*)&num;
printf("%f\n", *f_p);

Result: 0.004724

When a float pointer points at an int variable and is dereferenced, the bits of the int are split into the float fields (S -> E -> M) and read as a floating point number, which is why the output is 0.004724.

The following comments analyze the bits of the integer when they are read as a float:

int num = 999999999;
// 0011 1011 1001 1010 1100 1001 1111 1111
// 3b 9a c9 ff
float* f_p = (float*)&num;
// 0 01110111 00110101100100111111111  the same bits split into float fields
// S E        M
// Treat the integer bits as a stored float. When reading, subtract 127 (or 1023) from the stored E
// S = 0, E = 119 (E' = -8), M field = 1755647
// M = 1 + 1755647 / 2^23 = 1.2092894 (the M field is a binary fraction, not decimal digits)
// (-1)^0 * 1.2092894 * 2^(-8)
// 1.2092894 / 256 = 0.0047238

Rounded to six decimal places, this matches the result 0.004724 printed on the console.
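The decoding steps can be written as a short function that applies V = (-1)^S * M * 2^E' directly. This is only a sketch for normal numbers (stored E neither all 0 nor all 1); decode_float_bits and pow2 are hypothetical names, and the power of two is computed with a loop to keep the example free of library dependencies.

```c
#include <stdint.h>

/* Computes 2^e with repeated multiplication or division. */
static double pow2(int e)
{
    double r = 1.0;
    while (e > 0) { r *= 2.0; e--; }
    while (e < 0) { r /= 2.0; e++; }
    return r;
}

/* Hypothetical helper: decodes the bits of a normal 32-bit float. */
double decode_float_bits(uint32_t bits)
{
    double sign = (bits >> 31) ? -1.0 : 1.0;          /* S */
    int stored_exp = (int)((bits >> 23) & 0xFFu);     /* E, with bias */
    uint32_t fraction = bits & 0x7FFFFFu;             /* 23-bit M field */
    double m = 1.0 + (double)fraction / 8388608.0;    /* restore the leading 1; 8388608 = 2^23 */
    return sign * m * pow2(stored_exp - 127);         /* E' = E - 127 */
}
```

decode_float_bits(0x3B9AC9FF), the bit pattern of the integer 999999999, gives approximately 0.0047238, and decode_float_bits(0x40B00000) gives exactly 5.5.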

E contains both 1 and 0

In the example above, the E field contains both 0 and 1 bits, so we can subtract 127 (or 1023) from it to get the actual exponent E' and compute the value in scientific notation directly. A leading 1 must be put back in front of the M field, because, as mentioned earlier, only the digits after the binary point are stored.

E is all 0

In this case, the actual exponent of the floating point number is 1-127 (or 1-1023).
The significand M no longer gets the leading 1; it is read as a fraction of the form 0.xxxxxx. This makes it possible to represent ±0 and very small numbers close to 0.

E is all 1

In this case, if the M field is all 0, the value is ±infinity (the sign is given by the sign bit S). If the M field is not all 0, the value is NaN (not a number).
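The three cases for E can be collected into one classification function. The sketch below uses hypothetical names (FloatKind, classify_float_bits) and assumes the 32-bit layout described earlier; in IEEE 754, a pattern whose E field is all 1 and whose M field is not all 0 is a NaN.

```c
#include <stdint.h>

/* Hypothetical classification of a 32-bit float bit pattern. */
typedef enum {
    FLOAT_ZERO,        /* E all 0, M all 0 */
    FLOAT_SUBNORMAL,   /* E all 0, M not 0: value is 0.xxxxxx * 2^(1-127) */
    FLOAT_NORMAL,      /* E contains both 0 and 1 */
    FLOAT_INFINITE,    /* E all 1, M all 0 */
    FLOAT_NAN          /* E all 1, M not 0 */
} FloatKind;

FloatKind classify_float_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent == 0)
        return fraction == 0 ? FLOAT_ZERO : FLOAT_SUBNORMAL;
    if (exponent == 0xFFu)
        return fraction == 0 ? FLOAT_INFINITE : FLOAT_NAN;
    return FLOAT_NORMAL;
}
```

The bit pattern of the integer 9 (0x00000009) falls into the E-is-all-0 case, which is why reading int n = 9 through a float* printed 0.000000: the value is far too small to show in six decimal places.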

Summary

In this blog, we took a deep dive into how data is stored in memory, focusing on the formats in which integer data (such as char and int) and floating point data (such as float) are stored. We looked at how each type is represented in memory and discussed in detail how signed and unsigned values are affected by integer promotion. Comparing the storage formats showed that integer and floating point data use different binary encodings, and the experiments that accessed a variable through a pointer of a different type confirmed those differences.

I hope this blog helps readers better understand how data is stored in memory and gives them a clearer picture of integer promotion for signed and unsigned values. These concepts matter in everyday programming and data processing: they explain how a computer system works and help us write more efficient and accurate code. In future work, pay attention to how data is stored, especially when performing type conversions and arithmetic on mixed types. Understanding storage formats and the conversion rules between types helps us write more robust and reliable code.