C language type conversion and unsigned output of negative numbers

1. Concepts related to type conversion
(1) Type conversion classification: implicit (automatic) type conversion, which the compiler performs on its own, and explicit type conversion (a cast), which the programmer writes out.
(2) Type promotion: before the compiler performs an operation on operands of different types, it converts them to a common type, namely the one with the larger value range.
(3) Purpose of type promotion: to avoid losing information. A higher-ranked type occupies at least as much memory as a lower-ranked one and can usually represent all of its values, so the value and its precision are preserved.
(4) Integer promotion: char and short operands are automatically promoted to int before an operation. If int cannot represent every value of the original type (for example, unsigned short on a platform where short and int have the same size), the operand is promoted to unsigned int instead.
(5) Type promotion rule: operands are converted from the lower type to the higher type. As shown in the figure, the direction of an arrow indicates the conversion direction. A vertical arrow indicates a conversion that always happens; a horizontal arrow indicates only the direction of conversion, not a sequence of intermediate steps.

For example, when an int operand and a float operand are combined in an arithmetic operation, the compiler converts the int directly to float. It does not go through intermediate steps such as converting int to unsigned int and then to long.
Special case: if one operand is of type long and the other is of type unsigned int, and long cannot represent every value of unsigned int (for example, when both are 32 bits wide), both operands are converted to unsigned long.
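The promotion rules above can be checked directly by asking the compiler which type an expression has. The following is a minimal sketch, assuming a C11 compiler (it relies on _Generic); the names TYPE_NAME, long_plus_uint_type and check_promotions are made up for this illustration and do not come from any library.

```c
#include <assert.h>
#include <string.h>

/* Maps the type of an expression to a printable name (C11 _Generic).
   Only the types needed for this sketch are listed. */
#define TYPE_NAME(expr) _Generic((expr), \
    int: "int",                          \
    unsigned int: "unsigned int",        \
    long: "long",                        \
    unsigned long: "unsigned long",      \
    float: "float",                      \
    double: "double",                    \
    default: "other")

/* Type of long + unsigned int: "long" if long can hold every unsigned int
   value, otherwise "unsigned long" (the special case described above). */
const char *long_plus_uint_type(void)
{
    long l = 1;
    unsigned int u = 2;
    return TYPE_NAME(l + u);
}

/* Each assert documents the type the compiler chooses for the expression. */
void check_promotions(void)
{
    char c1 = 1, c2 = 2;
    short s = 3;
    int n = 2;
    float f = 3.5f;
    double d = 1.0;

    assert(strcmp(TYPE_NAME(c1 + c2), "int") == 0);  /* integer promotion */
    assert(strcmp(TYPE_NAME(s + c1), "int") == 0);   /* integer promotion */
    assert(strcmp(TYPE_NAME(n + f), "float") == 0);  /* int converted to float */
    assert(strcmp(TYPE_NAME(f + d), "double") == 0); /* float converted to double */
    assert(strcmp(TYPE_NAME(n + 1u), "unsigned int") == 0);
}
```

Calling check_promotions() from main passes silently when the rules hold. long_plus_uint_type() is kept separate because its answer depends on the platform: it should return "long" where long is wider than unsigned int, and "unsigned long" where the two have the same width.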

2. Several situations of type conversion
(1) Automatic type conversion in assignment: If the type of the variable on the left (target side) of the assignment operator is inconsistent with the type of the expression on the right, the value of the expression on the right is automatically converted to the type of the variable on the left.
For example, given int n = 2; float f = 3.5; double d = n + f; the compiler first converts n to float to match f when it evaluates n + f. When the sum is assigned to d, the float result is automatically converted to double.
(2) Conversion from a higher (wider) data type to a lower (narrower) one: information may be lost.
For example, converting a 32-bit int to char discards the upper 24 bits; converting float to int discards the fractional part (and the behavior is undefined if the integer part does not fit in int); converting double to float loses precision in the low-order digits (the value is rounded).
(3) Conversion from a lower data type to a higher one: the representation changes, but no information is lost. One exception is a large int or long converted to float, which may be rounded because float carries only about 24 bits of precision.
(4) Conversion between unsigned data type (unsigned type) and signed data type (signed type)
① unsigned and signed apply only to the integer types (short, int, long) and the character type (char can be regarded as a small integer type), so conversion between unsigned and signed types concerns integer data only.
② Converting an unsigned type: to a type of the same size, the bit pattern is unchanged; to a smaller type, only the low-order bytes are kept; to a larger type, the high-order bytes are filled with 0.
③ Converting a signed type: to a type of the same size, the bit pattern is unchanged; to a smaller type, only the low-order bytes are kept; to a larger type, the high-order bytes are filled with the sign bit (1 for a negative number, 0 for a non-negative number).
④ In both cases, on the usual two's complement platforms, how the bits change depends only on the sizes of the two types and on the signedness of the source type, not on whether the target type is signed or unsigned.
Taking short and unsigned short as examples, the code is as follows.

/* short type */
#include <stdio.h>

int main(int argc, const char *argv[])
{
    char a0 = 0;
    unsigned char a1 = 0;
    short b0 = 0;
    unsigned short b1 = 0;
    int c0 = 0;
    unsigned int c1 = 0;

    b0 = -1; /* Negative numbers are stored in two's complement; -1 as a 16-bit short is 1111 1111 1111 1111 */
    b1 = (unsigned short)b0; /* Same size (unsigned short): the bit pattern is unchanged */
    a0 = (char)b0; /* Smaller type (char): only the low 8 bits are kept */
    a1 = (unsigned char)b0; /* Smaller type (unsigned char): only the low 8 bits are kept */
    c0 = (int)b0; /* Larger type (int): the high bits are filled with the sign bit 1 */
    c1 = (unsigned int)b0; /* Larger type (unsigned int): the high bits are filled with the sign bit 1 */

    printf("char:%d\n", a0);
    printf("unsigned char:%u\n", a1);
    printf("short:%d\n", b0);
    printf("unsigned short:%u\n", b1);
    printf("int:%d\n", c0);
    printf("unsigned int:%u\n", c1);

    return 0;
}



/* unsigned short type */
#include <stdio.h>

int main(int argc, const char *argv[])
{
    char a0 = 0;
    unsigned char a1 = 0;
    short b0 = 0;
    unsigned short b1 = 0;
    int c0 = 0;
    unsigned int c1 = 0;

    b1 = 65535; /* The unsigned value 65535 is stored as 1111 1111 1111 1111 */
    b0 = (short)b1; /* Same size (short): the bit pattern is unchanged, so b0 reads as -1 */
    a0 = (char)b0; /* Smaller type (char): only the low 8 bits are kept */
    a1 = (unsigned char)b0; /* Smaller type (unsigned char): only the low 8 bits are kept */
    c0 = (int)b0; /* Larger type (int): the source is b0 (-1), so the high bits are filled with 1 */
    c1 = (unsigned int)b0; /* Larger type (unsigned int): the source is b0 (-1), so the high bits are filled with 1 */

    printf("char:%d\n", a0);
    printf("unsigned char:%u\n", a1);
    printf("short:%d\n", b0);
    printf("unsigned short:%u\n", b1);
    printf("int:%d\n", c0);
    printf("unsigned int:%u\n", c1);

    return 0;
}

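Both programs produce the same output, as shown in the figure.

It can be seen that when the short variable b0 holds -1, converting it to the smaller char or unsigned char type keeps only the low 8 bits, while converting it to the larger int or unsigned int type fills the high bits with the sign bit 1. The second program gives the same output because it also converts from b0, which holds -1 after the assignment b0 = (short)b1. Converting the unsigned short b1 directly behaves differently when widening: the low bits are still kept when narrowing, but the high bits are filled with 0, so (int)b1 and (unsigned int)b1 are both 65535.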
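The difference between sign extension and zero extension can be isolated in a few small functions. This is only a sketch: it uses the fixed-width types from <stdint.h> so that the sizes do not depend on the platform (the programs above use short and int instead), and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16-bit to signed 32-bit: the value is preserved (sign extension). */
int32_t widen_signed(int16_t v)
{
    return (int32_t)v;
}

/* Unsigned 16-bit to unsigned 32-bit: the value is preserved (zero extension). */
uint32_t widen_unsigned(uint16_t v)
{
    return (uint32_t)v;
}

/* Signed 16-bit to unsigned 32-bit: a negative value wraps modulo 2^32. */
uint32_t signed16_to_u32(int16_t v)
{
    return (uint32_t)v;
}

/* Signed 16-bit to unsigned 8-bit: only the low 8 bits survive. */
uint8_t narrow_to_u8(int16_t v)
{
    return (uint8_t)v;
}

/* Each assert states the value the conversion is expected to produce. */
void check_conversions(void)
{
    assert(widen_signed(-1) == -1);              /* high bits filled with 1 */
    assert(widen_unsigned(65535u) == 65535u);    /* high bits filled with 0 */
    assert(signed16_to_u32(-1) == 4294967295u);  /* same bits as (int32_t)-1 */
    assert(narrow_to_u8(-1) == 255);             /* 0xFFFF -> 0xFF */
    assert(narrow_to_u8(0x1234) == 0x34);        /* high byte discarded */
    assert((int)3.99f == 3);                     /* fractional part discarded */
    assert((int)-3.99f == -3);                   /* truncation is toward zero */
}
```

The two widening functions take the same bit pattern, 1111 1111 1111 1111, and return different results only because the source type differs in signedness.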

3. Unsigned output of negative numbers
(1) Define char a0 = -1 and test the unsigned output of a0. The code is as follows.

#include <stdio.h>

int main(int argc, const char *argv[])
{
    char a0 = -1;

    printf("a0-d:%d\n", a0);
    printf("a0-u:%u\n", a0); /* a0 is promoted to int (-1) before it reaches printf */

    return 0;
}

The running result is shown in the figure.

Negative numbers are stored in memory in two's complement form. The binary data stored in a0 is 1111 1111, which read as an unsigned number is 255. One might therefore expect the %u output of a0 to be 255 (that is, 2^8-1), but the actual result is 4294967295 (that is, 2^32-1).
This unexpected output comes from the way arguments are passed to printf. printf is a variadic function, and every argument in the variable part of the call undergoes the default argument promotions: char and short values are promoted to int before they are passed. (The example assumes that plain char is signed, which is implementation-defined.) According to the promotion rules, widening the 1-byte value -1 to a 4-byte int fills the high bits with the sign bit, so the value printf receives is 1111 1111 1111 1111 1111 1111 1111 1111. Printed with %d this is -1; interpreted as an unsigned int by %u it is 4294967295, not 255.
(2) To print the content of a0 with %u, convert a0 to unsigned char before printing. The code is as follows.

#include <stdio.h>

int main(int argc, const char *argv[])
{
    char a0 = -1;

    printf("a0-d:%d\n", a0);
    printf("a0-u:%u\n", (unsigned char)a0); /* pass a0 as an unsigned char, value 255 */

    return 0;
}

The running result is shown in the figure.

(3) Similarly, with short b0 = -1, printf prints -1 for %d and 4294967295 for %u instead of 65535 (that is, 2^16-1). To print the content of b0 with %u, a conversion to unsigned short is likewise required.
(4) In addition, unsigned char a1 = -1 compiles without a warning or error under some compilers. Conversion to an unsigned type is well defined in C: the value is reduced modulo 2^8, so a1 holds 255, which is the same bit pattern as the two's complement form of -1.
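The two outcomes in this section can be reproduced without relying on a mismatched format specifier. The sketch below makes each conversion explicit and formats the result into a buffer with snprintf; it uses signed char so that the result does not depend on whether plain char is signed, and the function names are invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Mimics printf("%u", c): c is first promoted to int, and that int is
   then converted to unsigned int, which is well defined. */
void format_promoted(signed char c, char *out, size_t size)
{
    int promoted = c; /* what a variadic call would actually pass */
    snprintf(out, size, "%u", (unsigned int)promoted);
}

/* Converts to unsigned char first, so only the 8 stored bits are printed. */
void format_as_byte(signed char c, char *out, size_t size)
{
    snprintf(out, size, "%u", (unsigned int)(unsigned char)c);
}

/* Compares both formats for the value -1. */
void check_formats(void)
{
    char promoted[32], byte[32], expected[32];

    format_promoted(-1, promoted, sizeof promoted);
    format_as_byte(-1, byte, sizeof byte);
    snprintf(expected, sizeof expected, "%u", UINT_MAX);

    assert(strcmp(promoted, expected) == 0); /* 4294967295 when unsigned int is 32 bits */
    assert(strcmp(byte, "255") == 0);        /* 2^8-1 */
}
```

The same idea applies to short: converting to unsigned short before printing yields 65535 instead of the promoted value.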

4. Other
(1) How the printf function reads its arguments when printing float data. The code is as follows.

#include <stdio.h>

int main(int argc, const char *argv[])
{
    int num0 = 0;
    int num1 = 0;
    int num2 = 0;
    float dec0 = 0;

    num0 = 1;
    num1 = 2;
    num2 = 3;
    dec0 = 9.8f;

    /* sizeof yields a size_t, which is printed with %zu */
    printf("float byte:%zu, int byte:%zu\n", sizeof(float), sizeof(int));
    printf("%f, %d, %d, %d\n", dec0, num0, num1, num2);
    /* The next two calls deliberately print dec0 with %d (undefined behavior) */
    printf("%d, %d, %d, %d\n", dec0, num0, num1, num2);
    printf("%d, %d, %d, %d\n", num0, dec0, num1, num2);

    return 0;
}

Compiled and run with a 32-bit compiler, the result is shown in the figure.

Compiled and run with a 64-bit compiler, the result is shown in the figure.

It can be seen that with both the 32-bit and the 64-bit compiler, float and int are each 4 bytes in size, and printing a floating-point value with %d produces wrong data. With the 32-bit compiler, the wrong output of dec0 also corrupts the output of the arguments that follow it, such as num0; with the 64-bit compiler, the following output is not affected.
Likely explanation: passing a float where %d expects an int is undefined behavior, so the result depends on how the platform passes arguments. Although a float occupies 4 bytes, a float passed to a variadic function such as printf is first promoted to double by the default argument promotions, so printf actually receives 8 bytes for dec0. In a typical 32-bit x86 build, all arguments are placed on the stack in consecutive 4-byte slots, and the double takes two of them. With %f, printf reads both slots and prints the decimal value. With %d, printf reads only one slot, so every later read is shifted by 4 bytes and the following output is wrong as well. In a typical 64-bit build, arguments are passed mainly in registers; under the Windows x64 convention, for example, each argument occupies its own 8-byte position. With %d, printf still prints wrong data for dec0, but reading it does not shift the position of the remaining arguments, so the later output can stay correct.
Since the printf function is complicated internally and behaves differently under different compilers and calling conventions, and this investigation of it is not deep, the above is for reference only.
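The promotion of float to double in a variadic call can be observed with a small variadic function of one's own. The following is a sketch under the assumption that every variable argument is a floating-point value; sum_promoted and check_float_promotion are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` floating-point arguments. The caller may pass float values,
   but they arrive as double because of the default argument promotions,
   so double is the type that must be read with va_arg. */
double sum_promoted(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, double);
    }
    va_end(ap);

    return total;
}

/* Passes two float variables and checks the sum read back as double. */
void check_float_promotion(void)
{
    float a = 1.5f;
    float b = 2.25f;

    /* Both values are exactly representable, so the comparison is exact. */
    assert(sum_promoted(2, a, b) == 3.75);
}
```

This is also why %f works for both float and double in printf. To print a float as an integer, convert it explicitly, for example printf("%d", (int)dec0), rather than changing only the format specifier.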