C language | How floating point numbers are stored in memory

Floating-point numbers are stored in memory in binary, but not in sign-magnitude (original code), ones' complement (inverse code) or two's complement (complement code) form the way integers are.

Common floating point numbers:

3.14159

1E10 (scientific notation: 1.0*10^10)

e.g. 1.23 = 12.3*10^-1 = 0.123*10^1

The floating-point family includes the float, double and long double types.

The range of values these types can represent is defined in float.h.

If you are interested, you can find this header on your system and read these definitions yourself.

Floating point number storage rules

Example of floating point number storage:

Example 1:

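```c
#include <stdio.h>

int main(void)
{
    int n = 9;
    /* View the same 4 bytes as a float (assumes int and float are both 4 bytes).
       Note: accessing an int through a float* breaks C's strict-aliasing rule;
       it is kept as the classic demonstration, memcpy is the well-defined way. */
    float *pFloat = (float *)&n;

    printf("The value of n is: %d\n", n);
    printf("The value of *pFloat is: %f\n", *pFloat);

    *pFloat = 9.0;   /* store 9.0 as a float into the bytes of n */
    printf("The value of n is: %d\n", n);
    printf("The value of *pFloat is: %f\n", *pFloat);
    return 0;
}
```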

Integers and floating-point numbers are stored differently, so the same bytes mean different things depending on the type they are read as. Data should be read back with the type it was stored as, and the printf conversion specifier must match the argument's type; otherwise the results are unpredictable.

According to the international standard IEEE 754 (IEEE: Institute of Electrical and Electronics Engineers), any binary floating-point number V can be written in the following form:

  • V = (-1)^S*M*2^E
  • (-1)^S is the sign: when S=0, V is positive; when S=1, V is negative;
  • M is the significand, greater than or equal to 1 and less than 2;
  • 2^E is the exponent part, where E is the exponent.

Example 2:

Decimal: 5.5

Binary: 101.1

Scientific notation: (-1)^0*1.011*2^2, i.e. S=0, M=1.011, E=2

Example 3:

Decimal: 9.0

Binary: 1001.0

Scientific notation: (-1)^0*1.001*2^3, i.e. S=0, M=1.001, E=3

IEEE 754 regulations:

For a 32-bit floating-point number, the highest bit is the sign bit S, the next 8 bits are the exponent E, and the remaining 23 bits are the significand M.

For a 64-bit floating-point number, the highest bit is the sign bit S, the next 11 bits are the exponent E, and the remaining 52 bits are the significand M.

IEEE754 floating point number storage rules

IEEE 754 also has some special provisions for the significant digit M and the exponent E.

How M (the significand) is stored:

  • 1<=M<2, that is to say, M can be written in the form 1.XXXXXX, where XXXXXX is the fractional part.
  • IEEE 754 stipulates that when M is stored inside the computer, its first digit is always 1, so that 1 is dropped and only the XXXXXX part is saved.
  • For example, when saving 1.01, only 01 is saved, and the leading 1 is added back when reading.
  • The purpose of this is to gain one extra bit of precision.
  • Taking a 32-bit floating-point number as an example, only 23 bits are left for M, but because the leading 1 is omitted, the number effectively carries 24 significant bits.

How E (the exponent) is stored:

E is stored as an unsigned integer (unsigned int).

This means that if E is 8 bits, its range is 0~255; if E is 11 bits, its range is 0~2047. However, the E in scientific notation can be negative, so IEEE 754 stipulates that a fixed bias must be added to the real value of E before it is stored in memory. For an 8-bit E, this bias is 127; for an 11-bit E, it is 1023. For example, the E of 2^10 is 10, so when it is saved as a 32-bit floating-point number, it must be stored as 10 + 127 = 137, which is 10001001.

Example 4:

Decimal: 0.5

Binary: 0.1

Scientific notation: (-1)^0*1.0*2^-1

S=0 M=1.0 E=-1

IEEE754 floating point number reading rules

When the exponent E is read back from memory, there are three cases:

E is neither all 0s nor all 1s:

In this case the value is recovered as follows: subtract 127 (or 1023 for a 64-bit double) from the stored exponent E to get the real exponent, then put the implicit leading 1 back in front of the significand M.

For example, the binary form of 0.5 (1/2) is 0.1. Since the integer part of M must be 1, the binary point is moved right by 1 place, giving 1.0*2^(-1). The stored exponent is -1 + 127 = 126, which is 01111110. The significand is 1.0; dropping the integer part leaves 0, which is padded to 23 bits as 00000000000000000000000. So the binary representation is: 0 01111110 00000000000000000000000

E is all 0s:

In this case the real exponent is fixed at 1-127 = -126 (or 1-1023 = -1022 for a 64-bit double).

The significand M no longer gets the implicit leading 1; it is read as 0.xxxxxx instead. This is how ±0 and very small numbers close to 0 are represented.

E is all 1s:
In this case, if the significand bits M are all 0, the value is ±infinity (the sign depends on the S bit); if M is not all 0, the value is NaN (not a number).

Such extreme values rarely come up in everyday code.

Looking at Example 1 above:

Analysis:

  • n = 9 is stored as an int, so it occupies 4 bytes
  • Two's complement of 9: 00000000 00000000 00000000 00001001
  • The first printf prints n as a signed decimal integer with %d, so it prints "9"
  • The second printf prints *pFloat with %f, i.e. as a decimal real number
  • No conversion takes place: the same 4 bytes of n are simply read as a float
  • So the bits read as a float are: 0 00000000 00000000000000000001001
  • S: 0
  • E: 00000000
  • M: 00000000000000000001001
  • Because E is all 0s, the real exponent is 1 - 127 = -126 and M gets no leading 1, so the value is 0.00000000000000000001001 (binary) * 2^-126 = 9 * 2^-149, roughly 1.26e-44. That is far too small for the six decimal places of %f, so it prints "0.000000"
  • Next, *pFloat = 9.0; stores 9.0 into the same 4 bytes as a float
  • Binary representation of 9.0: 1001.0
  • 9.0 in scientific notation is (-1)^0*1.001*2^3, so
  • S: 0
  • E: 3 + 127 = 130, i.e. 10000010
  • M: 00100000000000000000000