From information to knowledge to intelligence – understanding and changing the world: the birth of informatics

Author: Zen and the Art of Computer Programming

1. Introduction

Information technology is changing how we live, work, learn, and do research. Demand for information now outpaces the channels and tools that produce it. People turn to the Internet when no television is at hand, to mobile phones when they are out of earshot, and to WeChat and Weibo to exchange news. The word “information” carries many meanings, from biological information to social networks to economic and military intelligence, and on to state espionage and corporate secrets. Computers and the Internet have made daily life more convenient and more transparent, and newer technologies such as the mobile Internet, cloud computing, and big data analysis have changed again how people obtain information. As media such as the Internet, text messaging, WeChat, Weibo, and television have spread, “information” has become an everyday word, and information, knowledge, and intelligence of every kind now reach into every aspect of life. The scale of this impact has also stirred considerable controversy.
As a young, high-technology discipline, informatics matters a great deal. It helps us understand the world, extend our capabilities, and drive development. It is not a simple subject, however: to understand it properly you first need a reasonably complete picture of what information is, along with the relevant theory, methods, and tools. This article therefore introduces the concepts, terminology, algorithms, principles, code examples, and future trends of informatics in plain terms, as a starting point for readers.

2. Basic concepts and terminology

2.1 What is information

Information exists objectively. It may describe a physical object (such as a wall) or something abstract (such as a thought). It can also be the product of a digitizing or encoding process: a phrase such as “twelve years ago”, for example, can be expressed as a specific year such as 1990. Even the way a person’s attention shifts quickly from one point to another is a manifestation of information.

2.2 How is the information processed?

An information processing system is complex and draws on many fields, including hardware, software, algorithms, artificial intelligence, statistics, biology, and psychology. It takes raw input and produces output information through steps such as organization, storage, transmission, reception, analysis, and processing.

2.3 Messages and Signals

Messages and signals are the most basic concepts in an information processing system. A message is the content to be conveyed, such as text, an image, or a video, which is delivered to the receiving end. A signal is the physical carrier of a message: a quantity that varies over time or space. Two of its main properties are frequency and amplitude. Frequency is measured in hertz (Hz), the number of cycles per second. Amplitude is the strength of the signal; when normalized, its values usually lie between -1 and +1, negative during the negative half-cycle and positive during the positive half-cycle.
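
To make frequency and amplitude concrete, here is a minimal sketch that samples a sine wave. The function name `sample_sine` and the sampling rate are assumptions chosen for illustration, not part of any standard.

```python
import math

def sample_sine(frequency_hz, amplitude, sample_rate_hz, n_samples):
    """Return n_samples of amplitude * sin(2*pi*f*t), taken every 1/sample_rate_hz seconds."""
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * (n / sample_rate_hz))
        for n in range(n_samples)
    ]

# One full cycle of a 1 Hz signal with amplitude 1, sampled 8 times per second.
samples = sample_sine(frequency_hz=1.0, amplitude=1.0, sample_rate_hz=8, n_samples=8)
for n, value in enumerate(samples):
    print(f"t={n / 8:.3f}s  value={value:+.3f}")
```

The first half of the printed cycle is non-negative and the second half is non-positive, and every value stays within the normalized range of -1 to +1.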

2.4 Encoding and decoding

Encoding and decoding are key steps in information processing. Encoding converts the original information into a digital representation; decoding converts that representation back into the original information. Encoding makes information easier to transmit, store, and process, and decoding makes it recognizable and understandable to people again. Common encodings include ASCII, UTF-8, GBK, and Base64.
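
As a small illustration of an encode/decode round trip, the sketch below uses Python's built-in codecs and the standard `base64` module. The sample string is an arbitrary choice.

```python
import base64

text = "Hi, 世界"

# Encoding: text -> bytes. The same text yields different bytes per encoding.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Base64 turns arbitrary bytes into ASCII-only text.
b64_text = base64.b64encode(utf8_bytes).decode("ascii")

# Decoding: bytes -> text, using the same encoding that produced the bytes.
utf8_roundtrip = utf8_bytes.decode("utf-8")
gbk_roundtrip = gbk_bytes.decode("gbk")
b64_roundtrip = base64.b64decode(b64_text).decode("utf-8")

print(len(utf8_bytes), len(gbk_bytes))  # 10 8
print(utf8_roundtrip == gbk_roundtrip == b64_roundtrip == text)  # True
```

Decoding only recovers the original text when it uses the encoding that produced the bytes; decoding the GBK bytes as UTF-8 would fail or produce garbage.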

2.5 Latency and Bandwidth

Latency and bandwidth are two important parameters that affect the speed of information transmission. Latency is the time between a signal being sent and its arrival at the receiver. Bandwidth is the maximum capacity of a channel, that is, the amount of information it can carry per unit of time. The higher the bandwidth, the faster the transfer. However, a sender that transmits faster than the receiver can process may cause the receiver to drop data.
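
A rough way to see how the two parameters combine is the first-order estimate "transfer time = latency + size / bandwidth". The sketch below assumes this simplified model, which ignores protocol overhead, congestion, and retransmission; the function name is hypothetical.

```python
def transfer_time_seconds(size_bytes, bandwidth_bits_per_s, latency_s):
    """Estimate one-way transfer time under a simplified latency + size/bandwidth model."""
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s

# 1 MB over an 8 Mbit/s link with 50 ms latency.
estimate = transfer_time_seconds(1_000_000, 8_000_000, 0.05)
print(f"{estimate:.2f} s")  # 1.05 s
```

For small messages the latency term dominates, while for large files the bandwidth term does.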

2.6 Data communication system

A data communication system is the infrastructure for information processing. It comprises multiple components and subsystems, such as hubs, modems, LANs, wireless communications, and mobile communications. Its core function is to transmit data, and each component plays its own distinct role.

2.7 Data Compression

Data compression reduces the number of bits needed to represent data. Its goal is to shrink files so that they occupy less disk space or network bandwidth. It can be applied to any type of data, including images, text, and video. Common compression algorithms and formats include ZIP, LZW, JPEG, and PNG.

3. Basic principles of information processing

3.1 Encoding

Encoding converts raw information into digital signals. The process usually consists of two steps: the encoding decision and symbol transmission. The encoding decision determines which encoding method to use, and symbol transmission is the actual sending of the encoded symbols. Common encodings include ASCII, UTF-8, GBK, and Base64.

ASCII encoding

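ASCII is one of the earliest and most widely adopted character encoding standards. It assigns a 7-bit code to each of 128 characters: 33 control characters and 95 printable characters. Because most later encodings remain compatible with it, ASCII is still ubiquitous. Each entry below gives the code in decimal and as an 8-bit binary value, followed by the character and its name: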

ASCII encoding:
0 00000000 NUL Null character
1 00000001 SOH Start of Heading
2 00000010 STX Start of Text
3 00000011 ETX End of Text
4 00000100 EOT End of Transmission
5 00000101 ENQ Enquiry
6 00000110 ACK Acknowledge
7 00000111 BEL Bell
8 00001000 BS Backspace
9 00001001 HT Horizontal Tab
10 00001010 LF Line Feed
11 00001011 VT Vertical Tab
12 00001100 FF Form Feed
13 00001101 CR Carriage Return
14 00001110 SO Shift Out
15 00001111 SI Shift In
16 00010000 DLE Data Link Escape
17 00010001 DC1 Device Control 1
18 00010010 DC2 Device Control 2
19 00010011 DC3 Device Control 3
20 00010100 DC4 Device Control 4
21 00010101 NAK Negative Acknowledge
22 00010110 SYN Synchronous Idle
23 00010111 ETB End of Transmission Block
24 00011000 CAN Cancel
25 00011001 EM End of Medium
26 00011010 SUB Substitute
27 00011011 ESC Escape
28 00011100 FS File Separator
29 00011101 GS Group Separator
30 00011110 RS Record Separator
31 00011111 US Unit Separator
32 00100000 SPACE Space
33 00100001 ! Exclamation mark
34 00100010 " double quotes
35 00100011 # pound sign
36 00100100 $ dollar sign
37 00100101 % percent
38 00100110 & ampersand
39 00100111 ' single quote
40 00101000 ( left parenthesis
41 00101001 ) right parenthesis
42 00101010 * asterisk
43 00101011 + plus sign
44 00101100 , comma
45 00101101 - minus sign
46 00101110 . period (decimal point)
47 00101111 / slash (division sign)
48 00110000 0 number 0
49 00110001 1 number 1
50 00110010 2 number 2
51 00110011 3 number 3
52 00110100 4 number 4
53 00110101 5 number 5
54 00110110 6 number 6
55 00110111 7 number 7
56 00111000 8 number 8
57 00111001 9 number 9
58 00111010 : colon
59 00111011 ; semicolon
60 00111100 < less than sign
61 00111101 = equal sign
62 00111110 > greater than sign
63 00111111 ? question mark
64 01000000 @ at sign
65 01000001 A Capital letter A
66 01000010 B Capital letter B
67 01000011 C Capital letter C
68 01000100 D capital letter D
69 01000101 E Capital letter E
70 01000110 F Capital letter F
71 01000111 G Capital letter G
72 01001000 H Capital letter H
73 01001001 I Capital letter I
74 01001010 J Capital letter J
75 01001011 K capital letter K
76 01001100 L Capital letter L
77 01001101 M Capital letter M
78 01001110 N Capital letter N
79 01001111 O Capital letter O
80 01010000 P Capital letter P
81 01010001 Q Capital letter Q
82 01010010 R Capital letter R
83 01010011 S Capital letter S
84 01010100 T Capital letter T
85 01010101 U Capital letter U
86 01010110 V capital letter V
87 01010111 W Capital letter W
88 01011000 X Capital letter X
89 01011001 Y Capital letter Y
90 01011010 Z Capital letter Z
91 01011011 [ left square bracket
92 01011100 \ backslash
93 01011101 ] right square bracket
94 01011110 ^ caret (circumflex accent)
95 01011111 _ underscore
96 01100000 ` grave accent (backtick)
97 01100001 a lowercase letter a
98 01100010 b lowercase letter b
99 01100011 c lowercase letter c
100 01100100 d lowercase letter d
101 01100101 e lowercase letter e
102 01100110 f lowercase letter f
103 01100111 g lowercase letter g
104 01101000 h lowercase letter h
105 01101001 i lowercase letter i
106 01101010 j lowercase letter j
107 01101011 k lowercase letter k
108 01101100 l lowercase letter l
109 01101101 m lowercase letter m
110 01101110 n lowercase letter n
111 01101111 o lowercase letter o
112 01110000 p lowercase letter p
113 01110001 q lowercase letter q
114 01110010 r lowercase letter r
115 01110011 s lowercase letter s
116 01110100 t lowercase letter t
117 01110101 u lowercase letter u
118 01110110 v lowercase letter v
119 01110111 w lowercase letter w
120 01111000 x lowercase letter x
121 01111001 y lowercase letter y
122 01111010 z lowercase letter z
123 01111011 { left curly brace
124 01111100 | vertical line
125 01111101 } Right curly brace
126 01111110 ~ tilde
127 01111111 DEL Delete
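
The numeric columns of this table can be reproduced with a few lines of Python. The helper name `ascii_row` is hypothetical; it prints the decimal code, the 8-bit binary form, and the character itself when it has a visible glyph.

```python
def ascii_row(code):
    """Format one row: decimal code, 8-bit binary value, and the character if it is visible."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes are in the range 0..127")
    # Control characters (0-31), space (32) and DEL (127) have no visible glyph.
    shown = chr(code) if 32 < code < 127 else ""
    return f"{code} {code:08b} {shown}".rstrip()

for code in (10, 65, 97, 126):
    print(ascii_row(code))
```

In the other direction, `ord("A")` returns 65 and `chr(97)` returns "a", matching the entries above.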

UTF-8 encoding

UTF-8 is an encoding of Unicode. It represents each character with 1 to 4 bytes (the original design allowed up to 6, but RFC 3629 restricts UTF-8 to 4), and the length depends on the character’s code point. ASCII characters use 1 byte, most other alphabetic scripts use 2 bytes, most Chinese, Japanese, and Korean characters use 3 bytes, and supplementary characters such as emoji use 4 bytes. The encoding rules of UTF-8 are as follows:

If the code point is in the ASCII range (U+0000 to U+007F), it is stored directly in one byte of the form 0xxxxxxx.
If the code point is in the range U+0080 to U+07FF, it is stored in two bytes of the form 110xxxxx 10xxxxxx. The upper 5 of its 11 bits (the code point shifted right by 6) follow the 110 prefix in the first byte, and the lower 6 bits follow the 10 prefix in the second byte. For example, ‘é’ is U+00E9, or 00011101001 in 11 bits. The upper 5 bits are 00011 and the lower 6 bits are 101001, so the encoded bytes are 11000011 10101001 (0xC3 0xA9).
If the code point is in the range U+0800 to U+FFFF, it is stored in three bytes of the form 1110xxxx 10xxxxxx 10xxxxxx. The first byte takes the code point shifted right by 12 bits, and each continuation byte takes the next 6 bits.
If the code point is in the range U+10000 to U+10FFFF, it is stored in four bytes of the form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The first byte takes the code point shifted right by 18 bits, and each continuation byte takes the next 6 bits.
Five-byte and six-byte sequences, which would continue the same pattern, are not valid in modern UTF-8 because Unicode ends at U+10FFFF.
If illegal input is encountered during encoding or decoding, such as a value outside the range supported by UTF-8, you can choose to ignore it or replace it with a substitute character such as ‘?’ or U+FFFD.
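
The following is a minimal sketch of UTF-8 encoding for a single code point, following the byte layouts standardized in RFC 3629 (1 to 4 bytes). The function name `utf8_encode_code_point` is hypothetical; in practice Python's built-in `str.encode("utf-8")` does this work.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code point as UTF-8 bytes (RFC 3629 ranges)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:                       # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp <= 0xFFFF:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    return bytes([0b11110000 | (cp >> 18),  # 4 bytes: 11110xxx + three continuation bytes
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])

print(utf8_encode_code_point(0x00E9).hex())  # c3a9
print(utf8_encode_code_point(0x4E2D).hex())  # e4b8ad
```

Each continuation byte carries 6 payload bits behind the fixed 10 prefix, which is why the shifts step down in multiples of 6.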

GBK encoding

GBK is an extension of China’s national standard GB2312. Unlike UTF-8, GBK is not a Unicode encoding: it is a separate character set encoding that uses one byte for ASCII characters and two bytes for Chinese characters. It extends the GB2312 character set while remaining compatible with it. GBK is still common on legacy Chinese websites and systems, but UTF-8 is generally the better choice for international content.

Base64 encoding

Base64 encodes arbitrary binary data as a text string. It turns every three bytes (24 bits) of binary data into four text characters: the 24 bits are split into four 6-bit groups, and each group, a value from 0 to 63, is mapped to one character of a 64-character alphabet made up of A-Z, a-z, 0-9, ‘+’, and ‘/’. If the final group contains only one or two bytes, it is padded with zero bits and the output is filled out to four characters with ‘=’.
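
The grouping into 6-bit values can be sketched by hand as below. This is an illustrative implementation of the standard Base64 alphabet with ‘=’ padding; the name `b64_encode` is hypothetical, and real code should use the standard `base64` module, which the last line uses as a cross-check.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data):
    """Encode bytes as Base64 text, three input bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into 24 bits, zero-padding a short final chunk.
        value = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split the 24 bits into four 6-bit alphabet indices.
        chars = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # One input byte gives 2 meaningful characters, two bytes give 3.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)

print(b64_encode(b"Man"), b64_encode(b"Ma"), b64_encode(b"M"))  # TWFu TWE= TQ==
print(b64_encode(b"informatics") == base64.b64encode(b"informatics").decode("ascii"))  # True
```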

3.2 Decoding

Decoding restores the original information from the encoded digital signal. It usually involves two steps: decision and error recovery. The decision step determines whether every symbol was decoded correctly, based on mechanisms such as checksums or padding bytes. Error recovery then corrects symbols that were corrupted during transmission or storage.
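
As one possible illustration of the checksum mechanism, the sketch below appends a CRC-32 to a payload and verifies it on receipt. The framing format and the names `frame` and `unframe` are assumptions made for this example. Note that a CRC only detects corruption; correcting it requires retransmission or an error-correcting code.

```python
import zlib

def frame(payload):
    """Append a 4-byte big-endian CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def unframe(data):
    """Return the payload if its checksum matches, otherwise raise ValueError."""
    if len(data) < 4:
        raise ValueError("frame too short to contain a checksum")
    payload, received = data[:-4], int.from_bytes(data[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("checksum mismatch: data was corrupted")
    return payload

sent = frame(b"hello")
print(unframe(sent))  # b'hello'

# Flip one bit to simulate corruption in transit.
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]
try:
    unframe(damaged)
except ValueError as err:
    print(err)
```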

3.3 Quantization

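Quantization converts a continuous signal into a discrete digital signal. It is the core step of analog-to-digital conversion (ADC), while digital-to-analog conversion (DAC) performs the reverse mapping. Differential schemes such as delta modulation quantize the difference between successive samples rather than the samples themselves. QPSK (quadrature phase-shift keying) and QAM (quadrature amplitude modulation), which are often mentioned alongside these, are modulation schemes that map digital symbols onto a carrier rather than quantization methods. Quantization reduces the amount of data and speeds up processing, at the cost of a small quantization error.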
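
A minimal sketch of uniform quantization is shown below, assuming input samples normalized to the range -1 to +1. The function names and the choice of 3 bits are illustrative.

```python
def quantize(x, bits):
    """Map x in [-1.0, 1.0] to an integer level in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return min(levels - 1, int((x + 1.0) / 2.0 * levels))

def dequantize(level, bits):
    """Return the midpoint of the interval that the level represents."""
    step = 2.0 / (2 ** bits)
    return -1.0 + (level + 0.5) * step

for x in (-1.0, -0.4, 0.0, 0.3, 1.0):
    level = quantize(x, bits=3)
    print(x, level, dequantize(level, bits=3))
```

With 3 bits there are 8 levels of width 0.25, so the reconstruction error is at most half a step, 0.125. Each extra bit halves that error while adding one bit per sample.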

3.4 Delay synchronization

Delay synchronization is another important part of an information processing system. It keeps the clocks of the receiving end and the sending end consistent, so that data is not misordered or mistimed. Synchronization methods include using the sender’s clock as the reference, using the receiver’s clock as the reference, sharing a common clock, and using an external time source such as GPS.
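
One common way to estimate how far two clocks differ is the four-timestamp calculation used by NTP-style protocols. The text above does not name this method, so treat the sketch as an illustrative assumption; it also assumes the network delay is the same in both directions.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: request sent (client clock)    t1: request received (server clock)
    t2: reply sent (server clock)      t3: reply received (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)          # time spent on the network, both ways
    return offset, delay

# Server clock runs 5 s ahead; each direction takes 0.1 s; server processing takes 0.02 s.
offset, delay = estimate_offset_and_delay(t0=100.0, t1=105.1, t2=105.12, t3=100.22)
print(round(offset, 6), round(delay, 6))  # 5.0 0.2
```

The client can then correct its clock by the estimated offset. If the two directions have different delays, the estimate is off by half the difference.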

3.5 Flow Control

Flow control limits the sending rate of the sender so that the receiver, and the network in between, are not overwhelmed. By keeping traffic within what can be handled, it helps every user of the network receive adequate quality of service. Flow control can be based on fixed thresholds or can adjust automatically to network conditions.
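
A token bucket is one simple, widely used way to cap a sender's rate with a threshold. The text above does not prescribe it, so the sketch below is only an illustration; the class name, parameters, and the convention that time starts at 0 are assumptions.

```python
class TokenBucket:
    """Permit sending while tokens remain; tokens refill at a fixed rate up to a capacity."""

    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s      # tokens added per second
        self.capacity = capacity    # burst threshold
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # time of the previous call (assumed to start at 0)

    def allow(self, now, cost=1.0):
        """Return True if a packet of the given cost may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate_per_s=1.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)]
print(decisions)  # [True, True, False, True, False]
```

The capacity bounds the burst size and the refill rate bounds the long-run sending rate. An adaptive scheme would adjust `rate` based on feedback from the receiver or the network.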

3.6 Reliable transmission

Reliable transmission means that data can still be delivered correctly over an unreliable network. The main methods include the stop-and-wait protocol, timeout-based retransmission, and the selective repeat protocol.
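
The sketch below simulates stop-and-wait with timeout retransmission. The unreliable channel is modeled as a sequence of booleans supplied by the caller, which is an assumption made to keep the example deterministic; a real implementation would use timers and sockets.

```python
def stop_and_wait(packets, deliveries, max_retries=5):
    """Send packets one at a time, retransmitting after each simulated timeout.

    deliveries yields True when a frame and its acknowledgement both arrive,
    and False when a timeout occurs. Returns (received packets, transmissions).
    """
    received, transmissions = [], 0
    outcomes = iter(deliveries)
    for seq, packet in enumerate(packets):
        for _attempt in range(max_retries + 1):
            transmissions += 1
            # Once the supplied pattern runs out, assume the channel works.
            if next(outcomes, True):
                received.append((seq, packet))
                break  # acknowledged: move on to the next packet
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return received, transmissions

received, transmissions = stop_and_wait(["a", "b", "c"], [True, False, False, True, True])
print(received)       # [(0, 'a'), (1, 'b'), (2, 'c')]
print(transmissions)  # 5
```

Stop-and-wait keeps only one packet in flight, which is simple but slow on high-latency links. Selective repeat improves on it by keeping a window of packets in flight and retransmitting only those that were lost.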

3.7 Data Compression

Data compression reduces the number of bits needed to represent data, so that it occupies less disk space or network bandwidth. It can be applied to any type of data, including images, text, and video. Lossless methods, such as LZW and the DEFLATE algorithm used by ZIP and PNG, restore the original data exactly, while lossy methods such as JPEG discard detail that is hard to perceive in exchange for smaller files.
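
A quick way to try lossless compression is Python's standard `zlib` module, which implements DEFLATE. The repetitive sample input below is an arbitrary choice that compresses well; random or already-compressed data would shrink far less.

```python
import zlib

original = b"information " * 100

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(original), len(compressed))
print(restored == original)  # True: lossless compression restores every byte
```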

3.8 Encryption and Authentication

Encryption and authentication are two important concepts in information security. Encryption transforms data to hide the original information so that unauthorized parties cannot read it. The two main families are symmetric encryption, in which both sides share one secret key, and asymmetric (public-key) encryption, which uses a public key and a private key. Authentication determines whether an access request comes from a legitimate user. The main methods include password verification, digital signatures, and identity verification.
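
As an illustration of password verification, the sketch below stores a salted PBKDF2 hash instead of the password itself, using only Python's standard library. The function names and the iteration count are assumptions for this example; production systems should follow current guidance on work factors.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only

def hash_password(password, salt=None):
    """Return (salt, digest) for the password, generating a random salt if none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

The salt ensures that identical passwords produce different stored digests, and `hmac.compare_digest` avoids leaking information through comparison timing.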

4. Development trends of informatics

Information technology has become an integral part of every industry. The development of informatics is driven mainly by three trends: hardware upgrades, further development in the field of communications, and the rise of artificial intelligence.

4.1 Hardware upgrade

Hardware upgrades raise the overall level of information technology. The performance bottlenecks of the early PC era are history: with GPUs, FPGAs, and other specialized chips, together with software optimization, personal computers now approach server-level performance. Progress in the communications field, in turn, shows up in base stations, indoor positioning, autonomous driving, and healthcare. Autonomous driving, which currently relies on machine learning and wireless communication, has become an industry hot spot.

4.2 Further development in the field of communications

Further development in the field of communications means continued growth in wireless, space, optical, and carrier communication technologies. Advances in wireless communication extend communication range and help connect people around the world. Space communication covers links such as those between the Earth and the Moon, using microwave, infrared, and radar technology. Optical communication transmits data as light pulses and relies on photoelectric conversion at the endpoints; it can be low-power and short-range, as in infrared links, or high-speed and long-distance over optical fiber. Carrier communication transmits data by modulating a carrier such as a radio wave and is suited to high-speed, long-distance links.

4.3 The rise of artificial intelligence

Artificial intelligence is one of today’s hottest topics. AI research spans many fields, including machine learning, deep learning, pattern recognition, natural language processing, computer vision, speech recognition, and search ranking. Applications of search ranking in particular are becoming more and more widespread. The development of artificial intelligence will reshape computer architecture and fundamentally change the way information is obtained.