P02114200 Qi Qi, P02114213 Yang Jiaru, P02114193 Wei Ziang, P02114105 Jiang Qi, P02114208 Gu Zihao - Research and Expansion of the Proof of Additivity and Incrementality of Information Entropy

Directory

1. Preliminary knowledge

2. Proof process

2.1. Proof of information entropy additivity

2.2. Proof of information entropy incrementality

3. Expansion

3.1 Extension of Additivity

3.2 Extension of Incrementality

3.3 Information granules and decision trees

3.3.1. Information Granules

3.3.2. Information entropy and information granularity

3.3.3. Decision tree

3.3.4. Information granules and decision trees

3.3.5. Conclusion

4. Summary

1. Preliminary knowledge

Information entropy is a basic concept of information theory. It describes the uncertainty in the occurrence of each possible event of an information source. In the 1940s, C. E. Shannon borrowed the concept of entropy from thermodynamics, called the average amount of information that remains after redundancy is excluded "information entropy", and gave a mathematical expression for calculating it. The introduction of information entropy solved the problem of measuring information quantitatively.

Information is a very abstract concept. People often say that there is a lot of information or only a little information, but it is hard to say exactly how much there is. For example, how much information does a 500,000-character Chinese book contain?

Claude Elwood Shannon, the father of information theory, first clarified the relationship between probability and information redundancy in mathematical language.

Shannon defined the information entropy of a source as H(X) = -Σx p(x)·log2 p(x), where X represents a random variable whose set of possible outputs is a symbol set, x represents one output of the random variable, and p(x) is the probability of that output. The greater the uncertainty of the variable, the greater the entropy, and the greater the amount of information needed to figure it out. The basic function of information is to eliminate people's uncertainty about things. A classic example: suppose we want to know which of the 32 World Cup teams won the championship. Shannon pointed out that the exact amount of information needed is -(p1·log2(p1) + p2·log2(p2) + … + p32·log2(p32)), where p1, p2, …, p32 are the probabilities of the 32 teams winning the championship. Shannon called this quantity "entropy", generally represented by the symbol H, with the bit as its unit. Interested readers can calculate that when the 32 teams have the same probability of winning the championship, the corresponding information entropy equals five bits. Readers with a foundation in mathematics can also prove that the value of the above formula can never be greater than five.

For any random variable X (such as the team that wins the championship), its entropy is defined as H(X) = -Σx p(x)·log2 p(x). Information entropy is the concept used in information theory to measure the amount of information. The more ordered a system is, the lower its information entropy; conversely, the more chaotic a system is, the higher its information entropy. Information entropy can therefore also be regarded as a measure of the degree of order of a system. The concept of entropy itself comes from thermal physics.
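As a quick numerical illustration of this definition, the following MATLAB sketch computes the entropy of the championship example. The uniform case comes from the text above; the skewed distribution is invented purely for illustration.

% Entropy of the 32-team championship example, H(X) = -sum(p .* log2(p))
p_uniform = ones(1, 32) / 32;            % all 32 teams equally likely to win
H_uniform = -sum(p_uniform .* log2(p_uniform));
fprintf('Entropy with equal probabilities: %.4f bits\n', H_uniform);   % prints 5.0000

% A skewed (invented) distribution: a few strong teams are more likely to win
p_skewed = [0.2 0.15 0.1 0.1 repmat(0.45/28, 1, 28)];
H_skewed = -sum(p_skewed .* log2(p_skewed));
fprintf('Entropy with unequal probabilities: %.4f bits\n', H_skewed);  % strictly less than 5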

2. Proof process

2.1. Proof of information entropy additivity

H(X,Y) = H(X) + H(Y|X); when X and Y are independent of each other, H(X,Y) = H(X) + H(Y).

In terms of probabilities, let the probability distribution of X be (p1, p2, …, pn) and let the conditional probability of Y given X be P(Y = yj | X = xi) = pij. Additivity then means that H(X,Y) = -Σi Σj pi·pij·log(pi·pij) = H(X) + H(Y|X).
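For completeness, here is a sketch of the derivation, written in LaTeX notation; it uses only the definition of entropy given above and the fact that the pij sum to 1 for every i:

\begin{aligned}
H(X,Y) &= -\sum_{i}\sum_{j} p_i p_{ij}\,\log\bigl(p_i p_{ij}\bigr) \\
       &= -\sum_{i}\Bigl(\sum_{j} p_{ij}\Bigr) p_i \log p_i \;-\; \sum_{i} p_i \sum_{j} p_{ij}\log p_{ij} \\
       &= -\sum_{i} p_i \log p_i \;-\; \sum_{i} p_i \sum_{j} p_{ij}\log p_{ij} \\
       &= H(X) + H(Y\mid X).
\end{aligned}

When X and Y are independent, pij does not depend on i, so H(Y|X) = H(Y) and the formula reduces to H(X,Y) = H(X) + H(Y).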

Use MATLAB to illustrate the additivity of entropy:

First, define the entropies of two independent systems as S1 and S2 respectively. Then combine the two systems into one large system with entropy S. By the additivity of entropy for independent systems, S = S1 + S2, and the script below checks this equality.

The code is as follows:

% defines the entropy of two independent systems
S1 = 1.5;
S2 = 2.0;
% Calculate the entropy of the large system composed of these two systems
S = S1 + S2;
% Verify that the additivity of entropy holds
if abs(S - (S1 + S2)) < eps
disp('The additivity of entropy is established');
else
disp('The additivity of entropy does not hold');
end

In this code, eps is the machine precision (floating-point relative accuracy), used to judge whether two floating-point numbers are equal. If the output is "The additivity of entropy is established", the check confirms that the entropy S of the combined system equals S1 + S2.
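The script above defines S as S1 + S2 before checking it, so the comparison is guaranteed to succeed. A more direct numerical check (a sketch using an arbitrarily assumed joint distribution, not one taken from the text) computes H(X), H(Y|X) and H(X,Y) from their definitions and compares them:

% Assumed joint probability distribution of X (rows) and Y (columns)
Pxy = [0.10 0.20 0.10;
       0.15 0.05 0.40];
Px  = sum(Pxy, 2);                       % marginal distribution of X
Pyx = Pxy ./ Px;                         % conditional distribution P(Y|X), row by row

HX   = -sum(Px .* log2(Px));             % H(X)
HXY  = -sum(Pxy(:) .* log2(Pxy(:)));     % joint entropy H(X,Y)
HY_X = -sum(sum(Pxy .* log2(Pyx)));      % conditional entropy H(Y|X)

fprintf('H(X,Y) = %.6f, H(X) + H(Y|X) = %.6f\n', HXY, HX + HY_X);
% The two values agree up to floating-point error, illustrating H(X,Y) = H(X) + H(Y|X).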

2.2. Proof of information entropy incrementality

This property shows that if one element of the source X, whose probability is pn, is divided into m symbols whose probabilities q1, q2, …, qm sum to pn, then the entropy of the new source increases: Hn+m-1(p1, …, pn-1, q1, q2, …, qm) = Hn(p1, …, pn) + pn·Hm(q1/pn, …, qm/pn). The term by which the entropy increases, pn·Hm(q1/pn, …, qm/pn), is exactly the uncertainty introduced by the partitioning.
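A quick numerical check of this identity in MATLAB (the distribution and the split are invented for illustration):

% Original source: 3 symbols; split the last symbol (p3 = 0.5) into q = [0.2 0.3]
p = [0.3 0.2 0.5];
q = [0.2 0.3];                            % sum(q) must equal p(end)

H = @(v) -sum(v .* log2(v));              % entropy of a probability vector

H_old   = H(p);                           % Hn(p1, ..., pn)
H_new   = H([p(1:end-1) q]);              % Hn+m-1(p1, ..., pn-1, q1, ..., qm)
H_split = p(end) * H(q / p(end));         % pn * Hm(q1/pn, ..., qm/pn)

fprintf('H_new = %.6f, H_old + split term = %.6f\n', H_new, H_old + H_split);
% H_new exceeds H_old by exactly the split term, so entropy increases under partitioning.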

3. Expansion

3.1 Extension of Additivity

3.2 Extension of Incrementality

3.3 Information granules and decision trees

Foreword:

Information entropy is a measure of the uncertainty of an information system. The greater the entropy, the greater the uncertainty of the system, and the greater the amount of information needed to determine it. In an information system, information entropy is often associated with information granules and decision trees.

3.3.1. Information Granules

The concept of information granulation was first proposed by Professor Lotfi A. Zadeh (L. A. Zadeh). Information granulation means decomposing a whole into individual parts for study, and each part is an information granule. Professor Zadeh pointed out that an information granule is a collection of elements that are grouped together because they are indistinguishable, similar, close to each other, or serve a common function.

As a form of information, information granules are ubiquitous around us, and they are a basic concept through which human beings understand the world. When people try to understand the world, they often group similar things together as a whole and study the properties or characteristics of that whole; this way of dealing with things is, in fact, information granulation, and the "whole" being studied is called an information granule. For example, time-related information granules include the year, month, day, hour, and so on. From these we can see that information granules are hierarchical in nature: an information granule can be subdivided into information granules at a "lower" level.

3.3.2. Information entropy and information granules

3.3.3. Decision tree

A decision tree is a predictive model that represents a mapping from object attributes to object values. Each internal node of the tree represents a test on an attribute, each branch represents a possible value of that attribute, and each leaf node corresponds to the value predicted for the objects that follow the path from the root node to that leaf.

The machine learning technique that generates a decision tree from data is called decision tree learning, or in layman’s terms, a decision tree.

A decision tree contains three types of nodes:

  1. Decision node: usually represented by a rectangular box
  2. Chance node: usually represented by a circle
  3. Terminal node: usually represented by a triangle
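Decision tree learning algorithms typically use information entropy to decide which attribute to split on: the attribute that gives the largest reduction in entropy (the information gain) is chosen. The following MATLAB sketch shows this computation; the toy class labels and attribute values are invented for illustration:

% Toy data set (invented): class labels y and one candidate splitting attribute a
y = [1 1 0 0 0 1 0 0];                    % binary class label for 8 samples
a = [1 1 1 2 2 2 2 1];                    % candidate attribute with values 1 and 2

entropy = @(labels) -sum( ...
    arrayfun(@(c) mean(labels == c) * log2(mean(labels == c)), unique(labels)));

H_before = entropy(y);                    % entropy of the node before splitting
H_after  = 0;
for v = unique(a)                         % weighted entropy after splitting on a
    idx = (a == v);
    H_after = H_after + mean(idx) * entropy(y(idx));
end

gain = H_before - H_after;                % information gain of the attribute
fprintf('Information gain of the attribute: %.4f bits\n', gain);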

3.3.4. Information granules and decision trees

3.3.5. Conclusion

Different information granules describe different amounts of information, and the coarseness of the information granules affects both the computational complexity and the quality of the problem-solving result. After studying the influence of coarse and fine granularity on information entropy, conditional entropy and decision trees, it is concluded that the information entropy of coarse information granules is not less than the information entropy and conditional entropy of fine information granules, and that the decision tree generated by selecting extended attributes under fine information granularity is better than the decision tree generated by selecting extended attributes under coarse information granularity.

4. Summary

The study of the properties of information entropy is an important part of Shannon's information theory, both in theory and in coding applications. This investigation proves the additivity and incrementality of information entropy, extends both properties on that basis, and introduces the basic properties of information granules and decision trees.

Research members:

P02114200 Qi Qi, P02114213 Yang Jiaru, P02114193 Wei Ziang, P02114105 Jiang Qi, P02114208 Gu Zihao

Instructor:

Li Liping
