Quantifying Indicator Importance: The Entropy Weight Method

01. Definition

In the previous article we introduced the Analytic Hierarchy Process (AHP), a method with a clear limitation: its evaluation is highly subjective. In practical data evaluation and analysis, however, we often need a more objective way to assign weights.

This article introduces an objective weighting method for data analysis: the Entropy Weight Method (EWM). It is commonly used in multi-indicator decision analysis, its calculation principle is based on information-entropy theory, and it can quantify the importance of different indicators in fields such as multi-objective decision-making, evaluation, and ranking.

Entropy is a concept from information theory that describes the uncertainty, and hence the amount of information, in a random event. In the entropy weight method, each indicator's entropy is computed from the information-entropy formula: the indicator's values are converted into a probability distribution (each value divided by the column sum), and those probabilities are substituted into the entropy formula. Entropy thus quantifies the volatility and uncertainty of an indicator. Note the direction: the smaller an indicator's entropy, the greater the variation in its values, the more information it carries, and the larger its influence on the decision result. By computing the entropy of every indicator we can quantify each indicator's importance and derive its weight, making the decision more scientific, objective, and accurate.
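As a quick illustration of the entropy concept (a minimal Python sketch, not part of the original article's code): a uniform distribution is maximally uncertain and has the highest entropy, while a concentrated distribution has low entropy.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain
skewed  = [0.97, 0.01, 0.01, 0.01]   # almost deterministic
print(shannon_entropy(uniform))      # ln(4), about 1.386
print(shannon_entropy(skewed))       # much smaller
```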

02. Calculation Principle

The calculation principle of the entropy weight method is fairly simple and consists of four main steps, but the raw data must be preprocessed first. In real problems a single question may involve many indicators, and different indicators behave differently: ① larger values are better; ② smaller values are better; ③ the indicator resembles a normal distribution curve, with some intermediate value being best.

Different preprocessing methods apply to these different indicator types, but the underlying goal is the same: to eliminate the influence of units and scales across indicators. We therefore usually rescale all indicators to the [0,1] interval before analysis.

Note that applying normalization directly at this point has a fatal flaw, namely the three dispersion patterns just described. Before the formal normalization, all indicators must first be oriented in the same direction (forward processing: every indicator becomes larger-is-better; or reverse processing: every indicator becomes smaller-is-better). This article uses forward processing, i.e. the raw data are transformed case by case as follows:

Suppose there are m objects to be evaluated and n evaluation indicators, which form the data matrix:

X = (x_ij), i = 1, …, m; j = 1, …, n

Denote the elements of the data matrix after forward processing by:

x'_ij, i = 1, …, m; j = 1, …, n

Step 0: Forward processing of indicators

Note: the forward-processing transform is not unique; you can derive your own, as long as it preserves the nature of the indicator. Also watch out for negative values: if an indicator contains negative numbers, first rescale the raw data to the [0,1] interval, and only then apply the forward-processing transform.

① Smaller-is-better indicators:

x'_ij = 1 / x_ij    (x_ij > 0)

② Larger-is-better indicators:

x'_ij = x_ij    (left unchanged)

③ Normal-distribution-type indicators, best at some ideal value x_best (one common choice):

x'_ij = 1 − |x_ij − x_best| / max_i |x_ij − x_best|
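The three cases above can be sketched in Python (hypothetical helper functions for illustration; the reciprocal in case ① matches the transform used in the MATLAB code later in this article):

```python
def forward_smaller_better(x):
    """Smaller-is-better -> larger-is-better via the reciprocal (values must be > 0)."""
    return [1.0 / v for v in x]

def forward_larger_better(x):
    """Already larger-is-better: left unchanged."""
    return list(x)

def forward_normal_type(x, best):
    """Best at an ideal point `best`: map the distance to `best` into [0, 1]."""
    d = [abs(v - best) for v in x]
    dmax = max(d)  # assumes not every value equals `best`
    return [1.0 - di / dmax for di in d]

print(forward_smaller_better([5, 9, 8, 12]))     # fuel-consumption column of the example below
print(forward_normal_type([3, 7, 5, 7], best=5))
```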

Step 1: Normalization

After all indicators have been forward processed, they are normalized. The normalized matrix is R = (r_ij)_{m×n}, with

r_ij = (x'_ij − min_i x'_ij) / (max_i x'_ij − min_i x'_ij)

Normalization was discussed in detail in a previous blog post; if you need a refresher, see the link:

Data preprocessing regression model
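A minimal Python sketch of the column-wise min-max scaling (with a lower bound slightly above 0, as the next step requires; `min_max_scale` is a hypothetical helper name):

```python
def min_max_scale(col, lo=0.001, hi=1.0):
    """Rescale one indicator column linearly into [lo, hi]."""
    cmin, cmax = min(col), max(col)
    return [lo + (hi - lo) * (v - cmin) / (cmax - cmin) for v in col]

print(min_max_scale([5, 9, 8, 12]))  # smallest value -> 0.001, largest -> 1.0
```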

Step 2: Calculate the entropy of each indicator

P_ij = r_ij / Σ_{i=1}^{m} r_ij

E_j = −(1 / ln m) · Σ_{i=1}^{m} P_ij · ln P_ij

Note that P_ij must not be 0, since the logarithm would be undefined. To avoid this, simply set the lower bound of the normalization interval slightly above 0 in the previous step. The smaller an indicator's information entropy, the greater the variation of its values and the more information it provides, so the larger its role in the comprehensive evaluation.
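A minimal Python sketch of this step for a single normalized column (the function name `indicator_entropy` is hypothetical):

```python
import math

def indicator_entropy(col):
    """E_j = -(1/ln m) * sum_i P_ij * ln(P_ij) for one column of positive values."""
    m = len(col)
    total = sum(col)
    P = [v / total for v in col]
    return -sum(p * math.log(p) for p in P) / math.log(m)

# A nearly constant column carries little information: entropy close to 1.
print(indicator_entropy([0.99, 1.0, 0.98, 1.0]))
# A strongly varying column carries more information: lower entropy.
print(indicator_entropy([0.001, 1.0, 0.2, 0.6]))
```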

Step 3: Calculate indicator weights

g_j = 1 − E_j,    w_j = g_j / Σ_{j=1}^{n} g_j

Step 4: Compute the weighted sum and draw conclusions

S_i = Σ_{j=1}^{n} w_j · r_ij
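Steps 3 and 4 can be sketched in a few lines of Python (the entropy values below are made up purely for illustration):

```python
def entropy_weights(E):
    """w_j = (1 - E_j) / sum_k (1 - E_k)."""
    g = [1.0 - e for e in E]
    total = sum(g)
    return [gi / total for gi in g]

def weighted_scores(R, w):
    """S_i = sum_j w_j * r_ij for each row of the normalized matrix R."""
    return [sum(r * wj for r, wj in zip(row, w)) for row in R]

E = [0.95, 0.70, 0.85]   # hypothetical entropy values for three indicators
w = entropy_weights(E)
print(w)                 # the lowest-entropy indicator gets the largest weight
```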

03. Code implementation

This article uses a classic car purchase decision case. The original data is as follows:

          Fuel consumption  Power  Price  Safety  Maintenance  Operation
Honda     5                 1.4    6      3       5            7
Audi      9                 2      30     7       5            9
Santana   8                 1.8    11     5       7            5
Buick     12                2.5    18     7       5            5
The MATLAB calculation code is as follows:
%% Program initialization
clear all
clc

%% raw data reading
data = [5,1.4,6,3,5,7; 9,2,30,7,5,9; 8,1.8,11,5,7,5; 12,2.5,18,7,5,5];
[m,n] = size(data);

%% Indicators are moving in the same direction
% We treat fuel consumption, price, and operation as smaller-is-better,
% and power, safety, and maintenance as larger-is-better.
% Smaller is better data processing
index = [1,3,6];
for i =1:length(index)
    data(:,index(i)) = 1./data(:,index(i));
end

%% Data normalization
% mapminmax normalizes by row, so the data are transposed before normalization;
% to avoid zeros, the lower bound of the normalization range is set to 0.001.
new_data = mapminmax(data',0.001,1);
new_data = new_data';

%% Calculate the entropy value of an indicator
% Step1: Find Pij
for i = 1:m
    for j = 1:n
        P(i,j) = new_data(i,j)/sum(new_data(:,j));
    end
end


% Step2: Find Ej
for i = 1:m
    for j = 1:n
        e(i,j) = P(i,j)*log(P(i,j));
    end
end

for j=1:n
    E(j)=-1/log(m)*sum(e(:,j));
end

%% difference coefficient
g = 1-E;

%% Calculate indicator weight
for j = 1:n
    w(j) = g(j)/sum(g);
end

disp('The weight of each indicator is:')
disp(w)

%% Calculate score
for i =1:m
    score(i,1) = sum(new_data(i,:).*w);
end

disp('The comprehensive score of each brand is:')
disp(score)

[Output: the weight of each indicator and the comprehensive score of each brand]
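For readers without MATLAB, the whole pipeline can be reproduced with a Python/NumPy sketch (same data and same steps; variable names are my own):

```python
import numpy as np

# Raw data: rows = Honda, Audi, Santana, Buick;
# columns = fuel consumption, power, price, safety, maintenance, operation.
data = np.array([
    [5, 1.4, 6, 3, 5, 7],
    [9, 2.0, 30, 7, 5, 9],
    [8, 1.8, 11, 5, 7, 5],
    [12, 2.5, 18, 7, 5, 5],
], dtype=float)
m, n = data.shape

# Step 0: forward-process smaller-is-better columns (fuel consumption, price, operation).
for j in (0, 2, 5):
    data[:, j] = 1.0 / data[:, j]

# Step 1: column-wise min-max scaling into [0.001, 1] so logarithms stay defined.
lo, hi = 0.001, 1.0
scaled = lo + (hi - lo) * (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Step 2: entropy of each indicator.
P = scaled / scaled.sum(axis=0)
E = -(P * np.log(P)).sum(axis=0) / np.log(m)

# Steps 3-4: difference coefficients, weights, and weighted scores.
g = 1.0 - E
w = g / g.sum()
score = scaled @ w

print("weights:", w)
print("scores:", score)
```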

The entropy weight method determines indicator weights from the degree of variation in each indicator's values, making it an objective weighting method that avoids bias introduced by human judgment. It still has shortcomings, however: it ignores the intrinsic importance of the indicators themselves, so the computed weights are sometimes far from what is expected, and it cannot reduce the dimensionality of the evaluation indicators.

Original text: Data evaluation analysis-entropy method
