A collection of MATLAB code for mathematical modeling: prediction models and Pearson correlation analysis (runs as-is, open source)

Data for 2010-2020 are known; values for 2021-2060 are predicted.

1. Logistic prediction of population

%%logistic predicts results from 2021 to 2060
clear;clc;
X = [7869.34, 8022.99, 8119.81, 8192.44, 8281.09, 8315.11, 8381.47, 8423.50, 8446.19, 8469.09, 8477.26];
n=length(X)-1;
for t=1:n
    Z(t)=(X(t + 1)-X(t))/X(t + 1);
end
X1=[ones(n,1) X(1:n)'];
Y=Z';
[B,Bint,r,rint,stats]=regress(Y,X1);% least squares (OLS)
gamma=B(1,1);
beta=B(2,1);
b=log(1-gamma);
c=beta/(exp(b)-1);
a=exp((sum(log(1./X(1:n)-c))-n*(n + 1)*b/2)/n);
XX=2010:2060;
YY=1./(c + a*exp(b*([XX-2010])));
plot(XX,YY,'r-o')
hold on
plot(XX(1:length(X)),X,'b-*')
legend('predicted value','actual value')
xlabel('Year');ylabel('Population (10,000 people)');
title('Population forecast')
set(gca,'XTick',[2010:5:2060])
grid on
format short;
forecast=YY(end-39:end); % 2021-2060 population forecast results
MAPE=sum(abs(YY(1:n + 1)-X)./X)/length(X);% mean absolute percentage error over the known years
a,b,c
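
For reference, the curve fitted by the script above is the logistic model

$$\hat{x}(t) = \frac{1}{c + a\,e^{\,b\,(t-2010)}}$$

where b = ln(1 − γ) and c = β/(e^b − 1) come from the intercept γ and slope β of the regression of Z(t) = (x(t+1) − x(t))/x(t+1) on x(t), and a is recovered from the initial values. The MAPE line at the end is the mean of |x̂ − x|/x over the 11 known years.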

[Figure: logistic population forecast, predicted vs. actual values]

2. Gray prediction of GDP

%%Gray prediction model predicts changes in GDP in a certain region from 2021 to 2060
clc;clear;
%Establish symbolic variables a (development coefficient) and b (gray action amount)
syms a b;
c = [a b]';
%Original sequence (here we enter the historical GDP data, in 100 million yuan)
A = [41383.87,45952.65,50660.20,55580.11,60359.43,65552.00,70665.71,75752.20,80827.71,85556.13,88683.21];
% Level ratio test (checks whether the series is suitable for GM(1,1))
n = length(A);
lambda_min = exp(-2/(n + 1));
lambda_max = exp(2/(n + 1));
lambda = A(1:n-1)./A(2:n); % level ratio lambda(k) = x(k-1)/x(k)
for i = 1:(n-1)
if ~(lambda(i) > lambda_min && lambda(i) < lambda_max)
fprintf('The %dth level ratio is not within the admissible interval\n',i)
end
end
%Accumulate the original sequence A (cumulative sum) to obtain the sequence B
B = cumsum(A);
%Generate the consecutive-neighbour mean sequence of B
for i = 2:n
C(i) = (B(i) + B(i - 1))/2;
end
C(1) = [];
%Construct data matrix
B = [-C;ones(1,n-1)];
Y = A; Y(1) = []; Y = Y';
%Use the least squares method to calculate parameters a (development coefficient) and b (gray action amount)
c = (B*B')\(B*Y); % solve the normal equations (equivalent to inv(B*B')*B*Y, but better conditioned)
c = c';
a = c(1);
b = c(2);
%Predict subsequent data
F = []; F(1) = A(1);
for i = 2:(n + 40)
F(i) = (A(1)-b/a)/exp(a*(i-1)) + b/a;
end
% The accumulated sequence F is restored by successive differencing to obtain the predicted data.
G = []; G(1) = A(1);
for i = 2:(n + 40)
G(i) = F(i) - F(i-1); %get the predicted data
end
disp('Predicted data is:');
G
%Model testing
H = G(1:n);
%Calculate residual sequence
epsilon = A - H;
%Method 1: Relative residual Q test
%Calculate relative error sequence
delta = abs(epsilon./A);
% Calculate the relative error average Q
disp('Relative residual Q test:')
Q = mean(delta)
% Method 2: Variance ratio C test
disp('Variance ratio C test:')
C = std(epsilon, 1)/std(A, 1)
% Method 3: Small error probability P test
S1 = std(A, 1);
tmp = find(abs(epsilon - mean(epsilon))< 0.6745 * S1);
disp('Small error probability P test:')
P = length(tmp)/n
%Draw a curve graph
t1 = 2010:2020;
t2 = 2010:2060;
plot(t1, A,'-b','LineWidth',2);
hold on;
plot(t2, G, 's','LineWidth',1);
xlabel('Year'); ylabel('GDP (100 million yuan)');
legend('actual GDP','forecast GDP');
title('2021-2060 GDP Forecast');
grid on;
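
For reference, the script above implements the standard GM(1,1) model. In the notation of the code (A is the original series, its cumulative sum is the accumulated series, a is the development coefficient, b the grey action quantity), the fitted time-response function and the restored forecasts are

$$\hat{x}^{(1)}(k+1) = \Bigl(x^{(0)}(1) - \frac{b}{a}\Bigr)e^{-ak} + \frac{b}{a}, \qquad \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k),$$

which are exactly the sequences F and G computed in the loops above.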

[Figure: grey model GDP forecast, actual (2010-2020) vs. predicted (2010-2060)]

3. BP neural network prediction

The carbon emissions of province X from 2000 to 2017 are used as the training set and those from 2018 to 2022 as the test set, with the goal of predicting the emissions of province X from 2023 to 2026. The number of training epochs is set to 1000 and the learning rate to 0.2. After the BP neural network is fitted to the training set, the mean squared errors of the training, validation, and test samples are 0.000012, 0.0023, and 0.0042 respectively, with an overall error of 0.0082203, so the trained BP neural network has high prediction accuracy. The results of the trained model are shown in Figure 3.

clear all
 clc
 clf
 %% 1, read the data and normalize it
input_1=[2391,2487,2588,2683,3150,3513,3751,3969,4384,4653,4482,5366,6238,6515,6647,6704,6806,6682,6346,6253,6513,7120,7597];
 n=length(input_1);
 row=4; % predict the fifth year through the data of the first four years
input=zeros(4,n-row);
 for i =1:row
 input(i,:)=input_1(i:n-row + i-1);
 end
 output=input_1(row + 1:end);
 [inputn,inputps]=mapminmax(input);
 [outputn,outputps]=mapminmax(output);
 %% 2, divide the training set and test set
inputn_train=inputn(:,1:n-row-5);
 inputn_test=inputn(:,n-row-4:end);
 outputn_train=outputn(1:n-row-5);
 outputn_test=outputn(n-row-4:end);
 %% 3. Build BP neural network
hiddennum=10;% Empirical formula for the number of hidden layer nodes p=sqrt(m + n) + a
 net=newff(inputn_train,outputn_train,hiddennum,{'tansig','purelin'},'trainlm'); %tansig: tangent S-shaped transfer function. purelin: linear transfer function. trainlm: Levenberg-Marquardt algorithm
%% 4. Network parameter configuration
net.trainParam.epochs=1000;
 net.trainParam.lr=0.2;
 %% 5, BP neural network training
[net,tr]=train(net,inputn_train,outputn_train);
 %% 6, simulation calculation
resultn=sim(net,inputn_test);
 %% 7, error between calculation and test set
result=mapminmax('reverse',resultn,outputps);
 output_test=mapminmax('reverse',outputn_test,outputps);
 error=result-output_test;
rmse=sqrt(error*error'/length(error)); % root-mean-square error on the test set
 figure(1)
 plot(output_test,'b')
 hold on
 plot(result,'r*');
 hold on
plot(error,'s','MarkerFaceColor','k')
 legend('expected value','predicted value','error')
 xlabel('number of data groups')
 ylabel('value')
%% 8, predict carbon emissions in the next four years (2023-2026)
pn=4; % number of years to forecast
[p_in,ps]=mapminmax(input_1(n-row + 1:end)); % normalize the last four known years with their own parameters
p_in=p_in';
p_outn=zeros(1,pn);
for i = 1:pn
p_outn(i)=sim(net,p_in); % predict the next (normalized) value
p_in=[p_in(2:end);p_outn(i)]; % slide the 4-year input window forward
end
p_out=mapminmax('reverse',p_outn,ps) % map the predictions back to the original scale
 figure(2)
 plot(2000:2022,input_1,'k--o')
 hold on
 plot(2018:2022,result,'b--*')
 hold on
plot(2022:2026,[result(end),p_out],'r--+') % prepend the last fitted value so the forecast curve connects to the fit
 legend('actual value','fitted value','predicted value')

[Figure 3: BP neural network training performance (MSE of training, validation, and test samples)]

As can be seen from Figure 3, the mean squared errors of the validation and test samples converge, reaching a minimum of roughly 10^-2, and the BP neural network trained at that point is taken as the final model. Using this model, the predicted carbon emissions of province X from 2023 to 2026 are 71.4939, 75.566, 74.411, and 74.791 million tons respectively. The trends of the actual, fitted, and predicted carbon emission values for province X are shown in the figure below.

[Figure: actual, fitted, and predicted carbon emissions of province X]

The errors between the actual and predicted carbon emissions over the whole 2018-2022 period are shown in the figure below:

[Figure: prediction errors for 2018-2022]

After the BP neural network is trained, the regressions of the training set, validation set, test set, and the overall results can be obtained. The correlation coefficients between the predicted output and the target output are 0.99974 for the training samples, 0.9935 for the validation samples, and 0.99983 for the test samples, and the overall correlation coefficient is 0.99238, as shown in Figures 4, 5, 6, and 7. The BP neural network therefore fits the data well.
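
The correlation coefficients above are read from the regression plots produced by MATLAB's neural network training tool. As a minimal sketch (using the variables result and output_test already computed in the script above), a comparable correlation coefficient for the test set can also be obtained directly:

R = corrcoef(result, output_test); % 2x2 correlation matrix of predicted vs. actual test values
r_test = R(1,2) % correlation coefficient of the test samples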

[Figures 4-7: regression plots of the training, validation, and test samples and the overall results]

4. Pearson correlation analysis code

Data file: xiu.xlsx

Put xiu.xlsx into a new folder, then set that folder as the current folder in MATLAB (or import the table by clicking the green import arrow) so that the file can be found when it is read.
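
If you prefer to skip the interactive import and read the file purely from code, readmatrix (R2019a and later) can replace the xlsread call below, assuming the same sheet and B2:J12 range:

data = readmatrix('xiu.xlsx','Sheet',1,'Range','B2:J12'); % read the numeric block directly from the spreadsheet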


%%Pearson correlation analysis matrix code
clc
clear all
data=xlsread('xiu.xlsx',1,'B2:J12');
figure
% Find the correlation coefficient between dimensions
rho = corr(data, 'type','pearson');
% draw heat map
string_name={'Population','Primary industry GDP','Secondary industry GDP','Tertiary industry GDP','Primary industry energy consumption','Secondary industry energy consumption','Tertiary industry energy consumption','Resident energy consumption','Total carbon emissions'};
xvalues = string_name;
yvalues = string_name;
h = heatmap(xvalues,yvalues, rho, 'FontSize',10, 'FontName','宋体');
h.Title = 'Pearson correlation analysis coefficient matrix';
colormap summer


figure
% You can define your own color blocks
H = heatmap(xvalues,yvalues, rho, 'FontSize',10, 'FontName','宋体');
H.Title = 'Pearson correlation analysis coefficient matrix';
colormap(autumn(5))% sets the number of colors

The colormap function sets the color map of the current figure. Commonly used built-in colormaps include summer, autumn, winter, spring, cool, hot, hsv, and jet.
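
The comment in the second figure mentions defining your own color blocks; a minimal sketch (with an arbitrary example palette) is to pass an N-by-3 matrix of RGB values to colormap after the heatmap is created:

my_colors = [0.85 0.33 0.10; 0.93 0.69 0.13; 0.47 0.67 0.19]; % three custom RGB rows (values in [0,1])
colormap(my_colors) % the heatmap is then drawn with only these three colors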

[Figure: Pearson correlation heatmap drawn with the summer colormap]

[Figure: Pearson correlation heatmap drawn with the autumn(5) colormap]