Interval prediction | MATLAB implements QGPR Gaussian process quantile regression time series interval prediction

Table of Contents

    • Interval prediction | MATLAB implements QGPR Gaussian process quantile regression time series interval prediction
      • List of effects
      • Basic introduction
      • Model description
      • Programming

List of effects

(Effect plots 1–3 omitted.)

Basic introduction

MATLAB implements QGPR Gaussian process quantile regression time series interval prediction
1. Quantile-based time series interval prediction with Gaussian process quantile regression (QGPR); MATLAB code; univariate-input model.
2. Evaluation metrics include R2, MAE, MSE, RMSE, interval coverage, and interval average width percentage. The code is clearly structured, making it easy to study and to swap in your own data.
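The two interval metrics in point 2 can be sketched as follows. This is an illustrative Python snippet (the blog's own code is MATLAB); the function name `interval_metrics` is my own, and PINAW is computed here by normalizing the mean interval width by the observed data range, one common convention.

```python
import numpy as np

def interval_metrics(y_true, lower, upper):
    """Interval coverage (PICP) and interval average width percentage (PINAW)."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (y_true >= lower) & (y_true <= upper)
    picp = covered.mean()                    # fraction of observations inside their interval
    # mean interval width, normalized by the range of the observations
    pinaw = (upper - lower).mean() / (y_true.max() - y_true.min())
    return picp, pinaw
```

A good interval forecast has PICP close to the nominal level (e.g. 90%) with PINAW as small as possible.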

Gaussian process quantile regression is a statistical learning method, based on Gaussian processes, for forecasting time series. In time series interval forecasting, QGPR predicts quantiles at a series of future time points, thus providing information about future trends.
Specifically, QGPR estimates the probability distribution of the observation at a given time point for a given quantile level; this distribution is then used to compute interval forecasts. The results therefore carry uncertainty information about the future of the series, which is valuable for decision makers and risk managers.

When applying QGPR to time series interval prediction, an appropriate Gaussian process model must first be chosen, followed by parameter estimation and model training on historical data. Once trained, the model can be used for forecasting and interval estimation of the series.

Note that QGPR is a relatively complex statistical learning method that requires some mathematical and programming skill to apply effectively. Its forecasts are also limited by the historical data, so the sample data must be chosen carefully, and the model should be updated regularly to reflect new data and trends.

Model description

  • The principle and basic idea of QGPR are similar to traditional Gaussian process regression (GPR), but when predicting quantiles QGPR introduces a quantile loss function to penalize deviations between the predictions and the true observations. The prediction of QGPR is a probability distribution, which can be used to compute interval forecasts.

  • The main formulas of QGPR are as follows:

  • Suppose we have a time series dataset $(\mathbf{X}, \mathbf{y})$, where $\mathbf{X}$ is an $n \times d$ matrix containing $d$ feature values at each of $n$ time points, and $\mathbf{y}$ is an $n \times 1$ vector of the corresponding observations. Our goal is to predict the probability distribution of the observation at a future time point $t^*$ at a given quantile level $\tau$.

  • The prediction result of QGPR is a Gaussian distribution whose mean and variance are given by the following formulas, respectively:

$$\hat{y}_{t^*, \tau} = \mathbf{k}^T (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} \mathbf{y}$$

$$\hat{\sigma}^2_{t^*, \tau} = k_{t^* t^*} - \mathbf{k}^T (\mathbf{K} + \sigma^2 \mathbf{I})^{-1} \mathbf{k}$$

  • where $\mathbf{k}$ is the vector of kernel evaluations between $t^*$ and the historical time points, $\mathbf{K}$ is the kernel matrix over the historical time points, $\sigma^2$ is the noise variance, and $\mathbf{I}$ is the identity matrix. In QGPR, the kernel is usually the radial basis function (RBF) kernel, whose form is:

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{1}{2l^2}\|x_i - x_j\|^2\right)$$

  • where $\sigma_f^2$ is the signal variance of the kernel and $l$ is its length scale.

  • QGPR introduces a quantile loss function to penalize deviations between predictions and true observations. Suppose $y_{t^*}$ is the true observation, $F_{\tau}(y_{t^*})$ is the cumulative distribution function at quantile level $\tau$, and $q_{\tau}(y_{t^*})$ is the quantile at level $\tau$. The quantile loss function can then be written as:

$$L_{\tau}(y_{t^*}, \hat{y}_{t^*, \tau}) = \begin{cases} \tau \, |y_{t^*} - \hat{y}_{t^*, \tau}|, & y_{t^*} \geq q_{\tau}(\hat{y}_{t^*, \tau}) \\ (1-\tau) \, |y_{t^*} - \hat{y}_{t^*, \tau}|, & y_{t^*} < q_{\tau}(\hat{y}_{t^*, \tau}) \end{cases}$$

  • where $q_{\tau}(\hat{y}_{t^*, \tau})$ denotes the quantile of the predicted mean at level $\tau$. The goal of QGPR is to minimize this quantile loss, thereby obtaining the optimal prediction.

Programming

  • Complete program and data, option 1: private-message the blogger to exchange for a program of equal value;

  • Complete program and data, option 2 (direct download from the resources section): MATLAB implements QGPR Gaussian process quantile regression time series interval prediction

  • Complete program and data, option 3 (subscribe to the "TSFM Statistical Forecasting Model" column, then private-message me for the data): MATLAB implements QGPR Gaussian process quantile regression time series interval prediction

%% Dataset analysis
num_samples = size(res, 1);                  % number of samples (res holds the data, one sample per row)
outdim = 1;                                  % the last column is the output
num_size = 0.7;                              % proportion of the data used for training
num_train_s = round(num_size * num_samples); % number of training samples
f_ = size(res, 2) - outdim;                  % input feature dimension

%% split training set and test set
P_train = res(1: num_train_s, 1: f_)';
T_train = res(1: num_train_s, f_ + 1: end)';
M = size(P_train, 2);

P_test = res(num_train_s + 1: end, 1: f_)';
T_test = res(num_train_s + 1: end, f_ + 1: end)';
N = size(P_test, 2);

%% data normalization
[p_train, ps_input] = mapminmax(P_train, 0, 1);
p_test = mapminmax('apply', P_test, ps_input);

[t_train, ps_output] = mapminmax(T_train, 0, 1);
t_test = mapminmax('apply', T_test, ps_output);

%% transpose to fit the model
p_train = p_train'; p_test = p_test';
t_train = t_train'; t_test = t_test';

%% model creation
alpha = 0.10;
net = fitrgp(p_train, t_train);

%% simulation test
% predict means and (1 - alpha) prediction intervals on the training and test sets
[t_sim1, ~, l_sim1] = predict(net, p_train, 'Alpha', alpha); % l_sim1: M x 2 interval bounds
[t_sim2, ~, l_sim2] = predict(net, p_test,  'Alpha', alpha); % l_sim2: N x 2 interval bounds

%% data denormalization
% ps_output stores a single row, so reverse each interval bound separately
L_sim1 = [mapminmax('reverse', l_sim1(:, 1)', ps_output); mapminmax('reverse', l_sim1(:, 2)', ps_output)];
L_sim2 = [mapminmax('reverse', l_sim2(:, 1)', ps_output); mapminmax('reverse', l_sim2(:, 2)', ps_output)];

T_sim1 = mapminmax('reverse', t_sim1', ps_output);
T_sim2 = mapminmax('reverse', t_sim2', ps_output);
