LightTS: a lightweight sampling-oriented MLP network for multivariate time series forecasting

Paper: 2022 | Less is more: Fast multivariate time series forecasting with light sampling-oriented MLP structures [1]

Authors: Tianping Zhang, Yizhuo Zhang, Wei Cao, Jiang Bian, Xiaohan Yi, Shun Zheng, and Jian Li

Institution: Tsinghua University, Microsoft Research Asia

Code: https://github.com/thuml/Time-Series-Library/blob/main/models/LightTS.py

Citations: 22


In the previously covered TimesNet paper, Haixu Wu et al. converted the 1D temporal structure into a 2D structure to make it easier to extract information. LightTS likewise converts the 1D time series into a 2D structure, and does so very simply. I have seen friends in time-series competitions reshape a 1D series into 2D and then model it with convolution kernels to extract information, which is very similar to this paper; the difference is that two sampling schemes are considered here for organizing the 2D data, and MLPs are then used to extract the features.

First, assume the input time series has dimensions [B, T, N]; the authors apply two kinds of sampling:

- Continuous sampling: focuses on capturing short-term local patterns.

- Interval sampling: focuses on capturing long-term dependencies.

As shown in the figure below, it is easy to see that with chunk size C the data is rearranged into dimensions [B, C, T/C, N], where T/C is the number of chunks and N is the number of time series.

[Figure: continuous and interval sampling and the overall LightTS architecture, with the highway branch marked in red]
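To make the two sampling schemes concrete, here is a minimal runnable sketch on a toy series (the variable names and toy sizes are illustrative, not from the official repo). In both layouts the later temporal projection mixes along the C axis, i.e. within a column:

import torch

B, T, N = 1, 8, 1                 # batch, length, number of series
C = 2                             # chunk size, so T/C = 4 chunks
x = torch.arange(T).float().view(B, T, N)   # series: 0, 1, ..., 7

# continuous sampling: each column groups C consecutive points (local)
cont = x.reshape(B, T // C, C, N).permute(0, 3, 2, 1)   # [B, N, C, T/C]
print(cont[0, 0])
# tensor([[0., 2., 4., 6.],
#         [1., 3., 5., 7.]])  -> columns: (0,1), (2,3), (4,5), (6,7)

# interval sampling: each column groups points T/C apart (long-range)
intv = x.reshape(B, C, T // C, N).permute(0, 3, 1, 2)   # [B, N, C, T/C]
print(intv[0, 0])
# tensor([[0., 1., 2., 3.],
#         [4., 5., 6., 7.]])  -> columns: (0,4), (1,5), (2,6), (3,7)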

Note: in the paper, the final IEBlock directly outputs the prediction, but in the code, as shown by the red mark in the figure above, there is also a prediction highway from the input, which is added to the final IEBlock's output to form the final prediction.

After sampling, the data passes through an Information Exchange Block (IEBlock). This module is very simple: for each [B*N, C, T/C] input it applies a temporal projection (an MLP over the C dimension, producing [B*N, F', T/C]) and a channel projection (a linear layer over the T/C dimension); the two results of shape [B*N, F', T/C] are added and fed to an output MLP, giving a final output of shape [B*N, F, T/C].

[Figure: IEBlock structure]

Note: in the paper, the temporal projection output is fed serially into the channel projection; in the code, the channel projection is likewise applied to the temporal projection's output, but that output is also added back as a residual (skip connection) before being fed to the output projection.
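A compact way to see the difference (a hedged reading of the two variants; t_proj, c_proj, and o_proj are hypothetical stand-ins for the three projections, and the permute operations are omitted for clarity):

def ieblock_paper(x, t_proj, c_proj, o_proj):
    # paper figure: serial, temporal -> channel -> output
    return o_proj(c_proj(t_proj(x)))

def ieblock_code(x, t_proj, c_proj, o_proj):
    # repo code: channel projection applied to the temporal output,
    # added back as a residual before the output projection
    h = t_proj(x)
    return o_proj(h + c_proj(h))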

After the two sampling branches pass through their respective IEBlocks, each goes through a linear layer (chunk_proj) that collapses the chunk dimension; the results are concatenated and fed into a final IEBlock, which outputs the prediction.
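For reference, here is a sketch of how these submodules are built, following the linked Time-Series-Library implementation; it is the body of the model's _build method and should be read alongside the encoder code below. The hyperparameter names (d_model, enc_in, seq_len, pred_len) are assumptions based on that repo:

import torch.nn as nn

def _build(self):
    # IEBlocks for the two sampling branches: C -> F (= d_model // 4) features
    self.layer_1 = IEBlock(input_dim=self.chunk_size, hid_dim=self.d_model // 4,
                           output_dim=self.d_model // 4, num_node=self.num_chunks)
    self.chunk_proj_1 = nn.Linear(self.num_chunks, 1)  # collapse T/C -> 1
    self.layer_2 = IEBlock(input_dim=self.chunk_size, hid_dim=self.d_model // 4,
                           output_dim=self.d_model // 4, num_node=self.num_chunks)
    self.chunk_proj_2 = nn.Linear(self.num_chunks, 1)
    # final IEBlock mixes the concatenated 2*F features across the N series
    self.layer_3 = IEBlock(input_dim=self.d_model // 2, hid_dim=self.d_model // 2,
                           output_dim=self.pred_len, num_node=self.enc_in)
    # highway: per-series linear AR map from the input window to the horizon
    self.ar = nn.Linear(self.seq_len, self.pred_len)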

Experimental results:

[Figures: LightTS experimental results reported in the paper]

However, judging from the benchmarks in the TimesNet paper, LightTS still shows little advantage over DLinear:

[Figure: results comparison from the TimesNet paper]

Let's go straight to the code. I have written the dimension changes in the comments; combined with the model diagram above, it should be very clear (I will try it in a competition later):

def encoder(self, x):
    B, T, N = x.size() # [B, T, N]

    # highway (linear AR) prediction straight from the input
    # [B, T, N] -> [B, T_pred, N]
    highway = self.ar(x.permute(0, 2, 1))
    highway = highway.permute(0, 2, 1)

    # continuous sampling
    # [B, T, N] -> [B, T/C, C, N]
    x1 = x.reshape(B, self.num_chunks, self.chunk_size, N)
    # [B, T/C, C, N] -> [B, N, C, T/C]
    x1 = x1.permute(0, 3, 2, 1)
    # [B, N, C, T/C] -> [B*N, C, T/C]
    x1 = x1.reshape(-1, self.chunk_size, self.num_chunks)
    # [B*N, C, T/C] -> [B*N, F, T/C]
    x1 = self.layer_1(x1)
    # [B*N, F, T/C] -> [B*N, F]
    x1 = self.chunk_proj_1(x1).squeeze(dim=-1)

    # interval sampling
    # [B, T, N] -> [B, C, T/C, N]
    x2 = x.reshape(B, self.chunk_size, self.num_chunks, N)
    # [B, C, T/C, N] -> [B, N, C, T/C]
    x2 = x2.permute(0, 3, 1, 2)
    # [B, N, C, T/C] -> [B*N, C, T/C]
    x2 = x2.reshape(-1, self.chunk_size, self.num_chunks)
    # [B*N, C, T/C] -> [B*N, F, T/C]
    x2 = self.layer_2(x2)
    # [B*N, F, T/C] -> [B*N, F]
    x2 = self.chunk_proj_2(x2).squeeze(dim=-1)

    # concatenate the two branches
    x3 = torch.cat([x1, x2], dim=-1) # [B*N, 2*F]

    x3 = x3.reshape(B, N, -1) # [B, N, 2*F]
    x3 = x3.permute(0, 2, 1) # [B, 2*F, N]

    # final IEBlock across the N series
    out = self.layer_3(x3) # [B, T_pred, N]

    out = out + highway # [B, T_pred, N]
    return out
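For context, the forecasting entry point in the linked repo is essentially a thin wrapper around encoder (a simplified sketch; the real forward also dispatches on the task type and takes additional arguments):

def forecast(self, x_enc):
    return self.encoder(x_enc)

def forward(self, x_enc):
    dec_out = self.forecast(x_enc)          # [B, T_pred, N]
    return dec_out[:, -self.pred_len:, :]   # keep the last pred_len steps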

IEBlock:

import torch
import torch.nn as nn

class IEBlock(nn.Module):
    def __init__(self, input_dim, hid_dim, output_dim, num_node):
        super(IEBlock, self).__init__()

        self.input_dim = input_dim # C
        self.hid_dim = hid_dim
        self.output_dim = output_dim # F
        self.num_node = num_node # T/C

        self._build()

    def _build(self):
        # temporal projection: MLP over the C dimension, C -> F' (= hid_dim // 4)
        self.spatial_proj = nn.Sequential(
            nn.Linear(self.input_dim, self.hid_dim),
            nn.LeakyReLU(),
            nn.Linear(self.hid_dim, self.hid_dim // 4)
        )

        # channel projection: linear layer over the T/C dimension,
        # initialized to the identity so it starts as a no-op
        self.channel_proj = nn.Linear(self.num_node, self.num_node)
        torch.nn.init.eye_(self.channel_proj.weight)

        self.output_proj = nn.Linear(self.hid_dim // 4, self.output_dim)

    def forward(self, x):
        # [B*N, C, T/C] -> [B*N, T/C, F']
        x = self.spatial_proj(x.permute(0, 2, 1))
        # residual: [B*N, F', T/C] + channel projection of the same tensor
        x = x.permute(0, 2, 1) + self.channel_proj(x.permute(0, 2, 1))
        # [B*N, F', T/C] -> [B*N, T/C, F]
        x = self.output_proj(x.permute(0, 2, 1))
        # [B*N, T/C, F] -> [B*N, F, T/C]
        x = x.permute(0, 2, 1)

        return x
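A quick shape check for IEBlock with toy sizes (the numbers are arbitrary, chosen only to exercise the dimensions discussed above; uses the class and imports defined just before):

block = IEBlock(input_dim=24, hid_dim=128, output_dim=32, num_node=4)
x = torch.randn(8, 24, 4)   # [B*N, C, T/C] with B*N = 8, C = 24, T/C = 4
y = block(x)
print(y.shape)              # torch.Size([8, 32, 4]) -> [B*N, F, T/C]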

References

[1] Zhang, T., Zhang, Y., Cao, W., Bian, J., Yi, X., Zheng, S., & Li, J. (2022). Less is more: Fast multivariate time series forecasting with light sampling-oriented MLP structures. arXiv preprint arXiv:2207.01186.
