Transformer model training structure analysis (to deepen understanding)

Some reflections from working on the project:
The overall execution flow of a deep learning project often has to be untangled first. Modules are usually nested several layers deep, and execution jumps between multiple Python files. A single short line in one file (such as self._train_model(training_loader)) may stand behind hundreds of lines of code in a function defined under another class. The main skills needed are understanding the nested, modular structure of the model code, and then generalizing that understanding from one example to the rest.

Python basics often used in deep learning:

from abc import abstractmethod


class BaseImputer(BaseModel):  # BaseModel is defined elsewhere in the project
    """Abstract class for all imputation models."""

    def __init__(self, device):
        super().__init__(device)

    @abstractmethod
    def fit(self, train_X, val_X=None):
        """Train the imputer.

        Parameters
        ----------
        train_X : array-like, shape [n_samples, sequence length (time steps), n_features],
            Time-series data for training; may contain missing values.
        val_X : array-like, optional, shape [n_samples, sequence length (time steps), n_features],
            Time-series data for validation; may contain missing values.

        Returns
        -------
        self : object,
            Trained imputer.
        """
        return self
        # `self` is the instance object of the class, so `return self` at the end
        # means the method returns the trained model object: when fit() is called
        # for training, it returns a trained BaseImputer instance that can be used
        # for subsequent predictions or other operations.
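Because fit() returns self, a construction call and a fit call can be chained, and the result is the trained object itself. A minimal, self-contained sketch of this return-self pattern (the class and method names below are illustrative toys, not part of the project):

class TinyImputer:
    """Toy model that only exists to demonstrate why fit() returns self."""

    def __init__(self, fill_value=0.0):
        self.fill_value = fill_value
        self.fitted = False

    def fit(self, X):
        # ... real training logic would go here ...
        self.fitted = True
        return self  # returning the instance enables method chaining

    def impute(self, X):
        assert self.fitted, "call fit() first"
        return [self.fill_value if v is None else v for v in X]

# fit() hands back the trained object, so calls can be chained in one line:
result = TinyImputer(fill_value=-1.0).fit([1, None, 3]).impute([1, None, 3])
print(result)  # [1, -1.0, 3]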

Definition of the model
In the code below, self.model is a member variable of the BaseNNImputer class; it holds an instance of a machine learning model (such as a neural network, a regression model, etc.).

import torch


class BaseNNImputer(BaseNNModel, BaseImputer):  # BaseNNModel is defined elsewhere
    def __init__(
        self, learning_rate, epochs, patience, batch_size, weight_decay, device
    ):
        super().__init__(
            learning_rate, epochs, patience, batch_size, weight_decay, device
        )

    @abstractmethod
    def assemble_input_data(self, data):
        pass

    def _train_model(
        self,
        training_loader,
        val_loader=None,
        val_X_intact=None,
        val_indicating_mask=None,
    ):
        self.optimizer = torch.optim.Adam(
            self.model.parameters(), lr=self.lr, weight_decay=self.weight_decay
        )

        # Each training run starts from the very beginning, so reset the best
        # loss and the saved model state dict here.
        self.best_loss = float("inf")
        self.best_model_dict = None

        try:
            for epoch in range(self.epochs):
                self.model.train()  # self.model is a member variable of BaseNNImputer
                epoch_train_loss_collector = []
                # ... (the rest of the training loop is omitted in the original)
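The omitted loop body is where the actual optimization happens: iterate over mini-batches, run a forward pass, backpropagate the loss, and track the best weights for early stopping (the patience parameter seen in the constructors). Below is a hedged sketch of what such a body commonly looks like in this kind of trainer; the variable names, the "loss" key, and the original_patience counter are assumptions for illustration, not the project's exact code:

# Sketch of a typical elided training-loop body (assumptions noted above).
for idx, data in enumerate(training_loader):
    inputs = self.assemble_input_data(data)   # subclass-specific batch unpacking
    self.optimizer.zero_grad()
    results = self.model.forward(inputs)      # assumed to return a dict holding "loss"
    results["loss"].backward()
    self.optimizer.step()
    epoch_train_loss_collector.append(results["loss"].item())

mean_train_loss = sum(epoch_train_loss_collector) / len(epoch_train_loss_collector)

# Early stopping: remember the best weights, give up after `patience` bad epochs.
if mean_train_loss < self.best_loss:
    self.best_loss = mean_train_loss
    self.best_model_dict = self.model.state_dict()
    self.patience = self.original_patience    # assumed: reset the countdown
else:
    self.patience -= 1
    if self.patience == 0:
        break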
How is a deep learning model concretely defined?
self.model = _TransformerEncoder(
    self.n_layers,
    self.n_steps,
    self.n_features,
    self.d_model,
    self.d_inner,
    self.n_head,
    self.d_k,
    self.d_v,
    self.dropout,
    self.ORT_weight,
    self.MIT_weight,
)
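_TransformerEncoder itself is defined elsewhere in the project. For orientation only: a model of this shape is typically an input embedding plus positional information, a stack of self-attention encoder layers, and an output projection. Here is a minimal self-contained sketch built from standard PyTorch modules, under that assumption; it is not the project's actual _TransformerEncoder (which also carries the ORT/MIT loss weights):

import torch
import torch.nn as nn

class ToyTransformerEncoder(nn.Module):
    """Illustrative stand-in: embed features, add a learned positional embedding,
    run stacked self-attention layers, project back to the feature dimension."""

    def __init__(self, n_layers, n_steps, n_features, d_model, d_inner, n_head, dropout):
        super().__init__()
        self.embedding = nn.Linear(n_features, d_model)
        self.pos_embedding = nn.Parameter(torch.zeros(1, n_steps, d_model))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_head, dim_feedforward=d_inner,
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.output_projection = nn.Linear(d_model, n_features)

    def forward(self, X):
        # X: [batch, n_steps, n_features]
        h = self.encoder(self.embedding(X) + self.pos_embedding)
        return self.output_projection(h)

model = ToyTransformerEncoder(
    n_layers=2, n_steps=48, n_features=10, d_model=64, d_inner=128, n_head=4, dropout=0.1
)
print(model(torch.randn(8, 48, 10)).shape)  # torch.Size([8, 48, 10])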

Model training

# DatasetForMIT, mcar, masked_fill, BaseNNImputer, and _TransformerEncoder
# are defined elsewhere in the project.
import numpy as np
from torch.utils.data import DataLoader


class Transformer(BaseNNImputer):
    def __init__(
        self,
        n_steps,
        n_features,
        n_layers,
        d_model,
        d_inner,
        n_head,
        d_k,
        d_v,
        dropout,
        ORT_weight=1,
        MIT_weight=1,
        learning_rate=3e-4,  # also tried: 1e-3
        epochs=100,
        patience=10,
        batch_size=32,
        weight_decay=1e-5,  # tuning note from the original: best 1e-4
        device=None,
    ):
        super().__init__(
            learning_rate, epochs, patience, batch_size, weight_decay, device
        )

        self.n_steps = n_steps
        self.n_features = n_features
        # model hyper-parameters
        self.n_layers = n_layers
        self.d_model = d_model
        self.d_inner = d_inner
        self.n_head = n_head
        self.d_k = d_k
        self.d_v = d_v
        self.dropout = dropout
        self.ORT_weight = ORT_weight
        self.MIT_weight = MIT_weight

        self.model = _TransformerEncoder(
            self.n_layers,
            self.n_steps,
            self.n_features,
            self.d_model,
            self.d_inner,
            self.n_head,
            self.d_k,
            self.d_v,
            self.dropout,
            self.ORT_weight,
            self.MIT_weight,
        )
        self.model = self.model.to(self.device)
        self._print_model_size()

    def fit(self, train_X, x_vectors, text_emb, labels, val_X=None):
        train_X = self.check_input(self.n_steps, self.n_features, train_X)
        if val_X is not None:
            val_X = self.check_input(self.n_steps, self.n_features, val_X)

        training_set = DatasetForMIT(train_X, x_vectors, text_emb, labels)
        training_loader = DataLoader(
            training_set, batch_size=self.batch_size, shuffle=True
        )
        if val_X is None:
            self._train_model(training_loader)
        else:
            # Hold out ground truth: artificially remove 20% of the observed
            # validation values (missing completely at random) so imputation
            # quality can be measured against known values.
            val_X_intact, val_X, val_X_missing_mask, val_X_indicating_mask = mcar(
                val_X, 0.2
            )
            val_X = masked_fill(val_X, 1 - val_X_missing_mask, np.nan)
            val_set = DatasetForMIT(val_X)
            val_loader = DataLoader(val_set, batch_size=self.batch_size, shuffle=False)
            # _train_model performs the actual training: it consumes batches of
            # training data from training_loader and, when a validation loader is
            # given, also the held-out values for validation. Its body is defined
            # in BaseNNImputer above.
            self._train_model(
                training_loader, val_loader, val_X_intact, val_X_indicating_mask
            )

        # Load the state dict named best_model_dict into self.model. A state dict
        # holds the model's parameters and weights, so this restores the model to
        # the best-performing state seen during training, ready for subsequent
        # evaluation or inference.
        self.model.load_state_dict(self.best_model_dict)

        # Switch the model to evaluation mode to "freeze" it. In eval mode,
        # stochastic operations such as Dropout are disabled, so the model's
        # outputs stay consistent during inference or evaluation.
        self.model.eval()

        return self  # return the trained model object, as in BaseImputer.fit
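To make the validation step concrete, here is a small self-contained illustration of MCAR masking in the spirit of the mcar/masked_fill calls above. This toy function is written for illustration only; the project's own mcar may differ in signature and return values:

import numpy as np

def toy_mcar(X, rate, seed=0):
    """Randomly hide `rate` of the observed values (missing completely at random).
    Returns the intact data, the corrupted data, the observed-value mask, and
    the indicating mask that marks which values were artificially removed."""
    rng = np.random.default_rng(seed)
    X_intact = X.copy()
    observed = ~np.isnan(X)
    drop = observed & (rng.random(X.shape) < rate)       # values to hide
    X_corrupted = X.copy()
    X_corrupted[drop] = np.nan
    missing_mask = (~np.isnan(X_corrupted)).astype(float)  # 1 where still observed
    indicating_mask = drop.astype(float)                   # 1 where artificially removed
    return X_intact, X_corrupted, missing_mask, indicating_mask

X = np.random.randn(4, 5)
X_intact, X_corr, miss_mask, ind_mask = toy_mcar(X, rate=0.2)
print(ind_mask.sum(), "values were hidden for validation")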

About model training and evaluation methods

self.model.train() and self.model.eval() are commonly used methods on deep learning models; they switch the model between training mode and evaluation mode.

  • train() method: calling train() puts the model in training mode. In this mode, stochastic layers are active: Dropout randomly zeroes activations, and Batch Normalization updates its running statistics from each batch. These stochastic operations help the model learn better generalization and prevent overfitting.
  • eval() method: calling eval() puts the model in evaluation mode. In this mode, stochastic layers are disabled so the model's output is stable and repeatable: Dropout is turned off, and Batch Normalization uses the running statistics accumulated during training instead of per-batch statistics. The resulting consistent output is what you want in the inference, validation, or testing phases.

Note that train() and eval() are common on model objects in deep learning frameworks such as PyTorch, and that they only switch the behavior of layers like these; they do not by themselves enable or disable gradient computation (that is controlled separately, e.g. with torch.no_grad()). Calling the right method sets the model's internal mode to match the current phase, which keeps its behavior consistent and improves the stability and reproducibility of results.
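A short self-contained demonstration of the difference, using standard PyTorch:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()
print(model(x))  # Dropout active: some outputs randomly zeroed (and scaled by 1/(1-p))
print(model(x))  # a second call gives a different random pattern

model.eval()
print(model(x))  # Dropout disabled: deterministic output
print(model(x))  # repeated calls now match exactly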