2. Softmax regression

Learning video:

09 Softmax regression + loss function + image classification dataset [Dive into Deep Learning v2] (bilibili)

http://localhost:8888/notebooks/chapter_linear-networks/softmax-regression-scratch.ipynb

Table of Contents

1. Loss function

1. Mean squared loss:

2. Absolute value loss function:

3. Huber's robust loss:

2. Softmax implementation:

1. Initialize model parameters:

2. Define softmax operation:

3. Define the model:

4. Implement the cross-entropy loss function:

5. Classification accuracy:

6. Training:

(1) Define a training cycle:

(2) Draw the training process:

(3) Entire function:

(4) Mini-batch stochastic gradient descent is used to optimize the loss function of the model:

(5) Ten cycles of training:

(6) Prediction:

3. Concise implementation:

1. Initialize model parameters:

2. The cross-entropy loss function takes the unnormalized predictions and computes the softmax and its logarithm simultaneously

3. Optimization algorithm:

4. Training:


Note: Softmax regression is a classification model.

For details on reading data, see the courseware: http://localhost:8888/notebooks/chapter_linear-networks/image-classification-dataset.ipynb

1. Loss function

1. Mean squared loss:

When taking the derivative, the factor of 1/2 cancels the 2 that comes down from differentiating the square.
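
Written out (a standard form, matching the lecture's convention of including the 1/2):

\ell(y, \hat{y}) = \frac{1}{2}\,(y - \hat{y})^2,
\qquad
\frac{\partial \ell(y, \hat{y})}{\partial \hat{y}} = -(y - \hat{y})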

2. Absolute value loss function:

3. Huber's robust loss:
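
The Huber robust loss combines the two: it is quadratic for small errors and linear for large ones, so outliers produce bounded gradients. A common parameterization (threshold 1, as used in the lecture) is:

\ell(y, \hat{y}) =
\begin{cases}
|y - \hat{y}| - \frac{1}{2} & \text{if } |y - \hat{y}| > 1 \\
\frac{1}{2}\,(y - \hat{y})^2 & \text{otherwise}
\end{cases}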

2. Softmax implementation:
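
The code below follows the softmax-regression-scratch notebook linked above and assumes its usual preamble (imports plus the Fashion-MNIST data iterators):

import torch
from IPython import display
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)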

1. Initialize model parameters:

We will flatten each image, treating it as a vector of length 784. In later chapters we will discuss features that exploit the spatial structure of an image, but for now we simply treat each pixel location as a feature.

Because our dataset has 10 categories, the network output dimension is 10. Therefore, the weights form a 784×10 matrix and the biases form a 1×10 row vector. As with linear regression, we initialize the weights W from a normal distribution and the biases to 0.

num_inputs = 784
num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

2. Define softmax operation:

def softmax(X):
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition # The broadcast mechanism is applied here
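
As a quick sanity check (a minimal sketch with random data, as in the d2l notebook): every row of the output is non-negative and sums to 1.

X = torch.normal(0, 1, (2, 5))
X_prob = softmax(X)
X_prob, X_prob.sum(1)  # each row of X_prob sums to 1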

3. Define the model:

def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)
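
As a quick check (a hypothetical zero-valued batch, not part of the original notebook): a batch of Fashion-MNIST images of shape (batch, 1, 28, 28) is flattened by the reshape and mapped to 10 class probabilities.

X = torch.zeros((2, 1, 28, 28))
net(X).shape  # torch.Size([2, 10])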

4. Implement the cross-entropy loss function:

def cross_entropy(y_hat, y):
    return - torch.log(y_hat[range(len(y_hat)), y])

# A small example with two samples and three classes (as in the d2l notebook):
y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])
cross_entropy(y_hat, y)  # tensor([2.3026, 0.6931])

5. Classification accuracy:

Compare the predicted class with the true class given by y:

def accuracy(y_hat, y): #@save
    """Calculate the number of correct predictions"""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())
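
Continuing the two-sample example above, the first prediction (argmax class 2 vs. true class 0) is wrong and the second (class 2 vs. class 2) is right, so the accuracy rate is 0.5:

accuracy(y_hat, y) / len(y)  # 0.5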

Likewise, we can evaluate the accuracy of any model net on any dataset that is accessible via a data iterator data_iter.

def evaluate_accuracy(net, data_iter): #@save
    """Calculate the accuracy of the model on the specified data set"""
    if isinstance(net, torch.nn.Module):
        net.eval() # Set the model to evaluation mode
    metric = Accumulator(2) # Number of correct predictions, total number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

A utility class, Accumulator, is defined here to accumulate sums over multiple variables. In the evaluate_accuracy function above, we create an Accumulator instance with 2 variables, which store the number of correct predictions and the total number of predictions, respectively. Both are accumulated as we iterate over the dataset.

class Accumulator: #@save
    """Accumulate on n variables"""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
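
A quick illustration of how Accumulator behaves (a minimal sketch, not part of the original notebook):

metric = Accumulator(2)
metric.add(3, 10)      # batch 1: 3 correct out of 10 samples
metric.add(5, 10)      # batch 2: 5 correct out of 10 samples
metric[0] / metric[1]  # 0.4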

6. Training:

(1) Define a training cycle:

def train_epoch_ch3(net, train_iter, loss, updater): #@save
    """Train the model for one iteration cycle (see Chapter 3 for definition)"""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training losses, sum of training accuracy, number of samples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Calculate gradient and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Use PyTorch’s built-in optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Use custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

(2) Draw the training process:

class Animator: #@save
    """Draw data in animation"""
    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,
                 ylim=None, xscale='linear', yscale='linear',
                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,
                 figsize=(3.5, 2.5)):
        # Draw multiple lines incrementally
        if legend is None:
            legend = []
        d2l.use_svg_display()
        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Use lambda function to capture parameters
        self.config_axes = lambda: d2l.set_axes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add multiple data points to the chart
        if not hasattr(y, "__len__"):
            y = [y]
        n = len(y)
        if not hasattr(x, "__len__"):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.config_axes()
        display.display(self.fig)
        display.clear_output(wait=True)

(3) Entire function:

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater): #@save
    """Training model (see Chapter 3 for definition)"""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

(4) Mini-batch stochastic gradient descent is used to optimize the loss function of the model:

lr = 0.1

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)

(5) Ten cycles of training:

num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

(6) Prediction:

def predict_ch3(net, test_iter, n=6): #@save
    """Predicted labels (see Chapter 3 for definition)"""
    for X, y in test_iter:
        break
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)

3. Concise implementation:

http://localhost:8888/notebooks/chapter_linear-networks/softmax-regression-concise.ipynb
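
As with the scratch version, the code below assumes the concise notebook's preamble (imports and the Fashion-MNIST data iterators):

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)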

1. Initialize model parameters:

As described in :numref:sec_softmax, the output layer of softmax regression is a fully connected layer. Therefore, to implement our model, we simply add a fully connected layer with 10 outputs to a Sequential. Again, Sequential is not strictly necessary here, but it is the basis for implementing deep models. As before, we randomly initialize the weights with mean 0 and standard deviation 0.01.

# PyTorch does not implicitly reshape the inputs. Therefore,
# we define a flatten layer before the linear layer to adjust the shape of the network input.
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);

2. Pass the unnormalized predictions to the cross-entropy loss function, which computes the softmax and its logarithm simultaneously:

loss = nn.CrossEntropyLoss(reduction='none')
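
Why do it this way? Exponentiating large logits and then taking the log can overflow, while nn.CrossEntropyLoss works with log-probabilities directly (the log-sum-exp trick). A minimal sketch of the idea (the numbers are just for illustration):

import torch

logits = torch.tensor([[100.0, 0.0, -100.0]])

# Naive route: exp(100.) overflows in float32, producing inf/nan downstream
naive_log_prob = torch.log(torch.exp(logits) / torch.exp(logits).sum(1, keepdim=True))

# Stable route: subtract the row maximum before exponentiating
shifted = logits - logits.max(1, keepdim=True).values
stable_log_prob = shifted - shifted.exp().sum(1, keepdim=True).log()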

3. Optimization algorithm:

Here we use mini-batch stochastic gradient descent with a learning rate of 0.1 as the optimization algorithm. This is the same as in our linear regression example, which illustrates the generality of the optimizer.

trainer = torch.optim.SGD(net.parameters(), lr=0.1)

4. Training:

Call the training function defined earlier:

num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
