[PyTorch Guide (3/7)] Linear components, activation functions

1. Description

A neural network is a collection of neurons connected by layers. Each neuron is a small computing unit that performs simple calculations to collectively solve a problem. Neurons are organized in layers, and there are three types of layers: the input layer, the hidden layer(s), and the output layer. Each layer contains a number of neurons, except for the input layer, which simply holds the input features. Neural networks mimic the way the human brain processes information.

2. Components of neural network

  • The activation function determines whether a neuron should be activated or not. The computation that happens in a neural network includes applying the activation function. If a neuron is activated, that means the input is important. There are different kinds of activation functions, and choosing which one to use depends on what you want the output to be. Another important role of activation functions is to add nonlinearity to the model.
    Binary sets the output node to 1 if the function result is positive and to 0 if the result is negative.
    Sigmoid is used to predict the probability of the output node, which lies between 0 and 1.
    Tanh is used when the output node should lie between -1 and 1; it is typically used for classification tasks.
    ReLU sets the output node to 0 when the function result is negative and keeps the result value when it is positive.
  • Weights affect how close the output of our network is to the expected output value. When an input enters a neuron, it is multiplied by the weight value, and the resulting output is either observed or passed on to the next layer of the neural network. The weights of all neurons in a layer are organized into one tensor.
  • Bias makes up the difference between the activation function’s output and its intended output. A low bias suggests that the network is making more assumptions about the form of the output, whereas a high bias makes fewer assumptions about it.

We can say that the output y of a neural network layer with weights W and bias b is calculated as the activation function applied to the sum of the inputs multiplied by the weights, plus the bias:
x = ∑(weight * input) + bias, y = f(x), where f is the activation function.
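As a minimal sketch of this formula (the shapes below are illustrative and not taken from the model built later in this article), the computation of a single layer and a few activation functions can be written out by hand in PyTorch:

import torch

torch.manual_seed(0)

inputs = torch.rand(1, 4)      # one sample with 4 input features
weights = torch.rand(4, 3)     # weights connecting 4 inputs to 3 neurons
bias = torch.rand(3)           # one bias value per neuron

x = inputs @ weights + bias    # x = sum(weight * input) + bias

print(torch.sigmoid(x))        # f(x) squashed into (0, 1)
print(torch.tanh(x))           # f(x) squashed into (-1, 1)
print(torch.relu(x))           # negative values replaced with 0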

3. Building a neural network

Neural networks consist of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks needed to build your own neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module composed of other modules (layers). This nested structure allows complex architectures to be built and managed easily.

Here, we will build a neural network to classify images in the FashionMNIST dataset.

%matplotlib inline
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

4. Define the class

We define our neural network by subclassing nn.Module and initializing the neural network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

Our neural network consists of the following:
– An input layer with 28×28, or 784, features (pixels).
– The first linear module takes the 784 input features and transforms them into a hidden layer with 512 features.
– The ReLU activation function is applied to the transformation.
– The second linear module takes the 512 features from the first hidden layer as input and transforms them into the next hidden layer with 512 features.
– The ReLU activation function is applied to the transformation.
– The third linear module takes the 512 features from the second hidden layer as input and transforms them into an output layer with 10 features (the number of classes).
– The ReLU activation function is applied to the transformation.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

We determine the available device, create an instance of NeuralNetwork, move it to that device, and print its structure.

device = "cuda" if torch.cuda.is_available() else "cpu"
model = NeuralNetwork().to(device)
print(model)
NeuralNetwork(
  (flatten): Flatten()
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

To use the model, we pass it the input data. This executes the model’s forward method, along with some background operations. However, don’t call model.forward() directly! Calling the model on the input returns a 10-dimensional tensor with the raw predicted values for each class. We get the prediction probabilities by passing it through an instance of nn.Softmax.

X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")
Predicted class: tensor([2], device='cuda:0')

Let’s break down the layers in the FashionMNIST model. To illustrate it, we’ll take a sample mini-batch of 3 images of size 28×28 and see what happens to it as we pass it through the network.

input_image = torch.rand(3,28,28)
print(input_image.size())
torch.Size([3, 28, 28])

4.1 nn.Flatten

We initialize the nn.Flatten layer to convert each 2D 28×28 image into a contiguous array of 784 pixel values (the mini-batch dimension at dim=0 is maintained). Each pixel is passed to the input layer of the neural network.

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())
torch.Size([3, 784])

4.2 nn.Linear

A linear layer is a module that applies a linear transformation to the input using its stored weights and biases. The gray value of each pixel from the input layer is connected to the neurons of the hidden layer for the transformation calculation, that is, weight * input + bias.

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())
torch.Size([3, 20])
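Since the transformation is driven by the layer’s stored parameters, a quick check (a small sketch, not part of the original walkthrough) is to inspect the weight and bias tensors of layer1 directly; nn.Linear stores the weight with shape (out_features, in_features):

print(layer1.weight.size())    # torch.Size([20, 784])
print(layer1.bias.size())      # torch.Size([20])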

4.3 nn.ReLU

Nonlinear activations are what create the complex mappings between the model’s inputs and outputs. They are applied after the linear transformations to introduce nonlinearity, helping neural networks learn a wide variety of phenomena. In this model, we use nn.ReLU between the linear layers, but there are other activations that introduce nonlinearity into the model.

The ReLU activation function takes the output from the linear layer and replaces negative values with zeros.

print(f"Before ReLU: {hidden1}\
\
")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")
Before ReLU: tensor([[ 0.2190, 0.1448, -0.5783, 0.1782, -0.4481, -0.2782, -0.5680, 0.1347,
          0.1092, -0.7941, -0.2273, -0.4437, 0.0661, 0.2095, 0.1291, -0.4690,
          0.0358, 0.3173, -0.0259, -0.4028],
        [-0.3531, 0.2385, -0.3172, -0.4717, -0.0382, -0.2066, -0.3859, 0.2607,
          0.3626, -0.4838, -0.2132, -0.7623, -0.2285, 0.2409, -0.2195, -0.4452,
         -0.0609, 0.4035, -0.4889, -0.4500],
        [-0.3651, -0.1240, -0.3222, -0.1072, -0.0112, -0.0397, -0.4105, -0.0233,
         -0.0342, -0.5680, -0.4816, -0.8085, -0.3945, -0.0472, 0.0247, -0.3605,
         -0.0347, 0.1192, -0.2763, 0.1447]], grad_fn=<AddmmBackward>)


After ReLU: tensor([[0.2190, 0.1448, 0.0000, 0.1782, 0.0000, 0.0000, 0.0000, 0.1347, 0.1092,
         0.0000, 0.0000, 0.0000, 0.0661, 0.2095, 0.1291, 0.0000, 0.0358, 0.3173,
         0.0000, 0.0000],
        [0.0000, 0.2385, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.2607, 0.3626,
         0.0000, 0.0000, 0.0000, 0.0000, 0.2409, 0.0000, 0.0000, 0.0000, 0.4035,
         0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0247, 0.0000, 0.0000, 0.1192,
         0.0000, 0.1447]], grad_fn=<ReluBackward0>)

4.4 nn.Sequential

nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as defined. You can use a sequential container to put together a quick network like seq_modules.

seq_modules = nn.Sequential(
    flatten,
    layer1,
    nn.ReLU(),
    nn.Linear(20, 10)
)
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)

4.5 nn.Softmax

The last linear layer of the neural network returns logits, raw values in [-infty, infty], which are passed to the nn.Softmax module. The Softmax activation function is used to compute the probability of each class from the neural network’s output; it is only used in the output layer of a neural network. The results are scaled to values in [0, 1] representing the model’s predicted probability for each class. The dim parameter indicates the dimension along which the values must sum to 1. The node with the highest probability gives the predicted output.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
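To verify that dim=1 makes each row of probabilities sum to 1 (a small check added here, assuming the seq_modules output above), sum along that dimension and take the argmax per image:

print(pred_probab.sum(dim=1))     # each of the 3 rows sums to 1 (up to floating-point rounding)
print(pred_probab.argmax(dim=1))  # index of the most probable class per image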

5. Model parameters

Many layers in a neural network are parameterized, that is, they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside your model object, and all parameters can be accessed using the model’s parameters() or named_parameters() methods.

print("Model structure: ", model, "\
\
")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values: {param[:2]} \
")

Next>> Introduction to PyTorch (4/7)
