Fundamentals of Convolutional Neural Networks

Contents

1. Deep learning platform

1. Introduction

2. PyTorch – basic concepts

2. Convolutional neural network basics

1. Basic concepts

2. The structure of a convolutional neural network

3. The LeNet-5 network

Summary

1. Deep learning platform

1. Introduction

The main platforms include TensorFlow, Caffe, JAX, MXNet, Torch/PyTorch, PaddlePaddle, and MMDetection. The current mainstream platforms are TensorFlow and PyTorch, and the number of PyTorch users is gradually increasing. PyTorch originated as a Python wrapper for LuaJIT-based Torch written by Hugh Perkins; today's PyTorch is a redesign and reimplementation of Torch in Python that shares the same core C libraries for the backend. This article chooses PyTorch as the platform for learning and development. An introductory tutorial is available on the official tutorial site: https://www.pytorch123.com/.

2. PyTorch – Basic Concepts

A tensor is an array of values that may have multiple dimensions. A tensor with one axis corresponds to a mathematical vector; a tensor with two axes corresponds to a matrix; a tensor with more than two axes has no special mathematical name. It behaves much like a NumPy array.

A mathematical computation is described by a directed graph of "nodes" and "edges". A "node" usually represents a mathematical operation, but it can also represent the starting point of data input, the end point of output, or the end point of reading and writing persistent variables. An "edge" represents the input–output relationship between nodes. These edges carry dynamically sized multi-dimensional data arrays, that is, "tensors".

In short, it includes the following features:

Use tensors (tensor) to represent data;
Use Dataset and DataLoader to read sample data and labels;
Use variables (Variable) to store parameters such as neural network weights;
Use a computational graph to represent computational tasks;
Computational graphs are built and executed dynamically as the code runs.

Code example:

import torch

# Create two 1-D tensors and add them element-wise
x_const = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([3.0, 4.0, 5.0])
output = x_const + y
print(x_const, '\n', y, '\n', output)
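
To illustrate the dynamic computational graph, here is a minimal sketch (assuming a recent PyTorch version; the variable names are chosen purely for illustration) that records a small graph during the forward pass and runs backpropagation through it:

import torch

# Build a small computational graph dynamically: y = sum(w * x + b)
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)  # learnable parameter
b = torch.tensor(1.0, requires_grad=True)              # learnable parameter

y = (w * x + b).sum()  # the forward pass records the graph as the code runs
y.backward()           # backpropagation through the recorded graph

print(w.grad)  # dy/dw = x -> tensor([1., 2., 3.])
print(b.grad)  # dy/db = 3 (b enters each of the three terms)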

2. Convolutional Neural Network Basics

The concept of the convolutional neural network appeared early (1989), but it only attracted wide attention after AlexNet appeared in 2012, the first time a neural network with learned features clearly surpassed previous approaches.

Fully connected networks: too many weights, which makes computation and convergence difficult, can trap training in local minima, and easily leads to overfitting.

To reduce the number of weights, each output unit is connected to only a local region of the input, and these local weights are shared across positions; this local weighted sum is the convolution.

1. Basic concepts

A convolution kernel (filter) extracts features from the image; applying different kernels produces different effects, such as blurring or edge detection.

(Figure: different edge detection effects)
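
As a rough sketch of this idea (the kernel values and image size are illustrative assumptions), a Sobel-style kernel applied with torch.nn.functional.conv2d responds to vertical edges:

import torch
import torch.nn.functional as F

# A 3x3 Sobel-style kernel that responds to vertical edges
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)  # (out_channels, in_channels, H, W)

image = torch.rand(1, 1, 28, 28)   # a random single-channel "image": (batch, channel, H, W)
edges = F.conv2d(image, kernel)    # no padding, stride 1 -> output shrinks to 26 x 26
print(edges.shape)                 # torch.Size([1, 1, 26, 26])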

Padding: filling in values along the border of the input matrix to enlarge it, usually with zeros or by copying the border pixels. By enlarging the original image in this way, the result of the convolution can keep the same size as the input.

Stride: the distance the kernel moves between two adjacent convolution operations.
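
Combining padding and stride, the output size follows output = floor((input + 2*padding - kernel) / stride) + 1. The short check below, a sketch with arbitrarily chosen sizes, confirms this with nn.Conv2d:

import torch
from torch import nn

# 32x32 input, 5x5 kernel, padding 2, stride 1 -> (32 + 2*2 - 5)/1 + 1 = 32 (size preserved)
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2, stride=1)
x = torch.rand(1, 1, 32, 32)
print(conv(x).shape)      # torch.Size([1, 6, 32, 32])

# The same kernel with stride 2 -> floor((32 + 4 - 5) / 2) + 1 = 16
conv_s2 = nn.Conv2d(1, 6, kernel_size=5, padding=2, stride=2)
print(conv_s2(x).shape)   # torch.Size([1, 6, 16, 16])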

Multi-channel convolution: here the convolution kernel has not only height and width but also depth, for example when convolving RGB images.

(Figure: convolution on an RGB image)
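
A minimal sketch with assumed sizes: for an RGB input each kernel has depth 3, and each output channel uses its own 3-deep kernel.

import torch
from torch import nn

# RGB input has 3 channels; each of the 8 output channels uses its own 3x3x3 kernel
conv_rgb = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)
x = torch.rand(1, 3, 32, 32)       # (batch, channels, height, width)
print(conv_rgb.weight.shape)       # torch.Size([8, 3, 3, 3]): kernel depth matches the input channels
print(conv_rgb(x).shape)           # torch.Size([1, 8, 30, 30])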

Pooling: the convolutional layer extracts many features, and the result needs to be simplified. Pooling uses a local statistic such as the mean or the maximum, which reduces the number of features. There is usually a pooling layer after a convolutional layer.

(Figure: pooling)
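
A minimal sketch (assuming a 4x4 input and a 2x2 window) comparing the two common pooling operations:

import torch
from torch import nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)   # local mean
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # local maximum

print(avg_pool(x))   # each 2x2 block is replaced by its mean
print(max_pool(x))   # each 2x2 block is replaced by its maximum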

2. Structure of Convolutional Neural Network

Composition: the network consists of multiple convolutional layers and downsampling (pooling) layers, which can be followed by a fully connected network. Convolutional layer: k filters.
Downsampling layer: takes the mean or the maximum. At the end: a fully connected network.

(Figure: structure)

The weights of the whole network can still be updated with the backpropagation (BP) algorithm.

3. LeNet-5 network

Network structure:

(Figure: LeNet-5 network structure)

The purpose of the network is to recognize and classify handwritten digits; its task is classification.

The first convolutional layer has 6 channels, with (5*5 + 1)*6 parameters in total. It is followed by a pooling layer that does average pooling. The second convolutional layer has 16 channels; each kernel has height 5, width 5, and depth 6, for a total of (5*5*6 + 1)*16 parameters, followed by another average pooling layer. After that comes the fully connected network (a multilayer perceptron), whose input is the feature map flattened into a vector. It contains 2 hidden layers with 120 and 84 neurons and an output layer with 10 units; every layer has a nonlinear activation function. Number of parameters into the first hidden layer: (5*5*16 + 1)*120; into the second hidden layer: (120 + 1)*84; into the output layer: (84 + 1)*10.
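
As a quick sanity check of these counts (the pooling layers have no learnable parameters), the short sketch below simply evaluates the formulas above:

# Parameter counts per layer of LeNet-5, evaluating the formulas above
conv1 = (5 * 5 * 1 + 1) * 6       # 156
conv2 = (5 * 5 * 6 + 1) * 16      # 2416
fc1   = (5 * 5 * 16 + 1) * 120    # 48120
fc2   = (120 + 1) * 84            # 10164
out   = (84 + 1) * 10             # 850
print(conv1 + conv2 + fc1 + fc2 + out)   # 61706, the "about 60,000" mentioned below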

(Figure: detailed network structure)

In practice there are many tricks to building a neural network; many parameters are chosen based on experience or intuition, and it is hard to explain the reasons rigorously.

Differences from current networks:
– No padding is performed during convolution
– The pooling layers use average pooling instead of max pooling
– Sigmoid or tanh is used as the nonlinear activation function instead of ReLU
– The network is shallow and the number of parameters is small (about 60,000)

As the network gets deeper, the width and height of the feature maps shrink while the number of channels increases. The classic BP algorithm is still used for weight updates.

The pooling layer passes the error back to the convolutional layer differently depending on whether it is average pooling or max pooling: average pooling spreads the error evenly over the pooled window, while max pooling routes the error only to the position that held the maximum.

(Figures: error backpropagation for average pooling and max pooling)
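
A minimal sketch (assuming a single 2x2 window) showing how autograd routes the gradient through the two pooling types:

import torch
from torch import nn

x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]], requires_grad=True)

# Average pooling: the error is spread evenly over the pooled window
nn.AvgPool2d(2)(x).sum().backward()
print(x.grad)    # tensor([[[[0.25, 0.25], [0.25, 0.25]]]])

x.grad = None    # reset before the second backward pass

# Max pooling: the error goes only to the position that held the maximum
nn.MaxPool2d(2)(x).sum().backward()
print(x.grad)    # tensor([[[[0., 0.], [0., 1.]]]])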

The error is passed back from the convolutional layer to the pooling layer:

(Figure: error passed from the convolutional layer to the pooling layer)

Visualization of this network: https://adamharley.com/nn_vis/cnn/3d.html

For a concrete implementation, refer to Dive into Deep Learning (d2l):

import torch
from torch import nn
from d2l import torch as d2l

net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10))

X = torch.rand(size=(1, 1, 28, 28), dtype=torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape: \t',X.shape)

def evaluate_accuracy_gpu(net, data_iter, device=None): #@save
    """Use the GPU to calculate the accuracy of the model on the dataset"""
    if isinstance(net, nn.Module):
        net.eval() # set to evaluation mode
        if not device:
            device = next(iter(net.parameters())).device
    # number of correct predictions, total number of predictions
    metric = d2l.Accumulator(2)
    with torch.no_grad():
        for X, y in data_iter:
            if isinstance(X, list):
                # Required for BERT fine-tuning (will be introduced later)
                X = [x.to(device) for x in X]
            else:
                X = X.to(device)
            y = y.to(device)
            metric.add(d2l.accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]


def train_ch6(net, train_iter, test_iter, num_epochs, lr, device):
    """Using GPU training model (defined in Chapter 6)"""
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print('training on', device)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs],
                            legend=['train loss', 'train acc', 'test acc'])
    timer, num_batches = d2l.Timer(), len(train_iter)
    for epoch in range(num_epochs):
        # sum of training loss, sum of training accuracy, number of samples
        metric = d2l.Accumulator(3)
        net.train()
        for i, (X, y) in enumerate(train_iter):
            timer.start()
            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            l.backward()
            optimizer.step()
            with torch.no_grad():
                metric.add(l * X.shape[0], d2l.accuracy(y_hat, y), X.shape[0])
            timer.stop()
            train_l = metric[0] / metric[2]
            train_acc = metric[1] / metric[2]
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (train_l, train_acc, None))
        test_acc = evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {train_l:.3f}, train acc {train_acc:.3f}, '
          f'test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec '
          f'on {str(device)}')


# Load the Fashion-MNIST data (as in d2l) so that train_iter / test_iter are defined
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)

lr, num_epochs = 0.9, 10
train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

Summary

This article introduced the basic concepts and structure of convolutional neural networks and described a classic network architecture, LeNet-5, in detail.
