Coffee bean classification with a convolutional neural network – P7

  • This article is a learning-record blog from the 365-day deep learning training camp
  • Original author: Classmate K | Tutoring, project customization
  • Article source: Student K’s study circle

Directory

  • Environment
  • Steps
    • Environment settings
      • Package reference
      • Global device object
    • Data preparation
      • View image information
      • Create a dataset
    • Model design
      • Manually built VGG-16 network
      • Simplified coffee bean identification network
    • Model training
      • Write training function
      • Write test function
      • Start training
      • Show the training process
    • Model effect display
  • Summary and experience

Environment

  • System: Linux
  • Language: Python 3.8.10
  • Deep learning framework: PyTorch 2.0.0 + cu118
  • Graphics card: NVIDIA A5000 (24 GB)

Steps

Environment settings

Package reference

import torch
import torch.nn as nn                                  # network modules
import torch.optim as optim                            # optimizers
from torch.utils.data import DataLoader, random_split  # dataset splitting and batching
from torchvision import datasets, transforms           # dataset loading and transforms

import pathlib, random, copy     # folder traversal, random sampling, model deep copy
from PIL import Image            # Python imaging library
import matplotlib.pyplot as plt  # charts
import numpy as np
from torchinfo import summary    # print model structure and parameter counts

Global device object

Define a global device object so that models and data can be conveniently moved to the target device

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
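
As a quick illustration (my addition, not part of the original walkthrough), anything created later can be moved onto this device with .to(device):

# Minimal usage sketch: move a tensor and a layer onto the global device
t = torch.randn(2, 3).to(device)    # tensor now lives on the GPU if one is available
layer = nn.Linear(3, 4).to(device)  # module parameters are moved in place
print(t.device, next(layer.parameters()).device)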

Data preparation

View image information

data_path = 'coffee_data'
data_lib = pathlib.Path(data_path)
coffee_images = list(data_lib.glob('*/*'))

# Print information about 5 images
for _ in range(5):
    image = random.choice(coffee_images)
    print(np.array(Image.open(str(image))).shape)

Image information
From the printed information we can see that all the images are 224×224, a size commonly used in CV, so there is no need to Resize the images later.
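
If the images were not already a uniform size, a Resize step could be added at the front of the transform pipeline. A minimal sketch (hypothetical here, since these images are already 224×224):

# Hypothetical: only needed if the source images varied in size
resize_transform = transforms.Compose([
    transforms.Resize((224, 224)),  # scale every image to 224x224
    transforms.ToTensor(),
])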

# Print 20 images for a rough look
plt.figure(figsize=(20, 4))
for i in range(20):
    plt.subplot(2, 10, i + 1)
    plt.axis('off')
    image = random.choice(coffee_images)  # randomly select an image
    plt.title(image.parts[-2])            # the parent folder name is the category name
    plt.imshow(Image.open(str(image)))    # display the image

Dataset preview
The preview gives a general sense of the images in the dataset.

Create a dataset

First write the preprocessing pipeline, then load the images from the folder using torchvision's API

transform = transforms.Compose([
    transforms.ToTensor(),    # convert the image to a tensor in [0, 1]
    transforms.Normalize(     # standardize each channel with the ImageNet mean/std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
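
The mean/std values above are the standard ImageNet channel statistics. If dataset-specific statistics were preferred, they could be computed in one pass; a rough sketch (my addition; raw_dataset is a hypothetical ImageFolder with only ToTensor applied, and all images are assumed to be the same size):

# Sketch: per-channel mean/std over the whole dataset
raw_dataset = datasets.ImageFolder(data_path, transform=transforms.ToTensor())
stat_loader = DataLoader(raw_dataset, batch_size=64)
n, mean, sq_mean = 0, torch.zeros(3), torch.zeros(3)
for x, _ in stat_loader:
    n += x.size(0)
    mean += x.mean(dim=(2, 3)).sum(dim=0)           # sum of per-image channel means
    sq_mean += (x ** 2).mean(dim=(2, 3)).sum(dim=0)
mean /= n
std = (sq_mean / n - mean ** 2).sqrt()
print(mean, std)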

Load folder

dataset = datasets.ImageFolder(data_path, transform=transform)

Get all category names from the data

class_names = [k for k in dataset.class_to_idx]  # the dict keys are the folder/category names
print(class_names)

Image category name
Divide the dataset into a training set and a validation set

train_size = int(len(dataset) * 0.8)   # 80% for training
test_size = len(dataset) - train_size  # remaining 20% for validation

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])
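
Note that random_split draws a fresh random permutation on every run. If a reproducible split mattered, a seeded generator could be passed in; a sketch (the seed 42 is an arbitrary assumption):

# Optional: make the train/validation split reproducible across runs
g = torch.Generator().manual_seed(42)
train_dataset, test_dataset = random_split(dataset, [train_size, test_size], generator=g)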

Wrap the datasets in DataLoaders so training can use mini-batch gradient descent

batch_size = 32
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
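
On a machine with spare CPU cores, the training loader could also be given worker processes and pinned memory to speed up host-to-GPU transfers; a sketch with assumed values (tune for your hardware):

# Optional performance knobs; num_workers=4 is an assumption, not a recommendation
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size,
                          num_workers=4, pin_memory=True)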

Model design

At the beginning I built the VGG-16 network manually and found that the model converged within a few epochs, so I set out to streamline it.

Manually built VGG-16 network

class Vgg16(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Block 1: 3 -> 64 channels
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Block 2: 64 -> 128 channels
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Block 3: 128 -> 256 channels
        self.block3 = nn.Sequential(
            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Block 4: 256 -> 512 channels
        self.block4 = nn.Sequential(
            nn.Conv2d(256, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Block 5: 512 -> 512 channels
        self.block5 = nn.Sequential(
            nn.Conv2d(512, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(512, 512, 3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.pool = nn.AdaptiveAvgPool2d(7)  # force a 7x7 spatial output
        self.classifier = nn.Sequential(
            nn.Linear(7*7*512, 4096),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.Linear(4096, 4096),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # flatten for the fully connected layers
        x = self.classifier(x)
        return x

vgg = Vgg16(len(class_names)).to(device)
summary(vgg, input_size=(32, 3, 224, 224))

VGG16 model
The printed structure shows that this VGG-16 network has 134,285,380 trainable parameters (slightly more than the official version because of the added BatchNorm layers). That is a huge parameter count for a small task like coffee bean identification, so most of those parameters are wasted, which is why the original VGG-16 structure was streamlined.
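
As a quick sanity check (my addition), the trainable parameter count reported by torchinfo can be reproduced directly:

# Count trainable parameters by hand; should match the torchinfo summary
num_params = sum(p.numel() for p in vgg.parameters() if p.requires_grad)
print(f'{num_params:,} trainable parameters')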

Simplified coffee bean identification network

class Network(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        # Block 1: 3 -> 64 channels
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

        # Block 2: 64 -> 128 channels
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

        # Block 3: squeeze back down to 64 channels
        self.block3 = nn.Sequential(
            nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )

        # No trailing comma here, or self.pool would become a tuple
        self.pool = nn.AdaptiveAvgPool2d(7)

        self.classifier = nn.Sequential(
            nn.Linear(7*7*64, 64),
            nn.Dropout(0.4),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # flatten for the classifier head
        x = self.classifier(x)
        return x

model = Network(len(class_names)).to(device)
summary(model, input_size=(32, 3, 224, 224))

Streamlined model
It can be seen that the streamlined network has less than one tenth of the original's parameters, yet its accuracy on the test set still reaches 100%!

Model training

Write training function

def train(train_loader, model, loss_fn, optimizer):
    size = len(train_loader.dataset)
    num_batches = len(train_loader)

    train_loss, train_acc = 0, 0
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)

        pred = model(x)
        loss = loss_fn(pred, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()

    train_loss /= num_batches
    train_acc /= size

    return train_loss, train_acc

Write test function

def test(test_loader, model, loss_fn):
    size = len(test_loader.dataset)
    num_batches = len(test_loader)

    test_loss, test_acc = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)

        pred = model(x)
        loss = loss_fn(pred, y)

        test_loss += loss.item()
        test_acc += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    test_acc /= size

    return test_loss, test_acc

Start training

First define the loss function and the optimizer with its initial learning rate, then set up the learning rate decay schedule, the total number of epochs, and the save path for the best model.

epochs = 30
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
scheduler = optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lambda epoch: 0.92**(epoch//2))  # decay the lr by a factor of 0.92 every 2 epochs
best_model_path = 'best_coffee_model.pth'
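
Because LambdaLR multiplies the initial learning rate by the lambda's return value, the factor 0.92**(epoch//2) means the rate drops by 8% once every two epochs. A quick illustration (my addition) of the first few values:

# Illustrative only: the lr the scheduler produces for the first six epochs
for e in range(6):
    print(e, 1e-4 * 0.92 ** (e // 2))
# epochs 0-1: 1e-4, epochs 2-3: 9.2e-5, epochs 4-5: ~8.46e-5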

Then write the training + evaluation loop, recording the metrics as training progresses

best_acc = 0
train_loss, train_acc = [], []
test_loss, test_acc = [], []
for epoch in range(epochs):
    model.train()
    epoch_train_loss, epoch_train_acc = train(train_loader, model, loss_fn, optimizer)
    scheduler.step()

    model.eval()
    with torch.no_grad():
        epoch_test_loss, epoch_test_acc = test(test_loader, model, loss_fn)

    train_loss.append(epoch_train_loss)
    train_acc.append(epoch_train_acc)
    test_loss.append(epoch_test_loss)
    test_acc.append(epoch_test_acc)

    lr = optimizer.state_dict()['param_groups'][0]['lr']

    # Keep a deep copy of the model whenever the validation accuracy improves
    if best_acc < epoch_test_acc:
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)

    print(f"Epoch: {epoch + 1}, Lr: {lr}, TrainAcc: {epoch_train_acc*100:.1f}, TrainLoss: {epoch_train_loss:.3f}, TestAcc: {epoch_test_acc*100:.1f}, TestLoss: {epoch_test_loss:.3f}")

print(f"Saving Best Model with Accuracy: {best_acc*100:.1f} to {best_model_path}")
torch.save(best_model.state_dict(), best_model_path)
print('Done')

Training process log
It can be seen that the model's accuracy on the test set reaches 100% at its best.

Show the training process

epoch_ranges = range(epochs)
plt.figure(figsize=(20, 6))
plt.subplot(121)
plt.plot(epoch_ranges, train_loss, label='train loss')
plt.plot(epoch_ranges, test_loss, label='validation loss')
plt.legend(loc='upper right')
plt.title('Loss')

plt.subplot(122)  # second panel of the same figure
plt.plot(epoch_ranges, train_acc, label='train accuracy')
plt.plot(epoch_ranges, test_acc, label='validation accuracy')
plt.legend(loc='lower right')
plt.title('Accuracy')

Training process parameters

Model effect display

model.load_state_dict(torch.load(best_model_path))
model.to(device)
model.eval()

plt.figure(figsize=(20, 4))
with torch.no_grad():  # no gradients needed for inference
    for i in range(20):
        plt.subplot(2, 10, i + 1)
        plt.axis('off')
        image = random.choice(coffee_images)
        x = transform(Image.open(str(image))).unsqueeze(0).to(device)  # add a batch dimension
        pred = model(x)
        plt.title(f'T:{image.parts[-2]}, P:{class_names[pred.argmax()]}')
        plt.imshow(Image.open(str(image)))

Model effect display
The results show that all of the sampled coffee beans were identified correctly.

Summary and experience

  • Since the streamlined network still converges quickly to very high accuracy, there is likely further room for simplification, though possibly at a small cost in accuracy.
  • The model should be chosen to match the actual task; for something like coffee bean type identification, VGG-16 is overkill.
  • Training speed did not change noticeably during streamlining, which suggests parameter count alone does not determine training speed; compute cost also depends on factors such as feature map sizes.
  • Stacking several convolution layers with the same kernel settings appears to work better than using a single layer, as the sketch below illustrates.
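
As a hedged illustration of that last point (my addition, not from the original post): two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, yet use fewer parameters and gain an extra nonlinearity in between.

# Two 3x3 convs vs one 5x5 conv, both mapping 64 -> 64 channels
stacked = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1),
)
single = nn.Conv2d(64, 64, 5, padding=2)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stacked), count(single))  # 73856 vs 102464 parameters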