4 TensorFlow Image Recognition Model (Part 1): Data Preprocessing

Previous article: 3 Detailed explanation of tensorflow model construction-CSDN blog

This article begins a model for recognizing pictures of cats and dogs. The content is extensive, so it is split across multiple parts. Model building follows the same process as before:

  • Dataset preparation
  • Data preprocessing
  • Create model
  • Set up loss function and optimizer
  • Training model

This article covers dataset preparation and preprocessing.

1. Understand supervised learning

Before getting started, you need to understand what supervised learning is. Classified by learning method, machine learning can be divided into:

  • supervised learning
  • unsupervised learning
  • reinforcement learning

Baidu Encyclopedia defines supervised learning as using labeled data sets to train algorithms in order to classify data or accurately predict outcomes.

The image recognition model we want to build uses supervised learning. The model's input is a "feature-label" pair: the feature is the input image, and the label is the expected result for that image (for example, whether it shows a cat or a dog).
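As a toy illustration (hypothetical values, not from the dataset), a batch of feature-label pairs might look like this:

```python
# Hypothetical 2x2 grayscale "images" standing in for real photos.
features = [
    [[0, 255], [128, 64]],   # pixel values of image 1
    [[34, 12], [200, 180]],  # pixel values of image 2
]
labels = [0, 1]  # 0 = cat, 1 = dog

# During supervised training, each feature is paired with its label.
for feature, label in zip(features, labels):
    name = 'cat' if label == 0 else 'dog'
    print('a', len(feature), 'x', len(feature[0]), 'image labeled', name)
```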

2. Introduction to training data sets

There are many public datasets online that can be used for learning, so beginners do not need to spend much time on data preparation. Below is the download address for the cats and dogs dataset:

https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip

After downloading, place the unzipped folder in your project's root directory for easy reading later.

(1) Directory structure of the data set

The dataset has two subdirectories: a training set (train) and a validation set (validation). Each of them contains a cats folder and a dogs folder holding the collected photos.

The data is divided into these two subsets so that one can be used to train the model and the other to evaluate it: the validation set shows how the model performs on images it was never trained on.
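The downloaded dataset is already split, but as a sketch, such a split could be produced from a flat list of filenames (the 80/20 ratio here is a common but arbitrary choice):

```python
# Hypothetical list of image filenames to be split.
filenames = ['cat.{}.jpg'.format(i) for i in range(10)]

# Put the first 80% in the training set, the rest in the validation set.
split_point = int(len(filenames) * 0.8)
train_files = filenames[:split_point]       # used to fit the model
validation_files = filenames[split_point:]  # used only for evaluation

print(len(train_files), len(validation_files))  # -> 8 2
```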

(2) Check the number of images in the dataset

I placed the dataset in the root directory of the current PyCharm project; replace the path with your actual location.

import os

# Get the training set and validation set directories
train_dir = os.path.join('cats_and_dogs_filtered/train')
validation_dir = os.path.join('cats_and_dogs_filtered/validation')

# Get the directory of cats and dogs in the training set
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Get the directory of cats and dogs in the validation set
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

# Check the number of cat and dog pictures in the training set
print('Number of cat pictures in the training set:')
print(len(os.listdir(train_cats_dir)))
print('Number of dog pictures in the training set:')
print(len(os.listdir(train_dogs_dir)))

# Check the number of cat and dog pictures in the validation set
print('Number of cat pictures in the validation set:')
print(len(os.listdir(validation_cats_dir)))
print('Number of dog pictures in the validation set:')
print(len(os.listdir(validation_dogs_dir)))

Output:

Number of cat pictures in the training set:
1000
Number of dog pictures in the training set:
1000
Number of cat pictures in the validation set:
500
Number of dog pictures in the validation set:
500

From the output, we can see that there are 2,000 images in the training set and 1,000 images in the validation set.

(3) Understanding RGB images

Understanding RGB images helps explain why the model's parameters are set the way they are.

According to Baidu Encyclopedia, RGB is an industry color standard: colors are obtained by varying the three color channels red (R), green (G), and blue (B) and superimposing them. RGB represents the colors of these three channels. The standard covers almost all colors that human vision can perceive and is one of the most widely used color systems.
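For instance (a synthetic example, not an image from the dataset), an RGB image is just a height x width x 3 array, with one value per channel for every pixel:

```python
import numpy as np

# A hypothetical 2x2 image: red, green, blue, and white pixels.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

print(image.shape)  # (height, width, channels) -> (2, 2, 3)

# Slicing out one channel gives a single-channel matrix.
red_channel = image[:, :, 0]
print(red_channel)  # the red component of each pixel
```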

Most color images can be divided into three RGB color channels. We can select one of the images in the training set to take a look:

import os
import cv2

# Get the training set and validation set directories
train_dir = os.path.join('cats_and_dogs_filtered/train')
validation_dir = os.path.join('cats_and_dogs_filtered/validation')

# Get the directory of cats and dogs in the training set
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Get the directory of cats and dogs in the validation set
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

# Get all file names in the training set's cats folder
train_cats_name = os.listdir(train_cats_dir)

# Get the path of one of the pictures
picture_1 = os.path.join(train_cats_dir, train_cats_name[0])

# Print the image name
print('Image name: ' + train_cats_name[0])

# Read the image as a pixel array
picture_1 = cv2.imread(picture_1)

# Print the shape of the image
print(picture_1.shape)

Output:

Image name: cat.952.jpg
(375, 499, 3)

You can see that the image cat.952.jpg is 375 pixels high and 499 pixels wide, with 3 color channels.

To look at another image, just change the following two lines of code:

# Get the path of one of the pictures
picture_1 = os.path.join(train_cats_dir, train_cats_name[1])

# Print the image name
print('Image name: ' + train_cats_name[1])

Output:

Image name: cat.946.jpg
(374, 500, 3)

You can see that the two images have different sizes. A neural network requires all inputs to have the same size, so this dataset cannot be used for training directly; data preprocessing is required.
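You can see this constraint directly: NumPy, which batches are built on, refuses to stack arrays of different shapes (a sketch using blank placeholder arrays with the two shapes above):

```python
import numpy as np

# Two blank placeholder arrays with the shapes printed above
# (hypothetical stand-ins for the actual photos).
img_a = np.zeros((375, 499, 3), dtype=np.uint8)
img_b = np.zeros((374, 500, 3), dtype=np.uint8)

# A training batch is a single array, so differently sized images
# cannot be stacked together:
try:
    np.stack([img_a, img_b])
except ValueError as error:
    print('Cannot build a batch:', error)
```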

(4) Data preprocessing

Preprocessing consists of two main parts:

  • Unify the image size
  • Scale the pixel values proportionally

Let's first look at the effect on a single image, resizing it to 150*150 and scaling by 1/255:

import os
import cv2

# Get the training set and validation set directories
train_dir = os.path.join('cats_and_dogs_filtered/train')
validation_dir = os.path.join('cats_and_dogs_filtered/validation')

# Get the directory of cats and dogs in the training set
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Get the directory of cats and dogs in the validation set
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

# Get all file names in the training set's cats folder
train_cats_name = os.listdir(train_cats_dir)

# Get the path of one of the pictures
picture_1 = os.path.join(train_cats_dir, train_cats_name[0])

# Print the image name
print('Image name: ' + train_cats_name[0])

# Read the image as a pixel array
picture_1 = cv2.imread(picture_1)

# Print the pixel values of the image
print(picture_1)

# Resize the image to 150*150
picture_2 = cv2.resize(picture_1, (150, 150))
print('The adjusted image shape is:')
print(picture_2.shape)

# Divide by 255 to scale the pixel values (done on the original image here,
# so the printed matrix can be compared with the unscaled one above)
picture_3 = picture_1 / 255
print('Scaled image pixel value matrix:')
print(picture_3)

Output:

Image name: cat.946.jpg
[[[158 157 143]
  [128 126 115]
  [103  97  92]
  ...
  [ 71  70  66]
  [ 71  70  66]
  [ 71  70  66]]

 [[158 157 143]
  [128 126 115]
  [103  97  92]
  ...
  [ 74  73  69]
  [ 74  73  69]
  [ 74  73  69]]

 [[157 156 142]
  [128 126 115]
  [103  97  92]
  ...
  [ 77  76  72]
  [ 77  76  72]
  [ 77  76  72]]

 ...

 [[128 123 125]
  [126 121 123]
  [124 119 121]
  ...
  [ 40  61  83]
  [ 38  59  81]
  [ 37  58  80]]

 [[135 132 134]
  [132 129 131]
  [130 127 129]
  ...
  [ 39  60  82]
  [ 38  59  81]
  [ 37  58  80]]

 [[140 137 139]
  [138 135 137]
  [135 132 134]
  ...
  [ 39  60  82]
  [ 38  59  81]
  [ 37  58  80]]]
The adjusted image shape is:
(150, 150, 3)
Scaled image pixel value matrix:
[[[0.61960784 0.61568627 0.56078431]
  [0.50196078 0.49411765 0.45098039]
  [0.40392157 0.38039216 0.36078431]
  ...
  [0.27843137 0.2745098  0.25882353]
  [0.27843137 0.2745098  0.25882353]
  [0.27843137 0.2745098  0.25882353]]

 [[0.61960784 0.61568627 0.56078431]
  [0.50196078 0.49411765 0.45098039]
  [0.40392157 0.38039216 0.36078431]
  ...
  [0.29019608 0.28627451 0.27058824]
  [0.29019608 0.28627451 0.27058824]
  [0.29019608 0.28627451 0.27058824]]

 [[0.61568627 0.61176471 0.55686275]
  [0.50196078 0.49411765 0.45098039]
  [0.40392157 0.38039216 0.36078431]
  ...
  [0.30196078 0.29803922 0.28235294]
  [0.30196078 0.29803922 0.28235294]
  [0.30196078 0.29803922 0.28235294]]

 ...

 [[0.50196078 0.48235294 0.49019608]
  [0.49411765 0.4745098  0.48235294]
  [0.48627451 0.46666667 0.4745098 ]
  ...
  [0.15686275 0.23921569 0.3254902 ]
  [0.14901961 0.23137255 0.31764706]
  [0.14509804 0.22745098 0.31372549]]

 [[0.52941176 0.51764706 0.5254902 ]
  [0.51764706 0.50588235 0.51372549]
  [0.50980392 0.49803922 0.50588235]
  ...
  [0.15294118 0.23529412 0.32156863]
  [0.14901961 0.23137255 0.31764706]
  [0.14509804 0.22745098 0.31372549]]

 [[0.54901961 0.5372549  0.54509804]
  [0.54117647 0.52941176 0.5372549 ]
  [0.52941176 0.51764706 0.5254902 ]
  ...
  [0.15294118 0.23529412 0.32156863]
  [0.14901961 0.23137255 0.31764706]
  [0.14509804 0.22745098 0.31372549]]]

The following is the full dataset preprocessing code:

import tensorflow as tf

# Model parameter settings
BATCH_SIZE = 100

# The image size is unified to 150*150
IMG_SHAPE = 150

# Image processing for training: rescale pixel values and randomly flip images
train_img_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1. / 255,
                                                                      horizontal_flip=True)

# The validation set should not be augmented, so it only gets the rescaling
val_img_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1. / 255)

# Training data
train_data_gen = train_img_generator.flow_from_directory(directory=train_dir,
                                                         shuffle=True,
                                                         batch_size=BATCH_SIZE,
                                                         target_size=(IMG_SHAPE, IMG_SHAPE),
                                                         class_mode='binary')

# Validation data
val_data_gen = val_img_generator.flow_from_directory(directory=validation_dir,
                                                     shuffle=True,
                                                     batch_size=BATCH_SIZE,
                                                     target_size=(IMG_SHAPE, IMG_SHAPE),
                                                     class_mode='binary')
  • target_size=(IMG_SHAPE, IMG_SHAPE): IMG_SHAPE is set to 150, so every image is uniformly resized to 150*150 as it is read.
  • ImageDataGenerator(rescale=1. / 255) scales pixel values by 1/255. Because image data is of type uint8, with values in the range 0~255, the pixel values are normalized to between 0 and 1 after scaling.
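The effect of rescale=1. / 255 can be reproduced with plain NumPy on a synthetic batch (an illustration of the arithmetic only, not the real generator output):

```python
import numpy as np

# A hypothetical batch: 2 images, 150x150 pixels, 3 channels, uint8.
batch = np.random.randint(0, 256, size=(2, 150, 150, 3), dtype=np.uint8)

# What rescale=1. / 255 does inside ImageDataGenerator:
# normalize pixel values from 0~255 down to the range [0, 1].
rescaled = batch.astype(np.float32) / 255.0

print(rescaled.shape)  # (2, 150, 150, 3), the shape the model will see
print(rescaled.min() >= 0.0 and rescaled.max() <= 1.0)  # True
```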

Now that the data is ready, the next article will cover building the image recognition model.

Next article: 5 Tensorflow Image Recognition (Part 2) Model Construction-CSDN Blog
