TensorFlow image multi-label classification example

Next, we walk through a TensorFlow-based image multi-label classification example from scratch, using the image verification code (CAPTCHA) as the running example.

When we visit a website, we often encounter image verification codes. The main purpose of an image CAPTCHA is to distinguish bots from humans and to keep bots out.

The following program simulates human recognition of the verification code, so that the website cannot tell whether a crawler program or a human is logging in.

10.4.1 Use TFRecord to generate training data

Take the picture verification code shown in Figure 10.5 as an example. Label this verification code picture as label=[3,8,8,7]. A classification network generally identifies only one target at a time, so how can it recognize this multi-label sequence?

The following TFRecord structure can be used to construct a multi-label training data set to achieve multi-label data recognition.

Figure 10.5 Image verification code

The following is the code to construct a TFRecord multi-label training data set:

import os
import tensorflow as tf

# `params` is a project configuration object (not shown here) that provides
# TRAINING_RECORDS_DATA_DIR and DATA_EXT.

# Wrap an integer value as an int64 feature
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
# Wrap a byte string as a bytes feature
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
# Wrap a floating-point value as a float feature
def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
# Serialize one sample (image, label, map) into a TFRecord file
def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR,
        name + '.' + params.DATA_EXT)
    writer = tf.python_io.TFRecordWriter(filename)
    # image, map and label are NumPy arrays; store their raw bytes
    image_raw = image.tobytes()
    map_raw = map.tobytes()
    label_raw = label.tobytes()
    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))
    writer.write(example.SerializeToString())
    writer.close()

The code above writes a TFRecord that stores the whole label array, so a single record can carry multiple labels. Converting many verification code images in this way builds a multi-label data set for the subsequent multi-label classification training.
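Because the label is stored as raw bytes, the record itself carries neither dtype nor shape, and the reader has to supply both. The following is a minimal NumPy-only sketch of that round trip. It assumes the label is held as an int64 array; the original does not state the dtype.

```python
import numpy as np

# Label for the CAPTCHA in Figure 10.5; int64 is an assumed dtype.
label = np.array([3, 8, 8, 7], dtype=np.int64)

# What gets stored in the record: the raw buffer, with no dtype or shape.
label_raw = label.tobytes()

# What a reader has to do: reinterpret the bytes with the same dtype.
restored = np.frombuffer(label_raw, dtype=np.int64)

print(len(label_raw))      # 4 values * 8 bytes each
print(restored.tolist())
```

If the writer and the reader disagree on the dtype, the bytes are still decoded without error but yield wrong values, so it is worth fixing the dtype in one place.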

10.4.2 Constructing a multi-label classification network

Through the previous step, we obtained the verification code data set for multi-label classification, and now we need to build a multi-label classification network.

We choose the VGG network as the feature-extraction backbone. As a rule of thumb, a deeper network tends to be more robust to noise. The noise in a verification code mainly comes from character deformation, adhesion between characters, and artificially added interference, and the VGG network is fairly robust to these kinds of noise. The code is as follows:

import tensorflow as tf
tf.enable_eager_execution()
def model_vgg(x, training=False):
    # The first convolution of the first group uses 64 convolution kernels, and the kernel size is 3
    conv1_1 = tf.layers.conv2d(inputs=x, filters=64, name="conv1_1",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second convolution of the first group uses 64 convolution kernels, and the kernel size is 3
    conv1_2 = tf.layers.conv2d(inputs=conv1_1, filters=64, name="conv1_2",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The first pooling layer uses a 2x2 window with stride 2.
    # padding="same" rounds odd sizes up, so a 30-pixel-high input
    # survives all five pooling layers.
    pool1 = tf.layers.max_pooling2d(inputs=conv1_2, pool_size=[2, 2],
        strides=2, padding="same", name="pool1")
    # The first convolution of the second group uses 128 convolution kernels with a kernel size of 3
    conv2_1 = tf.layers.conv2d(inputs=pool1, filters=128, name="conv2_1",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second convolution of the second group uses 128 convolution kernels with a kernel size of 3
    conv2_2 = tf.layers.conv2d(inputs=conv2_1, filters=128, name="conv2_2",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second pooling layer uses a 2x2 window with stride 2
    pool2 = tf.layers.max_pooling2d(inputs=conv2_2, pool_size=[2, 2],
        strides=2, padding="same", name="pool2")
    # The first convolution of the third group uses 128 convolution kernels, and the kernel size is 3
    conv3_1 = tf.layers.conv2d(inputs=pool2, filters=128, name="conv3_1",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second convolution of the third group uses 128 convolution kernels with a kernel size of 3
    conv3_2 = tf.layers.conv2d(inputs=conv3_1, filters=128, name="conv3_2",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The third convolution of the third group uses 128 convolution kernels, and the kernel size is 3
    conv3_3 = tf.layers.conv2d(inputs=conv3_2, filters=128, name="conv3_3",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The third pooling layer uses a 2x2 window with stride 2
    pool3 = tf.layers.max_pooling2d(inputs=conv3_3, pool_size=[2, 2],
        strides=2, padding="same", name="pool3")
    # The first convolution of the fourth group uses 256 convolution kernels, and the kernel size is 3
    conv4_1 = tf.layers.conv2d(inputs=pool3, filters=256, name="conv4_1",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second convolution of the fourth group uses 128 convolution kernels with a kernel size of 3
    conv4_2 = tf.layers.conv2d(inputs=conv4_1, filters=128, name="conv4_2",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The third convolution of the fourth group uses 128 convolution kernels with a kernel size of 3
    conv4_3 = tf.layers.conv2d(inputs=conv4_2, filters=128, name="conv4_3",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The fourth pooling layer uses a 2x2 window with stride 2
    pool4 = tf.layers.max_pooling2d(inputs=conv4_3, pool_size=[2, 2],
        strides=2, padding="same", name="pool4")
    # The first convolution of the fifth group uses 512 convolution kernels, and the kernel size is 3
    conv5_1 = tf.layers.conv2d(inputs=pool4, filters=512, name="conv5_1",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The second convolution of the fifth group uses 512 convolution kernels with a kernel size of 3
    conv5_2 = tf.layers.conv2d(inputs=conv5_1, filters=512, name="conv5_2",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The third convolution of the fifth group uses 512 convolution kernels, and the kernel size is 3
    conv5_3 = tf.layers.conv2d(inputs=conv5_2, filters=512, name="conv5_3",
        kernel_size=3, activation=tf.nn.relu, padding="same")
    # The fifth pooling layer uses a 2x2 window with stride 2
    pool5 = tf.layers.max_pooling2d(inputs=conv5_3, pool_size=[2, 2],
        strides=2, padding="same", name="pool5")
    # Flatten the feature map into one vector per image
    flatten = tf.layers.flatten(inputs=pool5, name="flatten")
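
The convolutions above use padding="same" with stride 1, so only the pooling layers change the spatial size. The sketch below traces that size in plain Python for a 30x100 input (the size that Section 10.4.3 reshapes images to), using the standard output-size formulas for "valid" and "same" pooling. The helper names are hypothetical. Under these assumptions, with "valid" padding (the default of tf.layers.max_pooling2d) the height reaches zero at the fifth pooling, which TensorFlow would reject, whereas "same" padding leaves a 1x4 feature map.

```python
import math

def pooled_size(size, padding, pool=2, stride=2):
    """Output length of one pooling step along a single dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    # "valid": only windows that fit entirely inside the input are used
    return max((size - pool) // stride + 1, 0)

def trace(height, width, padding, steps=5):
    """Spatial size after each of `steps` successive 2x2, stride-2 poolings."""
    sizes = []
    for _ in range(steps):
        height = pooled_size(height, padding)
        width = pooled_size(width, padding)
        sizes.append((height, width))
    return sizes

valid_sizes = trace(30, 100, "valid")
same_sizes = trace(30, 100, "same")

# Length of the flattened vector when the last group has 512 channels
flat_features = same_sizes[-1][0] * same_sizes[-1][1] * 512

print(valid_sizes)
print(same_sizes)
print(flat_features)
```

The same helper can be rerun with other input sizes to check whether an image is large enough for five pooling stages.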

The code above is the convolutional part of the VGG network as used for single-label classification. What we need here is multi-label classification, so the fully connected part of the VGG network has to be adapted accordingly. The code is as follows:

    # Build a fully connected layer with an output of 4096
    fc6 = tf.layers.dense(inputs=flatten, units=4096,
        activation=tf.nn.relu, name='fc6')
    # To prevent overfitting, a dropout operation is introduced
    drop1 = tf.layers.dropout(inputs=fc6, rate=0.5, training=training)
    # Build a fully connected layer with an output of 4096
    fc7 = tf.layers.dense(inputs=drop1, units=4096,
        activation=tf.nn.relu, name='fc7')
    # To prevent overfitting, a dropout operation is introduced
    drop2 = tf.layers.dropout(inputs=fc7, rate=0.5, training=training)
    # Build a classifier for the first label.
    # The heads output raw scores (no activation), because the Softmax
    # is applied inside the loss function.
    fc8_1 = tf.layers.dense(inputs=drop2, units=10,
        activation=None, name='fc8_1')
    # Build a classifier for the second label
    fc8_2 = tf.layers.dense(inputs=drop2, units=10,
        activation=None, name='fc8_2')
    # Build a classifier for the third label
    fc8_3 = tf.layers.dense(inputs=drop2, units=10,
        activation=None, name='fc8_3')
    # Build a classifier for the fourth label
    fc8_4 = tf.layers.dense(inputs=drop2, units=10,
        activation=None, name='fc8_4')
    # Splice the results of the four labels along axis 0
    fc8 = tf.concat([fc8_1, fc8_2, fc8_3, fc8_4], 0)
    # Return the spliced scores, shape (4 * batch_size, 10)
    return fc8

The fc6 and fc7 fully connected layers here further process the convolutional features of the network. After the fc7 layer, we need to generate the multi-label prediction results. Since a verification code image contains 4 labels, 4 sub-classification networks need to be constructed. It is assumed here that the picture verification code only contains the 10 digits 0 to 9, so each sub-network predicts 10 categories. If 64 verification code images are passed in for prediction each time, the 4 sub-networks produce 4 tensors, each of shape (64, 10). To use a Softmax classifier, these four tensors must be combined into one, so the tf.concat function is used to perform the tensor splicing operation.
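
A Softmax loss expects unbounded scores (logits) for the 10 categories. As a worked check in plain Python: if each score were first squashed into the range (0, 1), for example by a sigmoid, the probability that Softmax can assign to the correct class is capped, and the cross-entropy loss cannot approach zero. The function name below is illustrative only.

```python
import math

def softmax_prob(scores, index):
    """Softmax probability assigned to scores[index]."""
    exps = [math.exp(s) for s in scores]
    return exps[index] / sum(exps)

num_classes = 10

# Best case for bounded scores: the correct class at 1, all others at 0.
bounded = [1.0] + [0.0] * (num_classes - 1)
best_prob = softmax_prob(bounded, 0)
min_loss = -math.log(best_prob)

# Unbounded logits can make the correct class dominate.
unbounded = [20.0] + [0.0] * (num_classes - 1)
free_prob = softmax_prob(unbounded, 0)

print(round(best_prob, 3), round(min_loss, 3))
print(free_prob > 0.999)
```

With bounded scores the best achievable probability is e / (e + 9), roughly 0.23, which is one reason to pass raw scores to a Softmax loss.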

The signature of the tf.concat function in TensorFlow is as follows:

tf.concat(
    values,
    axis,
    name='concat'
)

With fc8 = tf.concat([fc8_1, fc8_2, fc8_3, fc8_4], 0), the four (64, 10) tensors are joined along axis 0 into a single (256, 10) tensor. Once this single tensor has been generated, the subsequent Softmax classification operation can be performed.
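
The shape arithmetic can be checked without TensorFlow, since NumPy's np.concatenate follows the same axis convention as tf.concat. The sketch below uses stand-in arrays rather than real network outputs: head k is filled with the value k so that its rows can be recognised after concatenation.

```python
import numpy as np

batch_size, num_classes = 64, 10

# Stand-ins for the four head outputs; head k is filled with the value k.
heads = [np.full((batch_size, num_classes), k, dtype=np.float32)
         for k in range(4)]

# Join along axis 0: rows are stacked, the class dimension is unchanged.
fc8 = np.concatenate(heads, axis=0)

print(fc8.shape)
# Rows 0-63 come from head 0, rows 64-127 from head 1, and so on.
print([int(fc8[k * batch_size, 0]) for k in range(4)])
```

The result is grouped by head, which matters later when the labels are flattened to the same length.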

10.4.3 Multi-label training model

The first step in model training is to read the data. There are two reading methods: one reads the image files directly, and the other first converts them into a binary file format and then reads that. The former is simple to implement but slow; the latter is more complex to implement but fast to read. Here we use the binary file format to show how to read multi-label data. The following is the relevant code.

First read the TFRecord file content:

# TFrecorder is assumed to be an external helper class for reading and
# writing TFRecord files; its import is not shown in this section.
tfr = TFrecorder()
def input_fn_maker(path, data_info_path, shuffle=False, batch_size=1,
                   epoch=1, padding=None):
    def input_fn():
        filenames = tfr.get_filenames(path=path, shuffle=shuffle)
        dataset = tfr.get_dataset(paths=filenames,
            data_info=data_info_path, shuffle=shuffle,
            batch_size=batch_size, epoch=epoch, padding=padding)
        iterator = dataset.make_one_shot_iterator()
        return iterator.get_next()
    return input_fn
# Original image information
padding_info = ({'image': [30, 100, 3], 'label': []})
# Test set
test_input_fn = input_fn_maker('captcha_data/test/',
    'captcha_tfrecord/data_info.csv',
    batch_size=512, padding=padding_info)
# Training set
train_input_fn = input_fn_maker('captcha_data/train/',
    'captcha_tfrecord/data_info.csv',
    shuffle=True, batch_size=128, padding=padding_info)
# Training set without shuffling, used for evaluation
train_eval_fn = input_fn_maker('captcha_data/train/',
    'captcha_tfrecord/data_info.csv',
    batch_size=512, padding=padding_info)

Then comes the model training part:

def model_fn(features, net, mode):
    features['image'] = tf.reshape(features['image'], [-1, 30, 100, 3])
    # Get the spliced scores from the net network, shape (4 * batch_size, 10);
    # dropout is only active in training mode
    logits = net(features['image'],
        training=(mode == tf.estimator.ModeKeys.TRAIN))
    predictions = {"classes": tf.argmax(logits, axis=1),
        "probabilities": tf.nn.softmax(logits)}
    # Determine whether it is prediction mode or training mode
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode,
            predictions=predictions)
    # Because it is a multi-label Softmax, the (batch_size, 4) labels are
    # flattened in advance. The transpose orders them position by position,
    # matching the row order produced by tf.concat(..., 0) in the network.
    labels = tf.reshape(tf.transpose(features['label']), [-1])
    # Compute the Softmax cross-entropy loss
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels,
        logits=logits)
    # Obtain model results in training mode
    if mode == tf.estimator.ModeKeys.TRAIN:
        # Declare the optimizer type used by the model
        optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode,
            loss=loss, train_op=train_op)
    # Generate evaluation indicators (per-character accuracy)
    eval_metric_ops = {"accuracy": tf.metrics.accuracy(
        labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss,
        eval_metric_ops=eval_metric_ops)

The multi-label training process is very similar to the ordinary single-label training process. The only difference is that the multi-label label values must be flattened into a single tensor, in the same order as the concatenated predictions, to meet the dimension requirements of the Softmax classification operation.
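
One detail worth checking when flattening the labels is their order. If the four heads are concatenated along axis 0, as in tf.concat([fc8_1, fc8_2, fc8_3, fc8_4], 0), the rows are grouped by character position, so the labels need the same grouping. The following is a small NumPy sketch with a made-up batch of two CAPTCHAs; the second label is invented for illustration.

```python
import numpy as np

# Two CAPTCHAs with four digits each, shape (batch_size, 4).
labels = np.array([[3, 8, 8, 7],
                   [1, 2, 3, 4]])

# Plain flattening is image-major: all digits of image 0, then image 1.
image_major = labels.reshape(-1)

# Concatenating the heads along axis 0 is position-major: the first digit
# of every image, then the second digit of every image, and so on.
position_major = labels.T.reshape(-1)

# Decoding goes the other way: split the predictions into 4 groups and
# transpose back to one row per image.
predicted_classes = position_major          # pretend the model is perfect
decoded = predicted_classes.reshape(4, -1).T

print(image_major.tolist())
print(position_major.tolist())
print(decoded.tolist())
```

Only the position-major order lines up row by row with the concatenated predictions; with the image-major order most labels would be compared against the wrong character.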

This article is excerpted from “Python Deep Learning Principles, Algorithms and Cases”.
