When compiling the model in Tensorflow2, the input data format requirements change under different selections of loss functions.

1. Introduction to commonly used loss functions in tf2

In TensorFlow 2, when compiling a model, you can choose different loss functions to define the model’s objective function. Different loss functions are suitable for different problem types and model architectures. The following are several common loss functions and their functions and applicable scenarios:

1. Mean Squared Error (MSE): MSE is a loss function commonly used in regression problems and is used to measure the average squared difference between the predicted value and the true value. Larger errors result in larger penalties, suitable for regression tasks.

model.compile(loss='mse', ...)

2. Binary Cross Entropy: Binary cross entropy is a commonly used loss function in binary classification problems and is used to measure the difference between two categories. Suitable for binary classification problems, the output is a sigmoid activation model of a probability value.

model.compile(loss='binary_crossentropy', ...)

3. Multi-class cross entropy (Categorical Cross Entropy): Multi-class cross entropy is a commonly used loss function in multi-classification problems and is used to measure the differences between multiple categories. Suitable for multi-classification problems, the output is a softmax activation model of the probability distribution of each class.

model.compile(loss='categorical_crossentropy', ...)

4. Sparse Categorical Cross Entropy (Sparse Categorical Cross Entropy): Similar to multi-class cross entropy, but suitable for multi-classification problems where labels are represented in integer form instead of one-hot encoding.

model.compile(loss='sparse_categorical_crossentropy', ...)

5.KL divergence loss (Kullback-Leibler Divergence): KL divergence is used to measure the difference between two probability distributions. In generative models, it is often used in combination with models such as autoencoders to promote the model output to be close to a predefined probability distribution.

model.compile(loss='kullback_leibler_divergence', ...)

In addition to the common loss functions mentioned above, there are other customized loss functions that can be customized according to specific tasks and needs. Through the tf.keras.losses module, you can view more available loss functions and choose the one that suits your model. When choosing a loss function, you need to make a reasonable choice based on the task type, data distribution, and model design to obtain the best training effect.

2. Comparative analysis of two loss functions

Categorical Cross Entropy and Sparse Categorical Cross Entropy

Similar points: both can be used for data multi-classification tasks.

Difference: The input requirements for data are different. Categorical Cross Entropy requires data to beone-hot encoded. This is mainly for label data of data, such as our data label data. When reading, its category is 0-9. This data can be a column of data. At this time, we can use theSparse Categorical Cross Entropy (Sparse Categorical Cross Entropy) function to compile directly.

one_hot encoding (one-hot encoding) description:

An encoding that represents each element as a binary vector, in which only one element is 1 and the remaining elements are 0. For example, if we have a list of length N, then its one-hot encoding will be an NxN matrix, where the i-th row represents the encoding of the i-th element. For example, if we have a list of 3 colors [“red”, “blue”, “green”], then their one-hot encoding would be:

Red: [1,0,0] Blue: [0,1,0] Green: [0,0,1]

This encoding method is commonly used in machine learning to convert each category label into a one-hot vector for training.

If we use Categorical Cross Entropy as the loss function, then we perform one-hot encoding on the data, and use it where the code is:

y_train=tf.keras.utils.to_categorical(y_train) #Error report
    y_test=tf.keras.utils.to_categorical(y_test)

Error reported in tensorflow2.5 environment:

tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [12,16] and labels shape [204]
[[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at /PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_Classifier_tf_imgVali.py:286) ]] [Op:__inference_train_function_762]

Function call stack:
train_function

Here we can use the following code instead:

y_train_one_hot = tf.one_hot(y_train, depth=num_classes)
y_test_one_hot = tf.one_hot(y_test, depth=num_classes)

3. Sample code analysis

The loss function corresponding to Sparse Categorical Cross Entropy and Categorical Cross Entropy is:

loss='sparse_categorical_crossentropy' loss='categorical_crossentropy'

Use minist data to do a simple MLP model classification. Here we first use the Sparse Categorical Cross Entropy loss function. code show as below:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Prepare data set
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Build model
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=784))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

#Train model
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_data=(x_test, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(x_test, y_test)
print('Test Loss:', loss)
print('Test Accuracy:', accuracy)

# Use the model to make predictions
predictions = model.predict(x_test[:5])
print('Predictions:', tf.argmax(predictions, axis=1))
print('Labels:', y_test[:5])

The running results are as follows:

D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 22:28:27.465600: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.610122: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 22:28:30.637119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.637445: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 22:28:30.648571: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:30.648748: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 22:28:30.652682: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 22:28:30.654729: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 22:28:30.657643: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 22:28:30.661178: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 22:28:30.662311: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 22:28:30.662510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:30.662864: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 22:28:30.663583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 22:28:30.663941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 22:28:31.130464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 22:28:31.130645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-10-14 22:28:31.130748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-10-14 22:28:31.130967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 22:28:31.709522: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 22:28:31.920032: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 22:28:32.369951: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 5s 2ms/step - loss: 0.2845 - accuracy: 0.9174 - val_loss : 0.1443 - val_accuracy: 0.9547
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1261 - accuracy: 0.9633 - val_loss : 0.1085 - val_accuracy: 0.9646
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0937 - accuracy: 0.9716 - val_loss : 0.1034 - val_accuracy: 0.9690
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0731 - accuracy: 0.9772 - val_loss : 0.0987 - val_accuracy: 0.9714
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0612 - accuracy: 0.9810 - val_loss : 0.0828 - val_accuracy: 0.9749
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0507 - accuracy: 0.9835 - val_loss : 0.0955 - val_accuracy: 0.9702
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0430 - accuracy: 0.9859 - val_loss : 0.0863 - val_accuracy: 0.9746
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0374 - accuracy: 0.9874 - val_loss : 0.0935 - val_accuracy: 0.9737
Epoch 9/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0328 - accuracy: 0.9894 - val_loss : 0.0902 - val_accuracy: 0.9754
Epoch 10/10
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0287 - accuracy: 0.9900 - val_loss : 0.0902 - val_accuracy: 0.9771
313/313 [==============================] - 1s 2ms/step - loss: 0.0902 - accuracy: 0.9771
Test Loss: 0.09022707492113113
Test Accuracy: 0.9771000146865845
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: [7 2 1 0 4]

Process finished with exit code 0

We modify the loss function to Categorical Cross Entropy and run the code and an error will be reported:

 ValueError: Shapes (32, 1) and (32, 10) are incompatible

This is because we did not convert the label data into one-hot encoding. We converted it and added: before the model.fit() function:

y_train = tf.one_hot(y_train, depth=10)
y_test= tf.one_hot(y_test, depth=10)

The running results are as follows:

D:\PycharmProjects\pythonProject\venv\Scripts\python.exe D:/PycharmProjects/pythonProject/ML_New/MLP_Classifier_tf/MLP_TEST_MINIST.py
2023-10-14 23:20:04.708405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.803493: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2023-10-14 23:20:07.833164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.833480: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2023-10-14 23:20:07.840527: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2023-10-14 23:20:07.840689: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2023-10-14 23:20:07.844132: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2023-10-14 23:20:07.845657: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2023-10-14 23:20:07.848488: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2023-10-14 23:20:07.852061: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2023-10-14 23:20:07.853130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2023-10-14 23:20:07.853317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:07.853652: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-14 23:20:07.854467: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2023-10-14 23:20:07.854879: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2023-10-14 23:20:08.326771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-14 23:20:08.326942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2023-10-14 23:20:08.327041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2023-10-14 23:20:08.327252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6001 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2023-10-14 23:20:08.914697: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/10
2023-10-14 23:20:09.138669: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
   1/1875 [........................] - ETA: 21:57 - loss: 2.4066 - accuracy: 0.12502023- 10-14 23:20:09.626287: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2784 - accuracy: 0.9182 - val_loss : 0.1517 - val_accuracy: 0.9510
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.1217 - accuracy: 0.9633 - val_loss : 0.1258 - val_accuracy: 0.9611
Epoch 3/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0874 - accuracy: 0.9731 - val_loss : 0.1045 - val_accuracy: 0.9666
Epoch 4/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0701 - accuracy: 0.9778 - val_loss : 0.0929 - val_accuracy: 0.9718
Epoch 5/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0569 - accuracy: 0.9821 - val_loss : 0.0853 - val_accuracy: 0.9751
Epoch 6/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0488 - accuracy: 0.9844 - val_loss : 0.0911 - val_accuracy: 0.9706
Epoch 7/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0402 - accuracy: 0.9868 - val_loss : 0.0847 - val_accuracy: 0.9748
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0355 - accuracy: 0.9882 - val_loss : 0.0975 - val_accuracy: 0.9723
Epoch 9/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0307 - accuracy: 0.9895 - val_loss : 0.1027 - val_accuracy: 0.9743
Epoch 10/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.0281 - accuracy: 0.9907 - val_loss : 0.1004 - val_accuracy: 0.9734
313/313 [==============================] - 1s 2ms/step - loss: 0.1004 - accuracy: 0.9734
Test Loss: 0.10037881135940552
Test Accuracy: 0.9733999967575073
Predictions: tf.Tensor([7 2 1 0 4], shape=(5,), dtype=int64)
Labels: tf.Tensor(
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]], shape=(5, 10), dtype=float32)

Process finished with exit code 0

Note:

Prediction labels for models compiled using the sparse cross-entropy loss function start from 0. If your data starts from 1, then you need to pay attention to the two should be consistent when doing verification analysis later.

When training for multi-classification problems using the sparse cross-entropy loss function, the labels are usually represented by integers, and the label values range from 0 to the number of categories minus 1. The output of the model should also be a probability distribution for each class.

For example, if there are 3 categories, the labels will be encoded as 0, 1, and 2, and the output of the model will be a probability distribution vector of length 3 representing the predicted probability for each category.

During prediction, the model will return the predicted probability for each category. By taking the index corresponding to the maximum probability, the predicted category can be obtained. This index range is from 0 to the number of categories minus 1. Unlike sparse cross-entropy, when using ordinary (non-sparse) cross-entropy loss function, labels are usually encoded using one-hot encoding, where each category is represented by a vector, only the position corresponding to the true label is 1, and the rest are 0 . In this case, the labels of the prediction results also start from 0.

The knowledge points of the article match the official knowledge files, and you can further learn relevant knowledge. Python entry skill treeArtificial intelligenceDeep learning 380044 people are learning the system