[Machine Learning] Model Evaluation - Training and Evaluating a Model on the Handwritten Digits Dataset

Article directory

  • Preface
  • 1. Loading the dataset
  • 2. Splitting into training and test sets, shuffling, and binary classification
  • 3. Training the model and making predictions
  • 4. Model evaluation
    • 1. Cross-validation
    • 2. Confusion matrix
      • 2.1 Precision, recall, f1_score
      • 2.2 The ROC curve and the impact of the threshold on the results
  • 5. Summary

Preface

In an earlier post we introduced common methods and metrics for model evaluation. Now we will train a model on a handwritten-digits dataset and evaluate it with different methods, to further deepen our understanding of model evaluation methods and metrics.

1. Loading the dataset

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
#%%
from sklearn.datasets import load_digits
data=load_digits()
print(data.data.shape)


There are 1797 handwritten digit images in total, each 8×8 pixels. Let's display one of them.

plt.figure(figsize=(8,6))
plt.imshow(data.images[0], cmap='gray')  # show the first 8x8 digit image in grayscale
plt.show()


The image is only 8×8 pixels, so it looks quite coarse, but we can still clearly see that it is a 0.

2. Splitting into training and test sets, shuffling, and binary classification

Here we set the training set to test set ratio to 7:3.

from sklearn.model_selection import train_test_split
X,y=data.data,data.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=42)

Since the dataset is ordered fairly regularly from 0 to 9 and neighboring samples are correlated, we shuffle the training data.

indexs=np.random.permutation(len(X_train))  # permutation generates a random ordering of 0..n-1 for a given length n
X_train,y_train=X_train[indexs],y_train[indexs]

Since the dataset has 10 classes (0 to 9), we turn this into a binary classification problem: "2" versus "not 2".

y_train_2=(y_train==2)
y_test_2=(y_test==2)
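
Note that the resulting labels are quite imbalanced: only about one in ten training samples is a 2. A quick sanity check (this snippet is my own addition, not part of the original code):

# Count how many positive ("is 2") samples remain after binarizing the labels
print('positives:', y_train_2.sum(), 'out of', len(y_train_2))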

3. Training the model and making predictions

We use SGDClassifier, the stochastic gradient descent classifier from sklearn.linear_model.

from sklearn.linear_model import SGDClassifier
sgd_clf=SGDClassifier(max_iter=10,random_state=42)
sgd_clf.fit(X_train,y_train_2)
y_prediction=sgd_clf.predict(X[2].reshape(1,-1))
print(y_prediction)


The prediction returns True. Let's print the digit at that position and take a look.

print(y_prediction)
print(y[2])


You can see that the prediction was successful

4. Model evaluation

1. Cross-validation

Cross-validation splits the training set into several folds; each fold in turn is held out as a validation set while the model is trained on the remaining folds, until every fold has served as the validation set once. This approach is especially friendly to smaller datasets. The cross_val_score function helps us evaluate the model's performance on the dataset and select the best model. By using cross-validation we also make better use of a limited dataset and reduce overfitting.
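
As a rough sketch of what happens under the hood, here is a hand-rolled version of 3-fold cross-validation using StratifiedKFold and clone (the library function below does the same thing with more options and safeguards):

from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

skfolds = StratifiedKFold(n_splits=3)
for train_index, val_index in skfolds.split(X_train, y_train_2):
    clf = clone(sgd_clf)                                   # fresh copy of the model for each fold
    clf.fit(X_train[train_index], y_train_2[train_index])  # train on the other folds
    y_val_pred = clf.predict(X_train[val_index])           # predict on the held-out fold
    print('fold accuracy:', np.mean(y_val_pred == y_train_2[val_index]))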

from sklearn.model_selection import cross_val_score
cross_val_scores=cross_val_score(sgd_clf,X_train,y_train_2,cv=3,scoring='accuracy')  # split into 3 folds
print('cross-validation scores ',cross_val_scores)
print('Cross-validation mean score',cross_val_scores.mean())
print('Cross-validation standard deviation',cross_val_scores.std())

You can see that the average score is quite high, which suggests the model performs well.
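
One caveat: roughly 90% of the samples are not 2s, so even a classifier that always predicts "not 2" would reach about 90% accuracy. A quick baseline comparison using sklearn's DummyClassifier (my own addition for illustration):

from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

dummy_clf = DummyClassifier(strategy='most_frequent')  # always predicts the majority class ("not 2")
dummy_scores = cross_val_score(dummy_clf, X_train, y_train_2, cv=3, scoring='accuracy')
print('dummy baseline accuracy:', dummy_scores.mean())

This is why the rest of this section looks at the confusion matrix, precision, recall, and the ROC curve rather than accuracy alone.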

To analyze model performance in more detail, we can use cross_val_predict, a Scikit-learn function that performs cross-validation and returns the model's predictions.
Specifically, cross_val_predict takes a machine learning model, the training data, the target variable, and the cross-validation parameters, and returns an array containing the prediction obtained for each sample while it was in the held-out fold. These predictions can be used to evaluate the model's performance on the dataset and for further analysis.
Unlike cross_val_score, cross_val_predict returns the prediction for each sample rather than a score per fold. This gives us more detailed information about model behavior, allowing deeper analysis and tuning.

from sklearn.model_selection import cross_val_predict
cross_val_predicts=cross_val_predict(sgd_clf,X_train,y_train_2,cv=3)
print('cross_val_predicts: ',cross_val_predicts)

2. Confusion matrix

We mentioned the following indicators in a previous article, so we won’t explain them too much.

We introduced TP, FP, TN, and FN above; now let's compute the confusion matrix.

from sklearn.metrics import confusion_matrix

confusion_matrixs=confusion_matrix(y_train_2,cross_val_predicts)
print(confusion_matrixs)


Here the layout is

[[TN FP]
[FN TP]]

Explanation: we correctly identified 126 digits that are 2 (the TP cell), and mistakenly identified 4 non-2 digits as 2 (the FP cell).
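
To make the layout concrete, the four cells can be unpacked directly; for binary labels, raveling sklearn's confusion matrix yields them in the order TN, FP, FN, TP (a small check I added, not in the original post):

from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_train_2, cross_val_predicts).ravel()  # order: TN, FP, FN, TP
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)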

2.1 Precision, recall, f1_score

from sklearn.metrics import precision_score,recall_score,f1_score
precision=precision_score(y_train_2,cross_val_predicts)
recall=recall_score(y_train_2,cross_val_predicts)
f1=f1_score(y_train_2,cross_val_predicts)
print('Precision:',precision)
print('Recall:',recall)
print('F1 (harmonic mean):',f1)
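
As a sanity check, the same metrics can be recomputed from the confusion-matrix cells unpacked above (tn, fp, fn, tp); these are just the textbook formulas:

manual_precision = tp / (tp + fp)   # of everything predicted as 2, how much really is a 2
manual_recall = tp / (tp + fn)      # of all real 2s, how many we caught
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)
print(manual_precision, manual_recall, manual_f1)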

2.2 The ROC curve and the impact of the threshold on the results

Suppose we have eight samples whose true digits are [5, 2, 4, 3, 2, 2, 2, 2] and whose decision scores, from left to right, are [12, 22, 33, 42, 54, 63, 74, 80]. Samples whose score exceeds the threshold are predicted to be 2; the rest are predicted to be not-2. If we set the threshold to 40, the 2 with score 22 falls below the threshold and becomes a false negative, while the 3 with score 42 falls above it and becomes a false positive. Moving the threshold therefore changes precision and recall in opposite directions, so choosing an appropriate threshold is an effective way to tune this trade-off; the sketch below works through the toy example.
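
Here is a minimal sketch of that toy example (the digits and scores come from the text above; everything else is just illustration):

import numpy as np
from sklearn.metrics import precision_score, recall_score

digits = np.array([5, 2, 4, 3, 2, 2, 2, 2])          # true digit of each sample
scores = np.array([12, 22, 33, 42, 54, 63, 74, 80])  # decision score of each sample
y_true = (digits == 2)                                # positive class: "is a 2"

for threshold in (40, 50, 70):                        # the threshold from the text plus two higher ones
    y_pred = scores > threshold
    print(threshold,
          'precision:', precision_score(y_true, y_pred),
          'recall:', recall_score(y_true, y_pred))

Raising the threshold here pushes precision up to 1.0 but drives recall down, which is exactly the trade-off described above.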

We know from above that X[2] is a 2 and was predicted True. Let's check its decision score. Scikit-Learn does not let us set the threshold directly, but we can get the decision score by calling the classifier's decision_function() method.

y_score=sgd_clf.decision_function(X[2].reshape(1,-1))
print(y_score)

We can also get the decision scores for all training samples at once.

y_score=cross_val_predict(sgd_clf,X_train,y_train_2,cv=3,method='decision_function')
print(y_score[0:15])

Next we get all the thresholds. precision_recall_curve is a Scikit-learn function that computes the precision and recall values of a classification model together with the corresponding thresholds.
Specifically, precision_recall_curve takes the decision scores (or predicted probabilities) and the true labels of a binary classifier, then computes precision and recall for a series of thresholds along with the threshold values themselves. These values can be used to draw a precision-recall curve or to compute metrics such as the model's average precision.

from sklearn.metrics import precision_recall_curve
precisions,recalls,thresholds=precision_recall_curve(y_train_2,y_score)


We draw a line chart and observe the situation

sns.set_theme(style="darkgrid")
data_line=pd.DataFrame({"precisions":precisions[:len(thresholds)],"recalls":recalls[:len(thresholds)],'thresholds':thresholds})
sns.lineplot(x='thresholds',y='precisions',data=data_line)
sns.lineplot(x='thresholds',y='recalls',data=data_line)
# plt.savefig(f'D:\Blog Documentation\Model Evaluation\{random.randint(1,100)}.png')
plt.show()

Around a threshold of 0, precision and recall cross and both are fairly high, and the two curves are roughly symmetric about that point. We can also see that changes in the threshold have a large effect on the results, so the threshold needs to be set with care.
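
If, for example, we wanted the lowest threshold that still gives at least 90% precision, one common trick is to take the first threshold whose precision reaches the target (using the precisions, recalls, and thresholds arrays computed above; the 90% target is just an example I picked):

idx_90 = np.argmax(precisions >= 0.90)        # argmax returns the first index where the condition is True
threshold_90_precision = thresholds[idx_90]
print('threshold for ~90% precision:', threshold_90_precision)
print('recall at that threshold:', recalls[idx_90])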

  • AUC value (Area Under the ROC Curve)

To pick a good threshold, we can also compute the AUC value and draw the ROC curve.
roc_curve is a function in the scikit-learn library used to compute the ROC curve (Receiver Operating Characteristic curve), which is used to evaluate binary classification models. The ROC curve shows the relationship between the true positive rate (TPR) and the false positive rate (FPR), and helps us choose an appropriate classification threshold.

  • We obtain fpr, tpr, and thresholds, and draw the ROC curve

from sklearn.metrics import roc_curve
fpr,tpr,thresholds=roc_curve(y_train_2,y_score)
line_data=pd.DataFrame({'fpr':fpr,'tpr':tpr,'thresholds':thresholds})
sns.lineplot(data=line_data,x='fpr',y='tpr')
sns.lineplot(data=pd.DataFrame({'x':[0,1],'y':[0,1]}),x='x',y='y',linestyle='--')  # diagonal of a purely random classifier
plt.show()

The dashed line represents the ROC curve of a purely random classifier; a good classifier stays as far away from this line as possible (toward the upper-left corner), and it is clear that our classifier does quite well.

We can also use roc_auc_score to calculate the AUC value

from sklearn.metrics import roc_auc_score
auc_value=roc_auc_score(y_train_2,y_score)
print("AUC value:",auc_value)


What a score! I really hope the AUC values of all my future models can be this high.

  • Finding the optimal threshold usually requires a combination of actual application scenarios and performance indicators of the classification model.

A commonly used method is to determine the optimal threshold based on the coordinate points on the ROC curve. On the ROC curve, we can find the optimal threshold by calculating the FPR and TPR coordinates corresponding to each threshold. Usually, we can choose the optimal threshold based on the following indicators:

  • Maximize TPR: When we pay more attention to the recall rate of the model (i.e., true positive rate), we can choose the threshold that maximizes TPR as the optimal threshold.

  • Minimize FPR: when we care more about keeping false positives down (i.e., about precision), we can choose the threshold that minimizes FPR.

  • Balance TPR and FPR: when we want to trade recall off against false positives, a common heuristic is to choose the threshold that maximizes TPR − FPR (Youden's J statistic); note that AUC itself summarizes the whole curve rather than a single threshold. See the sketch after the example below.

  • Example: since here we care more about the model's recall, we can continue with the fpr, tpr, and thresholds values obtained above.

from sklearn.metrics import roc_curve
fpr,tpr,thresholds=roc_curve(y_train_2,y_score)
# Find the threshold that maximizes TPR (argmax returns the first index where TPR is largest)
good_threshold = thresholds[np.argmax(tpr)]
print('optimal threshold',good_threshold)
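
If instead we wanted to balance TPR against FPR, as mentioned in the list above, a minimal sketch using Youden's J statistic on the same fpr, tpr, and thresholds arrays would be:

j_scores = tpr - fpr                                   # Youden's J statistic for each threshold
balanced_threshold = thresholds[np.argmax(j_scores)]   # threshold with the best TPR/FPR trade-off
print('threshold maximizing TPR - FPR:', balanced_threshold)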


We apply the threshold to the decision scores and recompute the recall.

recall_before=recall_score(y_train_2,cross_val_predicts)
print('recall_score without the optimal threshold:', recall_before)
y_pred=(y_score>good_threshold)  # apply the threshold to the decision scores instead of the default predictions
recall_after=recall_score(y_train_2,y_pred)
print('recall_score with the optimal threshold:', recall_after)

The result is a bit embarrassing. As you can see from the plot above, our classifier was already quite good, so perhaps there was simply not much left to optimize. Maybe.

Then I tried it again and it still had some effect.

  • Or we can use the most brute-force method and test every threshold
from sklearn.metrics import recall_score

baseline_recall = recall_score(y_train_2, cross_val_predicts)  # recall without any threshold tuning

candidate_thresholds = []
tuned_recalls = []
for threshold in thresholds:
    y_pred = (y_score > threshold)            # apply the candidate threshold to the decision scores
    candidate_thresholds.append(threshold)
    tuned_recalls.append(recall_score(y_train_2, y_pred))

best_index = np.argmax(tuned_recalls)         # index of the threshold with the highest recall

print('optimal threshold', candidate_thresholds[best_index])
print('recall before optimization', baseline_recall)
print('recall after optimization', tuned_recalls[best_index])

After threshold optimization, recall is still significantly improved.

5. Summary

In this section we walked through several model evaluation methods in detail and showed how to obtain an optimal threshold, which helps us better evaluate and optimize a model.
I hope you'll keep supporting me; I will keep working hard to learn and share more interesting things.