Article directory
- Foreword
- 1. Loading of data sets
- 2. Divide into training set and test set, shuffle operation, binary classification
- 3. Training model and prediction
- 4. Model evaluation
  - 1. Cross-validation
  - 2. Confusion matrix
    - 2.1 Precision, recall, f1_score
    - 2.2 The impact of ROC curve and threshold on the results
- 5. Summary
Foreword
Earlier we introduced common methods and metrics for model evaluation. Now we will train a classifier on a set of handwritten digits and evaluate it with several of those methods to deepen our understanding of them.
1. Loading of data sets
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')  # silence library warnings to keep the output readable

from sklearn.datasets import load_digits

data = load_digits()
print(data.data.shape)  # (1797, 64): 1797 images, each flattened to 64 pixel values
There are 1797 images of handwritten digits in total, each 8×8 pixels. Let's display one.
plt.figure(figsize=(8, 6))
plt.gray()  # use a grayscale colormap
plt.imshow(data.images[0])
plt.show()
The image is low-resolution, but we can still clearly see that it is a 0.
2. Divide training set and test set, shuffle operation, binary classification
Here we set the ratio of training set to test set to 7:3
from sklearn.model_selection import train_test_split

X, y = data.data, data.target
# train_test_split returns the splits in the order X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
The samples in the data set follow a regular pattern from 0 to 9, so neighboring samples are correlated. train_test_split already shuffles by default, but we shuffle the training data once more to be safe.
import numpy as np

# permutation(n) returns the integers 0..n-1 in random order
index = np.random.permutation(len(X_train))
# index both arrays with the same permutation so samples and labels stay aligned
X_train, y_train = X_train[index], y_train[index]
Since the data set has 10 classes, 0 to 9, we reduce the task to binary classification: is the digit a 2 or not?
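The point of indexing both arrays with the same permutation is that every sample stays paired with its own label. Here is a minimal pure-Python sketch of that idea. The toy data, the names `features` and `labels`, and the seed are illustrative assumptions, not part of the original code.

```python
import random

# Toy stand-ins for X_train and y_train (hypothetical values).
# Each label equals its feature value, so a broken pairing is easy to spot.
features = [[0.0], [1.0], [2.0], [3.0], [4.0]]
labels = [0, 1, 2, 3, 4]

# Build one random permutation of the row indices...
rng = random.Random(42)
perm = list(range(len(features)))
rng.shuffle(perm)

# ...and apply the SAME permutation to both sequences.
shuffled_features = [features[i] for i in perm]
shuffled_labels = [labels[i] for i in perm]

# Every sample is still paired with its own label after shuffling.
pairs_intact = all(
    feat[0] == lab for feat, lab in zip(shuffled_features, shuffled_labels)
)
print("pairs intact:", pairs_intact)
```

Shuffling the two sequences independently would break the pairing, which is why a single index array is reused.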
y_train_2=(y_train==2) y_test_2=(y_test==2)
3. Training model and prediction
We use SGDClassifier, the stochastic gradient descent classifier from sklearn's linear_model module
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(max_iter=10, random_state=42)
sgd_clf.fit(X_train, y_train_2)
# predict expects a 2D array, so reshape the single sample to shape (1, 64)
y_prediction = sgd_clf.predict(X[2].reshape(1, -1))
print(y_prediction)
The prediction returns True. Let's print the label at that position and take a look
print(y_prediction)
print(y[2])
You can see that the prediction is correct
4. Model evaluation
1. Cross-validation
Cross-validation divides the training set into several parts (folds). Each fold in turn serves as the validation set while the remaining folds are used for training, until every fold has been used for validation once. This makes good use of limited data, so it is especially friendly to small data sets. The cross_val_score function helps us evaluate the model's performance on the data set and select the best model, and comparing the scores across folds helps us detect overfitting.
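To make the fold mechanics concrete, here is a minimal pure-Python sketch of plain k-fold index splitting. The helper `k_fold_indices` is a hypothetical name introduced for illustration. It is not what scikit-learn runs internally: for a classifier with an integer `cv`, scikit-learn uses stratified folds, which also keep the class proportions similar in each fold.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train_idx, val_idx))
        start = stop
    return folds


for train_idx, val_idx in k_fold_indices(n_samples=10, n_folds=3):
    print("train:", train_idx, "validate:", val_idx)
```

Each sample lands in exactly one validation fold, so every sample is scored once by a model that never trained on it.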
from sklearn.model_selection import cross_val_score

# cv=3 splits the training set into 3 folds
cross_val_scores = cross_val_score(sgd_clf, X_train, y_train_2, cv=3, scoring='accuracy')
print('Cross-validation scores:', cross_val_scores)
print('Cross-validation mean score:', cross_val_scores.mean())
print('Cross-validation standard deviation:', cross_val_scores.std())
The average score is quite high, which suggests the model performs well. Keep in mind, though, that only about one digit in ten is a 2, so accuracy alone can look good even for a weak classifier.
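A high accuracy deserves a sanity check on an imbalanced task like "2 versus not 2". The sketch below uses a made-up label distribution (10 positives out of 100, an assumption that only roughly mirrors the digits data) and a "classifier" that never predicts 2.

```python
# Hypothetical label distribution: one sample in ten is a "2".
y_true = [True] * 10 + [False] * 90

# A useless "classifier" that always answers "not 2".
y_pred = [False] * len(y_true)

# Accuracy: share of samples where the prediction matches the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: share of the real 2s that were found.
true_positives = sum(t and p for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print("accuracy:", accuracy)
print("recall:", recall)
```

The baseline reaches 90% accuracy while finding none of the 2s, which is why the following sections look at the confusion matrix, precision, and recall rather than accuracy alone.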
To analyze the model's performance in more detail, we can use cross_val_predict, a Scikit-learn function that performs cross-validation and returns the model's predictions.
Specifically, cross_val_predict receives a machine learning model, the training data, the target variable, and the cross-validation parameters, and returns an array with one prediction per sample, each made by a model that did not see that sample during training. These predictions can be used to evaluate the model on the data set and for further analysis.
Unlike cross_val_score, which returns one score per fold, cross_val_predict returns the prediction for every sample. This gives us more detailed information about the model's performance and allows more in-depth analysis and tuning.
from sklearn.model_selection import cross_val_predict

cross_val_predicts = cross_val_predict(sgd_clf, X_train, y_train_2, cv=3)
print('cross_val_predicts: ', cross_val_predicts)
2. Confusion matrix
We covered the following metrics in a previous article, so we won't explain them in detail here.
We talked about TP, FP, TN, and FN above. Now let's compute them.
from sklearn.metrics import confusion_matrix

confusion_matrixs = confusion_matrix(y_train_2, cross_val_predicts)
print(confusion_matrixs)  # print the result, not the function itself
Here the layout is simply (rows are the true classes, columns are the predicted classes)
[[TN FP]
[FN TP]]
Explanation: We correctly identified 126 digits that are 2 (TP), and mistakenly identified 4 non-2 digits as 2 (FP).
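To see where each of the four counts lands, here is a minimal pure-Python sketch that builds a binary confusion matrix by hand, using the row-is-truth, column-is-prediction layout `[[TN, FP], [FN, TP]]` that scikit-learn's `confusion_matrix` returns for boolean labels. The helper name and the toy labels are illustrative assumptions.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true classes, columns are predictions."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # a real 2, predicted as 2
        elif truth and not pred:
            fn += 1  # a real 2, missed
        elif not truth and pred:
            fp += 1  # not a 2, but predicted as 2
        else:
            tn += 1  # not a 2, correctly rejected
    return [[tn, fp], [fn, tp]]


# Toy labels (hypothetical): True means "this digit is a 2".
y_true = [False, False, False, True, True, True, False, True]
y_pred = [False, True, False, True, False, True, False, True]

matrix = binary_confusion_matrix(y_true, y_pred)
print(matrix)
```

The diagonal holds the correct decisions (TN and TP); the off-diagonal cells hold the two kinds of mistakes.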
2.1 Precision, recall, f1_score
from sklearn.metrics import precision_score, recall_score, f1_score

# Store the results under new names so the imported functions are not overwritten
precision = precision_score(y_train_2, cross_val_predicts)
recall = recall_score(y_train_2, cross_val_predicts)
f1 = f1_score(y_train_2, cross_val_predicts)
print('Precision:', precision)
print('Recall:', recall)
print('F1 (harmonic mean of precision and recall):', f1)
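These three metrics are simple ratios of the confusion-matrix counts. The sketch below computes them by hand. TP = 126 and FP = 4 are the counts quoted in the text above, while FN = 10 is a made-up value used only for illustration, because the article does not report it.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of everything predicted as 2, how much really is 2
    recall = tp / (tp + fn)     # of all real 2s, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


tp, fp = 126, 4  # counts quoted in the text
fn = 10          # hypothetical value, not taken from the article
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high.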
2.2 The impact of ROC curve and threshold on the results
Suppose we have eight samples whose true digits are [5,2,4,3,2,2,2,2], ordered from left to right by their decision scores [12, 22, 33, 42, 54, 63, 74, 80]. The higher the decision score, the more confident the model is that the sample is a 2. Now we set the threshold to 40, which splits the samples into two parts by score. The left part (scores 12, 22, 33) is predicted as non-2: it contains two TN and one FN, the 2 with score 22. The right part (scores 42 to 80) is predicted as 2: it contains four TP and one FP, the 3 with score 42. So precision is 4/5 and recall is 4/5. Moving the threshold changes these counts and therefore the precision and recall, so choosing an appropriate threshold matters a great deal for the evaluation.
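The effect of the threshold on this toy example can be checked with a few lines of pure Python. The sketch assumes that the list [5,2,4,3,2,2,2,2] gives the true digit of each sample and that a sample is predicted as 2 when its score is above the threshold. The function name is hypothetical.

```python
true_digits = [5, 2, 4, 3, 2, 2, 2, 2]
scores = [12, 22, 33, 42, 54, 63, 74, 80]


def precision_recall_at(threshold):
    """Precision and recall for 'is it a 2?' when predicting 2 if score > threshold."""
    is_two = [digit == 2 for digit in true_digits]
    predicted = [score > threshold for score in scores]
    tp = sum(t and p for t, p in zip(is_two, predicted))
    fp = sum((not t) and p for t, p in zip(is_two, predicted))
    fn = sum(t and (not p) for t, p in zip(is_two, predicted))
    # If nothing is predicted positive, report precision as 0.0 (a convention chosen here).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn)
    return precision, recall


for threshold in (10, 40, 50):
    precision, recall = precision_recall_at(threshold)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}")
```

Raising the threshold from 10 to 50 trades recall for precision, which is the tension the rest of this section explores.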
Above we saw that X[2] is a 2 and is predicted as True. Let's check its decision score. Scikit-Learn does not allow setting the threshold directly, but we can get the decision score by calling the classifier's decision_function() method.
y_sorce = sgd_clf.decision_function(X[2].reshape(1, -1))
print(y_sorce)
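We can also get the decision scores of all training samples at once
# method='decision_function' returns scores instead of True/False predictions
y_sorce = cross_val_predict(sgd_clf, X_train, y_train_2, cv=3, method='decision_function')
print(y_sorce[0:15])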
Get precision and recall at every threshold
precision_recall_curve is a function in Scikit-learn that computes the precision and recall of a classification model at a series of thresholds.
Specifically, precision_recall_curve receives the true labels and the predicted probabilities or decision scores of a binary classification model, then returns the precision and recall under each threshold along with the threshold values themselves. These values can be used to draw precision-recall curves or to calculate performance metrics such as average precision.
from sklearn.metrics import precision_recall_curve

precisions, recalls, thresholds = precision_recall_curve(y_train_2, y_sorce)
We draw a line chart and observe the result
sns.set_theme(style="darkgrid")
# precisions and recalls have one more element than thresholds, so drop the last one
data_line = pd.DataFrame({"precisions": precisions[:len(thresholds)], "recalls": recalls[:len(thresholds)], "thresholds": thresholds})
sns.lineplot(x='thresholds', y='precisions', data=data_line, label='precision')
sns.lineplot(x='thresholds', y='recalls', data=data_line, label='recall')
plt.ylabel('score')
plt.show()
Near a threshold of 0, precision and recall are both high, and the two curves are roughly symmetric around that point. Changes in the threshold have a great impact on both metrics, so we need to choose it carefully.
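One careful way to set the threshold is to fix a precision target and take the lowest threshold that reaches it, which keeps recall as high as possible under that constraint. Below is a minimal pure-Python sketch on a toy data set. The helper names, the toy scores, and the 0.9 target are assumptions for illustration, and the rule "predict positive when score >= threshold" is the convention assumed here.

```python
def pr_curve(is_positive, scores):
    """Return (threshold, precision, recall) rows for each distinct score, ascending."""
    total_pos = sum(is_positive)
    rows = []
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        tp = sum(y and p for y, p in zip(is_positive, predicted))
        n_pred = sum(predicted)  # at least 1, because t is one of the scores
        rows.append((t, tp / n_pred, tp / total_pos))
    return rows


def lowest_threshold_for_precision(rows, target):
    """First (lowest) threshold whose precision meets the target, or None."""
    for t, precision, recall in rows:
        if precision >= target:
            return t, precision, recall
    return None


# Toy data (hypothetical): True marks a real 2.
is_positive = [False, True, False, False, True, True, True, True]
scores = [12, 22, 33, 42, 54, 63, 74, 80]

rows = pr_curve(is_positive, scores)
choice = lowest_threshold_for_precision(rows, target=0.9)
print("threshold, precision, recall:", choice)
```

With the real model, the same selection could be applied to the arrays returned by `precision_recall_curve`.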
- AUC value (Area Under the ROC Curve)
To help choose a threshold, we can also draw the ROC curve and compute the AUC value.
roc_curve is a function in the scikit-learn library that computes the points of the ROC (Receiver Operating Characteristic) curve, which is used to evaluate the performance of binary classification models. The ROC curve shows the relationship between the true positive rate (tpr) and the false positive rate (fpr) as the threshold varies, which helps us compare classifiers and choose a suitable threshold.
- We obtain fpr, tpr, and thresholds, then draw the ROC curve and observe it
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_2, y_sorce)
line_data = pd.DataFrame({'fpr': fpr, 'tpr': tpr, 'thresholds': thresholds})
sns.lineplot(data=line_data, x='fpr', y='tpr')
# dashed diagonal: the ROC curve of a purely random classifier
plt.plot([0, 1], [0, 1], linestyle='--')
plt.show()
The dashed line represents the ROC curve of a purely random classifier. A good classifier stays as far away from this line as possible (toward the upper left corner), and our classifier is clearly quite good.
We can also use roc_auc_score to calculate our AUC value
from sklearn.metrics import roc_auc_score

# Store the result under a new name so the imported function is not overwritten
auc_value = roc_auc_score(y_train_2, y_sorce)
print("AUC value:", auc_value)
The score is excellent. I really hope the AUC values of all my future models can be this high.
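To show what the AUC number measures, here is a minimal pure-Python sketch that builds ROC points by hand and integrates them with the trapezoidal rule. The helper names and the toy data are assumptions for illustration. With the real model, `roc_auc_score` does this work.

```python
def roc_points(is_positive, scores):
    """(FPR, TPR) points, starting at (0, 0), as the threshold is lowered."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(is_positive, scores) if y and s >= t)
        fp = sum(1 for y, s in zip(is_positive, scores) if not y and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): True marks a real 2.
is_positive = [False, True, False, False, True, True, True, True]
scores = [12, 22, 33, 42, 54, 63, 74, 80]

points = roc_points(is_positive, scores)
auc = auc_trapezoid(points)
print("AUC:", round(auc, 4))
```

The same value can be read as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. Here 13 of the 15 positive-negative pairs are ordered correctly.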
- Finding the optimal threshold usually requires weighing the actual application scenario against the performance metrics of the classification model.
A commonly used method is to pick the threshold from the points on the ROC curve. Each threshold corresponds to one (FPR, TPR) point on the curve, so we can choose the threshold according to one of the following criteria:
- Maximize TPR: when we care most about the recall of the model (i.e., the true positive rate), we can choose the threshold that maximizes TPR. Note that the lowest threshold always reaches the maximum TPR, at the cost of more false positives.
- Minimize FPR: when we care most about avoiding false alarms (i.e., the false positive rate), we can choose the threshold that minimizes FPR. The highest threshold always reaches the minimum FPR, at the cost of recall.
- Balance TPR and FPR: when we want to consider both, we can choose the threshold that maximizes TPR - FPR (Youden's J statistic). AUC itself summarizes the whole curve and does not depend on any single threshold, so it cannot be maximized by moving the threshold.
Example: if we care most about the recall of the model, we can continue to use the fpr, tpr, and thresholds values from above.
import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_2, y_sorce)
# Find the threshold that maximizes TPR (argmax returns the first index where TPR peaks)
good_threshold = thresholds[np.argmax(tpr)]
print('Optimal threshold:', good_threshold)
Apply it to the decision scores and calculate the recall
from sklearn.metrics import recall_score

recall_before = recall_score(y_train_2, cross_val_predicts)
print('Before setting the optimal threshold, recall_score:', recall_before)
# Threshold the decision scores, not the boolean predictions
y_pred = (y_sorce >= good_threshold)
recall_after = recall_score(y_train_2, y_pred)
print('After setting the optimal threshold, recall_score:', recall_after)
My first result was rather embarrassing. As the ROC curve above shows, our classifier is already quite good, so perhaps there is little left to optimize.
Then I tried again, and it did have some effect.
- Or we can use the most basic method and test every threshold
import numpy as np
from sklearn.metrics import recall_score

max_good_threshold, beging_recalls, end_recallss = [], [], []
# Recall of the default predictions, before any threshold tuning
b_recall_scoress = recall_score(y_train_2, cross_val_predicts)
for threshold in thresholds:
    # Threshold the decision scores, not the boolean predictions
    y_pred = (y_sorce >= threshold)
    e_recall_score = recall_score(y_train_2, y_pred)
    max_good_threshold.append(threshold)
    beging_recalls.append(b_recall_scoress)
    end_recallss.append(e_recall_score)
# Index of the threshold with the highest recall
end_good_recalls_index = np.argmax(end_recallss)
b_recalss = beging_recalls[end_good_recalls_index]
e_threshold = max_good_threshold[end_good_recalls_index]
e_reclass = end_recallss[end_good_recalls_index]
print('Optimal threshold:', e_threshold)
print('Before optimization:', b_recalss)
print('After optimization:', e_reclass)
After threshold optimization, recall improves significantly, although lowering the threshold to gain recall usually costs precision.
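Maximizing recall alone tends to select a very low threshold. A commonly used balanced alternative is to maximize TPR - FPR, known as Youden's J statistic. The sketch below applies it to a handful of hypothetical ROC points. The function name and the numbers are assumptions for illustration, not output from the model above.

```python
def best_threshold_by_youden(fpr, tpr, thresholds):
    """Return the threshold maximizing J = TPR - FPR, together with J."""
    best_index = max(range(len(thresholds)), key=lambda i: tpr[i] - fpr[i])
    return thresholds[best_index], tpr[best_index] - fpr[best_index]


# Hypothetical ROC points, ordered from the highest threshold to the lowest.
fpr = [0.0, 0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0, 1.0]
thresholds = [float("inf"), 2.0, 0.5, -1.0, -3.0]

best_threshold, best_j = best_threshold_by_youden(fpr, tpr, thresholds)
print("best threshold:", best_threshold, "J:", round(best_j, 3))
```

In this toy example the lowest threshold (-3.0) gives a TPR of 1.0 but also an FPR of 1.0, while the J criterion prefers 0.5, where most positives are found with few false alarms.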
5. Summary
In this section, we walked through several methods of model evaluation and ways to choose a threshold, which help us better evaluate and optimize the model.
I hope you will support me. I will keep working hard to learn and share more interesting things.