Image Classification Based on Bayes, Decision Tree and SVM

  • 1. Dataset description
  • 2. Image classification using Bayes
  • 3. Image classification using decision trees
  • 4. Using SVM for image classification
  • 5. Comparison of classification algorithms

1. Dataset description

The experimental data set is a garbage classification image data set containing six categories: cardboard, glass, metal, paper, plastic, and trash. The images are stored in folders named 0, 1, 2, 3, 4, and 5 respectively. The data is divided into a training set and a test set.

(1) Read the data set: loop over the class folders, open each picture with the cv2.imread() method, resize it to 256 x 256, and use a two-channel color histogram as its feature vector. The Python code is as follows:

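import os

import cv2
import numpy as np

X = []  # feature vectors, one per image
Y = []  # class labels (0-5)
for i in range(0, 6):
    # Traverse the folder of class i and read every picture
    for f in os.listdir("Garbage_classification/%s" % i):
        # Open the image (cv2.imread returns a BGR color array, not grayscale)
        image = cv2.imread("Garbage_classification/%s/%s" % (i, f))
        image = cv2.resize(image, (256, 256), interpolation=cv2.INTER_CUBIC)
        # Joint 2-D histogram of channels 0 and 1 (256 x 256 bins).
        # Note: calcHist treats the upper range bound as exclusive, so
        # [0.0, 255.0] leaves out pixel value 255; [0, 256] would include it.
        hist = cv2.calcHist([image], [0, 1], None, [256, 256], [0.0, 255.0, 0.0, 255.0])
        # Scale the counts and flatten to a 65,536-element feature vector
        X.append((hist / 255).flatten())
        Y.append(i)
X = np.array(X)
Y = np.array(Y)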
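
To make the feature concrete: with channels [0, 1], the histogram counts, for every pixel, the pair (value in channel 0, value in channel 1). The sketch below reproduces that counting in plain Python on a tiny made-up image, using 4 bins per channel instead of 256. The function name joint_histogram and the toy pixels are illustrative assumptions and are not part of the original code.

```python
def joint_histogram(pixels, bins=4, max_value=256):
    """Count (channel 0, channel 1) value pairs in a bins x bins grid.

    pixels: iterable of (c0, c1, c2) tuples with values in [0, max_value).
    Returns the grid flattened row by row, similar to hist.flatten().
    """
    width = max_value / bins  # number of pixel values covered by one bin
    grid = [[0] * bins for _ in range(bins)]
    for c0, c1, _ in pixels:
        grid[int(c0 // width)][int(c1 // width)] += 1
    return [count for row in grid for count in row]


# Hypothetical 2 x 2 image: three dark pixels and one bright pixel
toy_pixels = [(0, 0, 10), (10, 20, 200), (63, 63, 0), (255, 255, 255)]
feature = joint_histogram(toy_pixels)
print(len(feature), feature[0], feature[-1])
```

With 256 bins per channel, as in the code above, the same procedure yields 256 x 256 = 65,536 features per image, most of which are zero for any single picture.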

(2) Use the train_test_split() method to split the data set into a training set (70%) and a test set (30%).

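from sklearn.model_selection import train_test_split

# Split into training set (70%) and test set (30%); random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=1)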
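
As a rough illustration of what a seeded 70/30 split does, the following standard-library sketch shuffles the sample indices with a fixed seed and cuts off the test share. It is a simplified stand-in, not scikit-learn's implementation; shuffle_split and the toy data are hypothetical names used only here.

```python
import random


def shuffle_split(samples, labels, test_size=0.3, seed=1):
    """Shuffle indices with a fixed seed, then cut off the test share."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(samples) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data (hypothetical): ten sample names with alternating labels
samples_toy = ["img%d" % i for i in range(10)]
labels_toy = [i % 2 for i in range(10)]
train_x, test_x, train_y, test_y = shuffle_split(samples_toy, labels_toy)
print(len(train_x), len(test_x))
```

Because the seed is fixed, repeated runs produce the same partition, which keeps the accuracy figures reported for the different classifiers comparable.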

2. Image classification using Bayes

A Bayesian algorithm uses Bayes' theorem to reason about known data and then performs tasks such as classification and regression. In classification problems, it builds a model from the training data that calculates the probability that a given input vector belongs to each class. Its advantages are that it can use prior knowledge to improve classification accuracy and that it can produce good results even when the sample size is small. The implementation uses scikit-learn's BernoulliNB, a naive Bayes variant for binary features: binarize=0 turns every non-zero histogram bin into 1, and alpha=6 is the smoothing parameter.
The Python code is implemented as follows:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix
from sklearn.naive_bayes import BernoulliNB

# Bayes image classification
clf = BernoulliNB(alpha=6, binarize=0)
clf.fit(X_train, y_train)  # fit the model on the training data
predictions_labels = clf.predict(X_test)  # predict labels for the test set
score = clf.score(X_test, y_test)  # accuracy on the test set
print('accuracy rate: {}'.format(score * 100))
# Calculate the kappa coefficient
kappa_value = cohen_kappa_score(y_test, predictions_labels)
print('kappa coefficient is: {}'.format(kappa_value))
print('algorithm evaluation:')
print(classification_report(y_test, predictions_labels))
# Draw the confusion matrix
labels = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
cm = confusion_matrix(y_test, predictions_labels)
cm = pd.DataFrame(cm, columns=labels, index=labels)  # name the rows and columns
plt.figure(dpi=200, figsize=(5, 5))  # set figure resolution and size
sns.heatmap(cm, cmap="YlGnBu_r", fmt="d", annot=True)  # draw with seaborn
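
To show the probability calculation behind a Bernoulli naive Bayes classifier, here is a minimal standard-library sketch. It estimates a smoothed "feature is non-zero" probability per class and picks the class with the highest log posterior. The helper names and the toy data are assumptions for illustration; the sketch follows the general Bernoulli naive Bayes rule and is not a copy of scikit-learn's code.

```python
import math


def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed per-feature 'on' probabilities."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        # Probability that feature j is non-zero in class c (additive smoothing)
        p_on = [
            (sum(1 for r in rows if r[j] > 0) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (prior, p_on)
    return model


def predict_bernoulli_nb(model, x):
    """Return the class with the highest log posterior for one sample."""
    best_class, best_score = None, -math.inf
    for c, (prior, p_on) in model.items():
        score = math.log(prior)
        for value, p in zip(x, p_on):
            # Binarize at 0: a non-zero value counts as "on"
            score += math.log(p) if value > 0 else math.log(1 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data (hypothetical): feature 0 is "on" for class 0, feature 1 for class 1
X_toy = [[3, 0], [1, 0], [0, 2], [0, 5]]
y_toy = [0, 0, 1, 1]
model = fit_bernoulli_nb(X_toy, y_toy, alpha=1.0)
pred_a = predict_bernoulli_nb(model, [4, 0])
pred_b = predict_bernoulli_nb(model, [0, 1])
print(pred_a, pred_b)
```

Note that after binarization only the presence of a color combination matters, not how often it occurs, which discards part of the histogram information.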

Precision, recall, F1 score, the confusion matrix, the kappa coefficient and accuracy are used to judge the classification quality. The specific data are shown in the following table:

The kappa coefficient is 0.404 and the accuracy is 52.04%.
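
For readers unfamiliar with the kappa coefficient: it compares the observed agreement between true and predicted labels with the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from scratch on made-up labels; cohen_kappa is a hypothetical helper name and the toy labels are not the article's data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over classes
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): 5 of 6 predictions are correct
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 0, 1, 2, 2, 2]
kappa = cohen_kappa(y_true_toy, y_pred_toy)
print(round(kappa, 3))
```

A kappa of 0 means the classifier agrees with the true labels no more often than chance, and 1 means perfect agreement, so the value is always read alongside the raw accuracy.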

The confusion matrix is shown in the figure below:

3. Image classification using decision trees

The decision tree algorithm approximates a discrete-valued function. It is a typical classification method: it first processes the data, uses an inductive algorithm to generate readable rules in the form of a tree, and then applies those rules to new data. In essence, a decision tree classifies data through a series of rules. Decision tree learning usually selects the optimal feature recursively and splits the training data on that feature, so that each resulting subset is classified as well as possible. It consists of feature selection, tree generation, and pruning.
The Python code is implemented as follows:

from sklearn.tree import DecisionTreeClassifier

# Image classification based on a decision tree
clf = DecisionTreeClassifier(max_depth=11, max_features=83, min_samples_leaf=2, min_samples_split=12)
clf.fit(X_train, y_train)  # fit the model on the training data
predictions_labels = clf.predict(X_test)  # predict labels for the test set
score = clf.score(X_test, y_test)  # accuracy on the test set
print('accuracy rate: {}'.format(score * 100))
# Calculate the kappa coefficient
kappa_value = cohen_kappa_score(y_test, predictions_labels)
print('kappa coefficient is: {}'.format(kappa_value))
print('algorithm evaluation:{}'.format(classification_report(y_test, predictions_labels)))
# Draw the confusion matrix
labels = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
cm = confusion_matrix(y_test, predictions_labels)
cm = pd.DataFrame(cm, columns=labels, index=labels)  # name the rows and columns
plt.figure(dpi=200, figsize=(5, 5))  # set figure resolution and size
sns.heatmap(cm, cmap="YlGnBu_r", fmt="d", annot=True)  # draw with seaborn
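
The "select the optimal feature and split" step can be sketched for a single feature as follows. The example scores every candidate threshold by the weighted Gini impurity of the two resulting subsets (Gini is the default criterion of DecisionTreeClassifier) and keeps the best one. It is a simplified illustration: candidate thresholds are taken directly from the observed values, and the names best_threshold, values_toy and labels_toy are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'."""
    best = (None, gini(labels))  # start from the impurity without any split
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy feature (hypothetical): low values belong to class 0, high values to class 1
values_toy = [0.1, 0.2, 0.3, 0.8, 0.9]
labels_toy = [0, 0, 0, 1, 1]
threshold, impurity = best_threshold(values_toy, labels_toy)
print(threshold, impurity)
```

A full tree repeats this search over the candidate features at every node and recurses into both subsets; parameters such as max_depth, min_samples_split and min_samples_leaf in the code above limit how far that recursion goes.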

Precision, recall, F1 score, the confusion matrix, the kappa coefficient and accuracy are used to judge the classification quality. The specific data are shown in the following table:

The kappa coefficient is 0.406 and the accuracy is 52.17%.
The confusion matrix is shown in the figure below:

4. Using SVM for image classification

SVM is short for Support Vector Machine, a commonly used supervised learning algorithm for classification and regression problems. In classification, an SVM maps the samples into a high-dimensional space and finds a hyperplane in that space that separates the classes with the largest possible margin. Its advantages are that it can handle high-dimensional data and nonlinear classification problems. The mapping is performed implicitly by a kernel function, and the model can be adapted to different data distributions by adjusting the kernel parameters.
The Python code is implemented as follows:

# Image classification based on a support vector machine
from sklearn import svm
clf = svm.SVC()
clf.fit(X_train, y_train)  # fit the model on the training data
predictions_labels = clf.predict(X_test)  # predict labels for the test set
score = clf.score(X_test, y_test)  # accuracy on the test set
print('accuracy rate: {}'.format(score * 100))
# Calculate the kappa coefficient
kappa_value = cohen_kappa_score(y_test, predictions_labels)
print('kappa coefficient is: {}'.format(kappa_value))
print('algorithm evaluation:{}'.format(classification_report(y_test, predictions_labels)))
# Draw the confusion matrix
labels = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]
cm = confusion_matrix(y_test, predictions_labels)
cm = pd.DataFrame(cm, columns=labels, index=labels)  # name the rows and columns
plt.figure(dpi=200, figsize=(5, 5))  # set figure resolution and size
sns.heatmap(cm, cmap="YlGnBu_r", fmt="d", annot=True)  # draw with seaborn, as in the other sections
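
Calling svm.SVC() without arguments uses the RBF (Gaussian) kernel by default. As a small illustration of what that kernel measures, the sketch below computes exp(-gamma * squared distance) for pairs of made-up feature vectors. The vectors and the gamma value are assumptions chosen for readability, not values from the experiment.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel: similarity that decays with the squared Euclidean distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Toy feature vectors (hypothetical): a and b are identical, c is different
hist_a = [0.5, 0.0, 0.1]
hist_b = [0.5, 0.0, 0.1]
hist_c = [0.0, 0.6, 0.0]
same = rbf_kernel(hist_a, hist_b, gamma=1.0)
different = rbf_kernel(hist_a, hist_c, gamma=1.0)
print(same, round(different, 3))
```

Identical vectors get similarity 1 and the value falls toward 0 as vectors move apart. A larger gamma makes the similarity drop faster, which is one of the kernel parameters that can be tuned to fit the data distribution.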

Precision, recall, F1 score, the confusion matrix, the kappa coefficient and accuracy are used to judge the classification quality. The specific data are shown in the following table:

The kappa coefficient is 0.577 and the accuracy is 66.13%.
The confusion matrix is shown in the figure below:

5. Comparison of classification algorithms

The three classifiers, Bayes, decision tree and SVM, are compared by precision. The specific data are shown in the following table:

Because there are many categories and the data set is small, none of the classifiers achieves high accuracy. The accuracy of the decision tree and Bayes classifiers is especially low for the trash category. Of the three algorithms, Bayes is the fastest. The accuracy comparison shows that SVM performed best in this project.
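
Per-class precision, the measure used for the comparison in this section, can be read directly off a confusion matrix whose rows are true classes and whose columns are predicted classes (the layout returned by confusion_matrix): it is the diagonal entry divided by its column sum. The sketch below shows the calculation on a made-up 3 x 3 matrix; the numbers are hypothetical and are not the results of this experiment.

```python
def per_class_precision(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    precisions = []
    for j in range(n):
        predicted_as_j = sum(cm[i][j] for i in range(n))  # column sum
        # Guard against classes that were never predicted
        precisions.append(cm[j][j] / predicted_as_j if predicted_as_j else 0.0)
    return precisions


# Toy confusion matrix (hypothetical values)
cm_toy = [
    [8, 2, 0],
    [1, 6, 3],
    [1, 0, 4],
]
precisions = per_class_precision(cm_toy)
print([round(p, 3) for p in precisions])
```

A class with few test samples, such as a small trash category, can show low precision even when only a handful of images from other classes are misassigned to it, which is worth keeping in mind when comparing the classifiers.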