11. Predicting the pilling grade of fabrics from six factors

1. Requirements analysis

Using a model trained on previously collected data, we test the standard sample cards of different grades.
There are 48 test samples. Each is described by six indicators (number of pills, total pill area, maximum pill area, average pill area, contrast, and optical volume), from which the grade is finally determined.
The general structure of the data set fiber.csv is shown below.
(I collected this data set in my own tests. Since it is personal data I won't share it publicly here; thanks for understanding.)

Notes on the CSV format:
The header row N,S,Max_s,Aver_s,C,V,Grade has no space at the end.
A data row such as 27,111542.5,38299.5,4131.2,31.91,3559537.61,1 has a space after the final 1. Pay attention to this!
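Because of that trailing space, it is worth normalizing the columns right after loading. The sketch below is one defensive way to do it, assuming pandas; it uses two data rows copied from the data set shown in this post, with the trailing space included, and strips whitespace from the column names and from Grade before converting Grade to a number.

```python
import io

import pandas as pd

# Two data rows copied from this post; each ends with a space after the Grade value
raw = (
    "N,S,Max_s,Aver_s,C,V,Grade\n"
    "27,111542.5,38299.5,4131.20,31.91,3559537.61,1 \n"
    "27,110579.5,31220.0,3186.63,31.28,2690869.73,1 \n"
)

fiber = pd.read_csv(io.StringIO(raw))

# Remove any stray whitespace from the header names
fiber.columns = fiber.columns.str.strip()
# Make sure Grade ends up numeric even if the trailing space kept it as text
fiber["Grade"] = pd.to_numeric(fiber["Grade"].astype(str).str.strip())

print(fiber.dtypes)
print(fiber["Grade"].tolist())  # [1, 1]
```

With the real file, the same two cleanup lines can follow pd.read_csv("./fiber.csv").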

Variable	Meaning
N	Number of pills
S	Total pill area
Max_s	Maximum pill area
Aver_s	Average pill area
C	Contrast
V	Optical volume
Grade	Final pilling grade

2. Trying multiple methods to predict the grade

1. Import packages

Install scikit-learn (which provides the sklearn package) with: pip install scikit-learn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter-only magic; remove the next line when running as a plain Python script
%matplotlib inline
import seaborn as sns
 
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
 
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
 
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

2. Read and display the data set

fiber = pd.read_csv("./fiber.csv")
fiber.head(15)  # in a notebook, this displays the first 15 rows

print(fiber)
"""
     N S Max_s Aver_s C V Grade
0 27 111542.5 38299.5 4131.20 31.91 3559537.61 1
1 27 110579.5 31220.0 3186.63 31.28 2690869.73 1
...
47 9 33853.0 6329.0 3761.44 41.17 1393863.42 4
"""

3. Split the data set

The last column, Grade, is the outcome (dependent variable); the remaining six factors are the independent variables.

X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']

Split the data set into two parts: a training set and a test set.
random_state is the random number seed; fixing it ensures the training and test sets are the same on every run.

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

Check the shapes.
There are 36 training samples and 12 test samples, 48 in total (by default, train_test_split holds out 25% of the data for testing).

print(X_train.shape)  # (36, 6)
print(y_train.shape)  # (36,)
print(X_test.shape)   # (12, 6)
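Since fiber.csv is not public, here is a small sketch on a synthetic stand-in array with the same shape (48 rows, 6 features, hypothetical labels) that checks the two claims above: the default split holds out a quarter of the rows, and repeating the call with the same random_state reproduces exactly the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as fiber.csv: 48 rows, 6 features
X = np.arange(48 * 6, dtype=float).reshape(48, 6)
y = np.repeat([1, 2, 3, 4], 12)  # hypothetical labels, 12 per grade

# With no test_size given, 25% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)  # (36, 6) (12, 6)

# The same random_state gives exactly the same split again
X_train2, X_test2, _, _ = train_test_split(X, y, random_state=0)
print(np.array_equal(X_test, X_test2))  # True
```

With only 48 samples, passing stratify=Y to train_test_split is also worth considering, since it keeps the proportion of each grade the same in the training and test sets.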

4. Fitting different algorithms

①K nearest neighbor algorithm, KNeighborsClassifier()

n_neighbors: the number of nearest neighbors to consider.
Here each sample is classified by a majority vote among its 4 nearest training samples.

knn = KNeighborsClassifier(n_neighbors=4)
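To make the majority vote concrete, here is a minimal from-scratch sketch on hypothetical toy data. It mirrors the idea behind KNeighborsClassifier with its default uniform weights and Euclidean distance, not its exact implementation (tie-breaking in particular may differ). Because the vote is driven by raw Euclidean distance, a feature on a large scale, such as V with values in the millions, will dominate the distances unless the features are standardized first.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=4):
    """Classify one sample x by a majority vote of its k nearest training samples."""
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k closest training samples
    nearest = y_train[np.argsort(dists)[:k]]
    # Majority vote; on a tie, np.unique's sorted order picks the smallest label
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]


# Hypothetical two-feature toy data, not taken from fiber.csv
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
y_train = np.array([1, 1, 1, 2, 2, 2])

print(knn_predict(X_train, y_train, np.array([0.1, 0.1])))  # 1 (3 votes to 1)
print(knn_predict(X_train, y_train, np.array([4.9, 5.0])))  # 2 (3 votes to 1)
```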

Fit the model on the training set

knn.fit(X_train,y_train)

Predict on the test set X_test to get the predictions y_pred

y_pred = knn.predict(X_test)

Compare the predictions y_pred with the true grades y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

The model's score method returns the same value (the mean accuracy on the given test data)

score = knn.score(X_test,y_test)
print(score)
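accuracy_score and confusion_matrix are imported in step 1 but never used. A short sketch on hypothetical labels (not real results) shows that accuracy_score computes the same fraction as np.mean(y_pred == y_test), and that a confusion matrix shows which grades get confused with which.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true grades and predictions for eight samples
y_test = np.array([1, 2, 3, 4, 1, 2, 3, 4])
y_pred = np.array([1, 2, 3, 3, 1, 1, 3, 4])

# Fraction of matching labels, computed two ways
print(np.mean(y_pred == y_test))       # 0.75
print(accuracy_score(y_test, y_pred))  # 0.75

# Rows are the true grades 1-4, columns are the predicted grades 1-4
cm = confusion_matrix(y_test, y_pred, labels=[1, 2, 3, 4])
print(cm)
```

Here the off-diagonal entries show that one grade-2 sample was predicted as grade 1 and one grade-4 sample as grade 3.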

Pick a sample from the data set to test the model (the last value is its true grade):
16,18312.5,6614.5,2842.31,25.23,1147430.19,2
Its true grade is 2.

test = np.array([[16,18312.5,6614.5,2842.31,25.23,1147430.19]])
prediction = knn.predict(test)
print(prediction)
"""
[2]
"""

This sample was taken from the training data, so this is only a quick sanity check. In practice, never evaluate a model on its own training data.
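With only 48 samples, a single 36/12 split gives a noisy accuracy estimate. A common alternative is k-fold cross-validation, where every sample is used for testing exactly once. The sketch below assumes a synthetic stand-in generated with make_classification (48 samples, 6 features, 4 classes), since fiber.csv is not public; the fold count of 4 is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for fiber.csv: 48 samples, 6 features, 4 classes
X, y = make_classification(n_samples=48, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

# 4-fold stratified CV: each sample is tested exactly once, and every fold
# keeps the class proportions of the full data set
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=4), X, y, cv=cv)

print(scores)         # one accuracy per fold
print(scores.mean())  # averaged estimate
```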

Complete code for the K-nearest neighbor algorithm

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score


fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)#model prediction result
accuracy = np.mean(y_pred==y_test)#accuracy
score = knn.score(X_test,y_test)#score
print(accuracy)
print(score)

#test
test = np.array([[16,18312.5,6614.5,2842.31,25.23,1147430.19]])#A random piece of data
prediction = knn.predict(test)#Bring in the data and predict it
print(prediction)

②Logistic regression algorithm, LogisticRegression()

Instantiate a logistic regression object

lr = LogisticRegression()

Fit the model on the training set

lr.fit(X_train,y_train)#model fitting

Predict on the test set X_test to get the predictions y_pred

y_pred = lr.predict(X_test)#model prediction result

Compare the predictions y_pred with the true grades y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

The model's score method returns the same value (the mean accuracy on the given test data)

score = lr.score(X_test,y_test)
print(score)

Pick a sample from the data set to test the model (the last value is its true grade):
20,44882.5,10563,5623.88,27.15,3053651.65,1
Its true grade is 1.

test = np.array([[20,44882.5,10563,5623.88,27.15,3053651.65]])# Randomly find a piece of data, the correct level is 1
prediction = lr.predict(test)#Bring in the data and predict it
print(prediction)
"""
[1]
"""

This sample was taken from the training data, so this is only a quick sanity check. In practice, never evaluate a model on its own training data.

Complete code for logistic regression

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

lr = LogisticRegression()
lr.fit(X_train,y_train)#model fitting
y_pred = lr.predict(X_test)#model prediction results
accuracy = np.mean(y_pred==y_test)#accuracy
score = lr.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[20,44882.5,10563,5623.88,27.15,3053651.65]])#A random data
prediction = lr.predict(test)#Bring in the data and predict it
print(prediction)
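The six features live on very different scales (C is in the tens, V in the millions), and on unscaled data like this LogisticRegression can stop before converging and emit a ConvergenceWarning. A common remedy, sketched below as an option rather than what this post used, is to standardize the features inside a pipeline. The data are the eleven sample rows quoted in this post; max_iter=1000 is an arbitrary safety margin.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The eleven sample rows quoted in this post: N, S, Max_s, Aver_s, C, V
X = np.array([
    [27, 111542.5, 38299.5, 4131.20, 31.91, 3559537.61],
    [27, 110579.5, 31220.0, 3186.63, 31.28, 2690869.73],
    [9, 33853.0, 6329.0, 3761.44, 41.17, 1393863.42],
    [16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19],
    [20, 44882.5, 10563, 5623.88, 27.15, 3053651.65],
    [20, 55997.5, 17644.5, 2799.88, 8.58, 480178.56],
    [23, 97215.5, 22795.5, 2613.09, 29.72, 1786141.62],
    [11, 99498, 5369, 9045.27, 28.47, 3827588.56],
    [14, 160712, 3208, 3681.25, 36.31, 1871275.09],
    [18, 57541.5, 10455, 2843.36, 30.68, 1570013.02],
    [9, 64794, 5560, 10682.94, 38.99, 3748367.45],
])
# Their true grades, in the same order
y = np.array([1, 1, 4, 2, 1, 2, 1, 4, 3, 2, 4])

# Raw standard deviations differ by orders of magnitude
print(X.std(axis=0).round(1))

# StandardScaler rescales each feature to mean 0 and std 1; inside a pipeline
# the same scaling is reapplied automatically at predict time
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

scaled = model.named_steps["standardscaler"].transform(X)
print(scaled.std(axis=0).round(3))  # every feature now has std 1.0
print(model.predict(X[:3]))
```

The same make_pipeline(StandardScaler(), ...) wrapper can be put around KNeighborsClassifier, LinearSVC or SVC, which are also sensitive to feature scale.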

③Linear support vector machine, LinearSVC()

Instantiate a linear SVM object

lsvc = LinearSVC()

Fit the model on the training set

lsvc.fit(X_train,y_train)#model fitting

Predict on the test set X_test to get the predictions y_pred

y_pred = lsvc.predict(X_test)#model prediction result

Compare the predictions y_pred with the true grades y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

The model's score method returns the same value (the mean accuracy on the given test data)

score = lsvc.score(X_test,y_test)
print(score)

Pick a sample from the data set to test the model (the last value is its true grade):
20,55997.5,17644.5,2799.88,8.58,480178.56,2
Its true grade is 2.

test = np.array([[20,55997.5,17644.5,2799.88,8.58,480178.56]])#A random piece of data
prediction = lsvc.predict(test)#Bring in the data and predict it
print(prediction)
"""
[2]
"""

This sample was taken from the training data, so this is only a quick sanity check. In practice, never evaluate a model on its own training data.

Complete code for the linear support vector machine

from sklearn.svm import LinearSVC
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

lsvc = LinearSVC()
lsvc.fit(X_train,y_train)#model fitting
y_pred = lsvc.predict(X_test)#model prediction results
accuracy = np.mean(y_pred==y_test)#accuracy
score = lsvc.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[20,55997.5,17644.5,2799.88,8.58,480178.56]])#A random piece of data
prediction = lsvc.predict(test)#Bring in the data and predict it
print(prediction)

④Support vector machine, SVC()

Instantiate the SVM object

svc = SVC()

Fit the model on the training set

svc.fit(X_train,y_train)#model fitting

Predict on the test set X_test to get the predictions y_pred

y_pred = svc.predict(X_test)#model prediction results

Compare the predictions y_pred with the true grades y_test; the mean of the element-wise matches is the accuracy

accuracy = np.mean(y_pred==y_test)
print(accuracy)

The model's score method returns the same value (the mean accuracy on the given test data)

score = svc.score(X_test,y_test)
print(score)

Pick a sample from the data set to test the model (the last value is its true grade):
23,97215.5,22795.5,2613.09,29.72,1786141.62,1
Its true grade is 1.

test = np.array([[23,97215.5,22795.5,2613.09,29.72,1786141.62]])#A random piece of data
prediction = svc.predict(test)#Bring in the data and predict it
print(prediction)
"""
[1]
"""

This sample was taken from the training data, so this is only a quick sanity check. In practice, never evaluate a model on its own training data.

Complete code for the support vector machine

from sklearn.svm import SVC
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

svc = SVC(gamma='auto')
svc.fit(X_train,y_train)#model fitting
y_pred = svc.predict(X_test)#model prediction result
accuracy = np.mean(y_pred==y_test)#accuracy
score = svc.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[23,97215.5,22795.5,2613.09,29.72,1786141.62]])#A random piece of data
prediction = svc.predict(test)#Bring in the data and predict it
print(prediction)
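Note that the step-by-step walkthrough above uses SVC(), whose default is gamma='scale', while this complete code uses SVC(gamma='auto'). The RBF kernel width differs between the two settings, so the two versions may not report the same accuracy. The sketch below computes both values from the formulas in the scikit-learn documentation, using the first five sample rows quoted in this post; on unscaled data like this, 'scale' gives a far smaller gamma because V dominates the overall variance.

```python
import numpy as np

# The first five sample rows quoted in this post: N, S, Max_s, Aver_s, C, V
X = np.array([
    [27, 111542.5, 38299.5, 4131.20, 31.91, 3559537.61],
    [27, 110579.5, 31220.0, 3186.63, 31.28, 2690869.73],
    [9, 33853.0, 6329.0, 3761.44, 41.17, 1393863.42],
    [16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19],
    [20, 44882.5, 10563, 5623.88, 27.15, 3053651.65],
])

n_features = X.shape[1]
# gamma='auto': 1 / n_features, independent of the data values
gamma_auto = 1.0 / n_features
# gamma='scale' (the SVC() default): 1 / (n_features * variance of all entries of X)
gamma_scale = 1.0 / (n_features * X.var())

print(f"auto:  {gamma_auto:.3g}")
print(f"scale: {gamma_scale:.3g}")
```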

⑤Decision tree, DecisionTreeClassifier()

As you may have noticed, the steps for the first four methods are almost identical; only the instantiated model object differs. So I won't repeat the step-by-step walkthrough for the remaining methods.
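Since only the estimator object changes, the eight methods in this post can also be compared in a single loop. This is a sketch on a synthetic stand-in (make_classification, 48 samples, 6 features, 4 classes) because fiber.csv is not public; the data are shifted to be non-negative because MultinomialNB rejects negative values, and max_iter=1000 for LogisticRegression is an arbitrary choice.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore")  # hide convergence warnings in this demo

# Synthetic stand-in for fiber.csv: 48 samples, 6 features, 4 classes
X, y = make_classification(n_samples=48, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)
X = X - X.min(axis=0)  # shift to non-negative, as MultinomialNB requires

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=4),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "LinearSVC": LinearSVC(),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "GaussianNB": GaussianNB(),
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
}

# Identical fit / score steps for every model; only the estimator changes
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = model.score(X_test, y_test)

for name, acc in results.items():
    print(f"{name:20s} {acc:.3f}")
```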

Pick a sample from the data set to test the model (the last value is its true grade):
11,99498,5369,9045.27,28.47,3827588.56,4
Its true grade is 4.

Complete code for the decision tree

from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)#model fitting
y_pred = dtc.predict(X_test)#model prediction results
accuracy = np.mean(y_pred==y_test)#accuracy
score = dtc.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[11,99498,5369,9045.27,28.47,3827588.56]])#A random piece of data
prediction = dtc.predict(test)#Bring in the data and predict it
print(prediction)
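A fitted decision tree also reports how much each factor contributed to its splits through feature_importances_. The sketch below attaches the six column names of fiber.csv to a synthetic stand-in (make_classification), so the printed ranking is illustrative only and says nothing about real fabrics. It also sets random_state: scikit-learn randomly permutes the features at each split, so when two splits are equally good, a DecisionTreeClassifier() without random_state, as in the code above, can build a different tree and report a different accuracy from run to run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

feature_names = ["N", "S", "Max_s", "Aver_s", "C", "V"]

# Synthetic stand-in for fiber.csv: 48 samples, 6 features, 4 classes
X, y = make_classification(n_samples=48, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

# random_state fixes the feature permutation, so reruns give the same tree
dtc = DecisionTreeClassifier(random_state=0).fit(X, y)

# Impurity-based importance of each feature; the values sum to 1
ranked = sorted(zip(feature_names, dtc.feature_importances_), key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name:7s} {importance:.3f}")
```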

⑥Gaussian naive Bayes, GaussianNB()

Pick a sample from the data set to test the model (the last value is its true grade):
14,160712,3208,3681.25,36.31,1871275.09,3
Its true grade is 3.

Complete code for Gaussian naive Bayes

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train,y_train)#model fitting
y_pred = gnb.predict(X_test)#model prediction result
accuracy = np.mean(y_pred==y_test)#accuracy
score = gnb.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[14,160712,3208,3681.25,36.31,1871275.09]])#A random piece of data
prediction = gnb.predict(test)#Bring in the data and predict it
print(prediction)

⑦Bernoulli naive Bayes, BernoulliNB()

Pick a sample from the data set to test the model (the last value is its true grade):
18,57541.5,10455,2843.36,30.68,1570013.02,2
Its true grade is 2.

Complete code for Bernoulli naive Bayes

from sklearn.naive_bayes import BernoulliNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

bnb = BernoulliNB()
bnb.fit(X_train,y_train)#model fitting
y_pred = bnb.predict(X_test)#model prediction results
accuracy = np.mean(y_pred==y_test)#accuracy
score = bnb.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[18,57541.5,10455,2843.36,30.68,1570013.02]])#A random piece of data
prediction = bnb.predict(test)#Bring in the data and predict it
print(prediction)
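BernoulliNB models binary features, and by default (binarize=0.0) it turns every value greater than 0 into 1. All six features in this data set are positive, so every sample becomes the same all-ones vector and the model can only ever predict one grade; its reported accuracy then just reflects how often that grade appears in the test set. The sketch below demonstrates this on the eleven sample rows quoted in this post.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# The eleven sample rows quoted in this post: N, S, Max_s, Aver_s, C, V
X = np.array([
    [27, 111542.5, 38299.5, 4131.20, 31.91, 3559537.61],
    [27, 110579.5, 31220.0, 3186.63, 31.28, 2690869.73],
    [9, 33853.0, 6329.0, 3761.44, 41.17, 1393863.42],
    [16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19],
    [20, 44882.5, 10563, 5623.88, 27.15, 3053651.65],
    [20, 55997.5, 17644.5, 2799.88, 8.58, 480178.56],
    [23, 97215.5, 22795.5, 2613.09, 29.72, 1786141.62],
    [11, 99498, 5369, 9045.27, 28.47, 3827588.56],
    [14, 160712, 3208, 3681.25, 36.31, 1871275.09],
    [18, 57541.5, 10455, 2843.36, 30.68, 1570013.02],
    [9, 64794, 5560, 10682.94, 38.99, 3748367.45],
])
# Their true grades, in the same order
y = np.array([1, 1, 4, 2, 1, 2, 1, 4, 3, 2, 4])

bnb = BernoulliNB()  # binarize=0.0 by default: every value > 0 becomes 1
bnb.fit(X, y)

# Every value is positive, so every row binarizes to [1, 1, 1, 1, 1, 1]
print((X > 0).all())              # True
print(np.unique(bnb.predict(X)))  # a single grade for all eleven samples
```

Since binarize takes a single threshold for all features, and these features sit on very different scales, BernoulliNB is probably a poor fit here unless each feature is binarized separately first.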

⑧Multinomial naive Bayes, MultinomialNB()

Pick a sample from the data set to test the model (the last value is its true grade):
9,64794,5560,10682.94,38.99,3748367.45,4
Its true grade is 4.

Complete code for multinomial naive Bayes

from sklearn.naive_bayes import MultinomialNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

mnb = MultinomialNB()
mnb.fit(X_train,y_train)#model fitting
y_pred = mnb.predict(X_test)#model prediction result
accuracy = np.mean(y_pred==y_test)#accuracy
score = mnb.score(X_test,y_test)#score
print(accuracy)
print(score)

test = np.array([[9,64794,5560,10682.94,38.99,3748367.45]])#A random piece of data
prediction = mnb.predict(test)#Bring in the data and predict it
print(prediction)
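MultinomialNB is designed for non-negative count or frequency features. It fits the raw data here because every value is positive, but it treats the values like counts, so large-valued features such as S and V likely carry much more weight than N or C. It also cannot be combined with standardization, because standardized features contain negative values, which it rejects with a ValueError. The sketch below shows both cases on the first five sample rows quoted in this post.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler

# The first five sample rows quoted in this post: N, S, Max_s, Aver_s, C, V
X = np.array([
    [27, 111542.5, 38299.5, 4131.20, 31.91, 3559537.61],
    [27, 110579.5, 31220.0, 3186.63, 31.28, 2690869.73],
    [9, 33853.0, 6329.0, 3761.44, 41.17, 1393863.42],
    [16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19],
    [20, 44882.5, 10563, 5623.88, 27.15, 3053651.65],
])
y = np.array([1, 1, 4, 2, 1])  # their true grades

# Raw, non-negative features: MultinomialNB fits without complaint
MultinomialNB().fit(X, y)

# After standardization some values are negative, which MultinomialNB rejects
X_std = StandardScaler().fit_transform(X)
try:
    MultinomialNB().fit(X_std, y)
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)

print(rejected)  # True
```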

Finally, after tuning the parameters and comparing the results, the decision tree was chosen to predict the grade of these samples.
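The post does not say which parameters were tuned. One systematic way to tune a decision tree is a cross-validated grid search; the sketch below is only an illustration, assuming a synthetic stand-in from make_classification and a hypothetical grid over max_depth and min_samples_leaf.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for fiber.csv: 48 samples, 6 features, 4 classes
X, y = make_classification(n_samples=48, n_features=6, n_informative=4,
                           n_classes=4, random_state=0)

# Hypothetical grid: every combination is scored by 4-fold cross-validation
param_grid = {
    "max_depth": [2, 3, 4, None],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=StratifiedKFold(n_splits=4, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated accuracy
```

After fitting, search.best_estimator_ holds a tree refitted on all the data with the best parameters.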

5. Saving and loading the model

Here we use the decision tree as the example.

Save the trained model with joblib.dump(dtc, './dtc.model'), where:
dtc is the fitted model object
./dtc.model is the file name and path to save to

Load the model back with dtc_yy = joblib.load('./dtc.model')
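A quick way to confirm that saving and loading preserves the model is to compare predictions before and after a round trip. The sketch below writes to a temporary directory instead of ./dtc.model and trains on the eleven sample rows quoted in this post. Keep in mind that joblib files are pickle-based, so only load files from sources you trust, and load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The eleven sample rows quoted in this post: N, S, Max_s, Aver_s, C, V
X = np.array([
    [27, 111542.5, 38299.5, 4131.20, 31.91, 3559537.61],
    [27, 110579.5, 31220.0, 3186.63, 31.28, 2690869.73],
    [9, 33853.0, 6329.0, 3761.44, 41.17, 1393863.42],
    [16, 18312.5, 6614.5, 2842.31, 25.23, 1147430.19],
    [20, 44882.5, 10563, 5623.88, 27.15, 3053651.65],
    [20, 55997.5, 17644.5, 2799.88, 8.58, 480178.56],
    [23, 97215.5, 22795.5, 2613.09, 29.72, 1786141.62],
    [11, 99498, 5369, 9045.27, 28.47, 3827588.56],
    [14, 160712, 3208, 3681.25, 36.31, 1871275.09],
    [18, 57541.5, 10455, 2843.36, 30.68, 1570013.02],
    [9, 64794, 5560, 10682.94, 38.99, 3748367.45],
])
y = np.array([1, 1, 4, 2, 1, 2, 1, 4, 3, 2, 4])

dtc = DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dtc.model")
    joblib.dump(dtc, path)          # write the fitted model to disk
    dtc_loaded = joblib.load(path)  # read it back into a new object

# The reloaded model makes exactly the same predictions
same = np.array_equal(dtc.predict(X), dtc_loaded.predict(X))
print(same)  # True
```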

Full code

from sklearn.tree import DecisionTreeClassifier
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import joblib

fiber = pd.read_csv("./fiber.csv")
# Divide independent and dependent variables
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# Divide the dataset
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)

dtc = DecisionTreeClassifier()
dtc.fit(X_train,y_train)#model fitting
joblib.dump(dtc, './dtc.model')#Save the model
y_pred = dtc.predict(X_test)#model prediction result
accuracy = np.mean(y_pred==y_test)#accuracy
score = dtc.score(X_test,y_test)#score
print(accuracy)
print(score)


dtc_yy = joblib.load('./dtc.model')
test = np.array([[11,99498,5369,9045.27,28.47,3827588.56]])#A random piece of data
prediction = dtc_yy.predict(test)#Bring in the data and predict it
print(prediction)

The saved model is as follows: