logistic and softmax linear classification
logistic regression
The hyperplane w^T x + b = 0 divides the feature space into two parts: the half where w^T x + b > 0 is the positive half-space, and the other half is the negative half-space.
Here w is the normal vector of the hyperplane, usually normalized to unit length, and it determines the hyperplane's orientation. b is the bias (offset), and it determines the distance between the hyperplane and the origin. Both w and b are updated by gradient descent.
Hyperplane equation: w^T x + b = 0. Distance from a point x to the hyperplane: |w^T x + b| / ||w||, which reduces to |w^T x + b| when ||w|| = 1.
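A minimal NumPy sketch of these definitions follows. The values of w and b are hand-picked for illustration and do not come from the original. For a point x, it reports the signed score (which tells you the half-space), the distance |w^T x + b| / ||w||, and the sigmoid probability σ(w^T x + b) = 1 / (1 + e^(-(w^T x + b))). The training code later in this post uses that sigmoid to turn the score into a probability.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hyperplane_stats(w, b, x):
    # Signed score: > 0 on the positive half-space, < 0 on the negative one
    score = np.dot(w, x) + b
    # Euclidean distance from x to the hyperplane w^T x + b = 0
    distance = abs(score) / np.linalg.norm(w)
    return score, distance, sigmoid(score)

# Hypothetical example: unit normal vector along the first axis, offset -1
w = np.array([1.0, 0.0])
b = -1.0
score, distance, p = hyperplane_stats(w, b, np.array([3.0, 5.0]))
print(score, distance, p)  # score 2.0, distance 2.0, p ≈ 0.881
```

Here w has unit length, so the distance equals |score|. For a non-unit w, dividing by ||w|| is what keeps the distance in feature-space units.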
softmax function
The softmax (normalized exponential) function expresses the result of multi-class classification as probabilities. A weight matrix maps the feature vector to one score per class, and those scores are then converted into a probability for each label.
$$
w = \begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\
w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m,1} & w_{m,2} & \cdots & w_{m,n}
\end{bmatrix},
\qquad
w^T x_i = \begin{bmatrix}
z_1 = w_1^T x_i \\
z_2 = w_2^T x_i \\
\vdots \\
z_n = w_n^T x_i
\end{bmatrix}
$$
Here m is the number of features, n is the number of classes, and w_j is the j-th column of w. Each class j therefore receives one score z_j.
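The shapes can be checked with NumPy. This sketch assumes the iris setting used later (m = 4 features, n = 3 classes) and uses random weights. It confirms that each score z_j is the dot product of column w_j with the sample x_i.

```python
import numpy as np

m, n = 4, 3                      # m features, n classes
rng = np.random.default_rng(0)
w = rng.random((m, n))           # column j holds w_j, the weights of class j
x_i = rng.random(m)              # one sample with m features

z = w.T @ x_i                    # vector of n class scores
# Each score is the dot product of one column with the sample: z_j = w_j^T x_i
for j in range(n):
    assert np.isclose(z[j], w[:, j] @ x_i)
print(z.shape)                   # (3,)
```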
$$
y_k^i = p(y = k \mid x_i) = \frac{\exp(z_k)}{\sum_{j=1}^{n} \exp(z_j)},
\qquad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$
The benefits of this are:
1) The predicted probabilities are non-negative, because the exponential function exp is always positive.
2) The predicted probabilities of all classes sum to 1, because each term is divided by the sum of all terms.
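Both properties are easy to check numerically. The sketch below implements softmax in NumPy. It also applies one common safeguard that the post does not mention: subtracting the maximum score before calling exp. The shift cancels in the ratio, so the result is unchanged, but it prevents overflow when scores are large.

```python
import numpy as np

def softmax(z):
    # Subtracting the max does not change the result (it cancels in the ratio)
    # but keeps exp() from overflowing for large scores
    e = np.exp(z - np.max(z))
    # Divide each term by the sum of all terms so the outputs sum to 1
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)  # approximately [0.659 0.242 0.099]
```

Without the shift, a score such as 1000 would make math.exp raise OverflowError and np.exp return inf. With the shift, softmax([1000, 1000]) is simply [0.5, 0.5].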
Experiment
1. Load the Iris dataset
To keep the focus on training the model, data preprocessing is skipped here. Properly, the data should still be split into a training set and a validation set.
```python
# -*- coding: utf-8 -*-
# Import the datasets package
from sklearn import datasets

# Load the iris dataset
iris = datasets.load_iris()
# Data: sepal length, sepal width, petal length, petal width
xi = iris.data
yi = iris.target                  # labels
target_names = iris.target_names  # species names
```
2. Logistic regression
A multi-class classifier built from logistic regression is implemented with numpy to solve the iris classification problem. Because logistic regression is a binary classifier, three binary classifiers are trained and their results are combined.
Define three logistic regression models that separate labels 0 and 1 (model 0), labels 0 and 2 (model 1), and labels 1 and 2 (model 2):
1) If the sample label is 0: model 0 predicts 0, model 1 predicts 0, model 2 predicts 1 or 2;
2) If the sample label is 1: model 0 predicts 1, model 1 predicts 0 or 2, model 2 predicts 1;
3) If the sample label is 2: model 0 predicts 0 or 1, model 1 predicts 2, and model 2 predicts 2.
Therefore, the predictions of the three models only need to be combined, and the label that appears most often is taken as the sample's predicted label.
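A sketch of the voting step follows. It assumes each binary model has already produced 0/1 outputs, where 0 means the first label of its pair and 1 means the second. The names `pairs` and `vote` are hypothetical and do not appear in the original code.

```python
import numpy as np

# Label pairs separated by models 0, 1 and 2
pairs = [(0, 1), (0, 2), (1, 2)]

def vote(binary_preds):
    """binary_preds[k][s] is model k's 0/1 output for sample s
    (0 means the first label of pairs[k], 1 the second)."""
    binary_preds = np.asarray(binary_preds)
    n_samples = binary_preds.shape[1]
    counts = np.zeros((n_samples, 3), dtype=int)
    for k, (neg, pos) in enumerate(pairs):
        # Translate the model's 0/1 output back to the original label
        labels = np.where(binary_preds[k] == 1, pos, neg)
        counts[np.arange(n_samples), labels] += 1
    # The most frequent label wins (ties go to the smallest label)
    return counts.argmax(axis=1)

# Three samples whose true labels are 0, 1 and 2
preds = [[0, 1, 1],   # model 0 (0 vs 1)
         [0, 0, 1],   # model 1 (0 vs 2)
         [0, 0, 1]]   # model 2 (1 vs 2)
print(vote(preds))    # [0 1 2]
```

If all three models disagree, each label gets exactly one vote. This sketch then falls back to the smallest label. Breaking ties with the models' probabilities would be an alternative.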
```python
import numpy as np

# Learning rate
a = 0.01

# Predicted probabilities for samples x given the (transposed) model parameters wt
def predict(x, wt):
    # Multiply the feature matrix by wt to get one score per sample
    target_result = np.matmul(x, wt)
    # Map the scores into the (0, 1) interval with the sigmoid function
    return 1 / (1 + np.exp(-target_result))

# One gradient-descent update of the model weights
# (y must hold 0/1 labels for the two classes this model separates)
def update(wt, p, xi, y):
    xi = np.array(xi)
    # Cross-entropy gradient is X^T (p - y) / N; the factor 2 only rescales the learning rate
    wt = wt - a * 2 / xi.shape[0] * np.matmul(xi.transpose(), np.array(p) - y)
    return wt

# x: training samples, y: sample labels (0/1), epoch: number of training iterations
def model_train(x, y, epoch):
    # Randomly initialize the model, one weight per feature of x
    model = np.random.random(size=x.shape[1])
    # Model parameter transposition (a no-op for a 1-D vector)
    wt = model.transpose()
    for i in range(epoch):
        # Make predictions with the current parameters
        p = predict(x, wt)
        # Gradient update
        wt = update(wt, p, x, y)
    return wt
```
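The training code expects each binary model to see only the samples of its two labels, relabeled to 0 and 1. The original refers to such a subset as `xi_0` but does not show how it is built. The sketch below fills that gap with a hypothetical helper, `pair_data`, and uses a tiny made-up dataset.

```python
import numpy as np

def pair_data(x, y, neg, pos):
    """Keep only samples labeled neg or pos; relabel neg -> 0 and pos -> 1."""
    mask = (y == neg) | (y == pos)
    return x[mask], (y[mask] == pos).astype(float)

# Tiny made-up dataset: 6 samples, 2 features, labels 0, 1, 2
x = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 1, 2, 0, 1, 2])
x_02, y_02 = pair_data(x, y, 0, 2)   # data for model 1 (0 vs 2)
print(y_02)                           # [0. 1. 0. 1.]
```

With iris, each pair's weights would then come from `model_train(x_pair, y_pair, epoch)`, called once for each of the pairs (0, 1), (0, 2) and (1, 2).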
3. Softmax function
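For each sample, w is updated one class column at a time. If the sample's label is 0, the predicted probabilities are y_0, y_1, y_2, and the learning rate is a, the update is:
$$
w_0 \leftarrow w_0 - a\,(y_0 - 1)\,x, \qquad
w_1 \leftarrow w_1 - a\,y_1\,x, \qquad
w_2 \leftarrow w_2 - a\,y_2\,x
$$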
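This update rule follows from the cross-entropy loss of a single sample x with true label c. The post does not name the loss, so cross-entropy is an assumption here, but it is the loss whose gradient matches the code. The score of class k is z_k = w_k^T x, where w_k is the k-th column of w.

$$
\begin{aligned}
L &= -\log y_c = -z_c + \log \sum_{j=1}^{n} \exp(z_j) \\
\frac{\partial L}{\partial z_k} &= -\mathbb{1}[k = c] + \frac{\exp(z_k)}{\sum_{j=1}^{n} \exp(z_j)} = y_k - \mathbb{1}[k = c] \\
\frac{\partial L}{\partial w_k} &= \frac{\partial L}{\partial z_k}\,\frac{\partial z_k}{\partial w_k} = (y_k - \mathbb{1}[k = c])\,x \\
w_k &\leftarrow w_k - a\,(y_k - \mathbb{1}[k = c])\,x
\end{aligned}
$$

Setting c = 0 gives exactly the three updates for a sample with label 0.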
```python
import numpy as np

# Compute a length-3 vector: the probability of each category for one sample x
def predict_2(wt, x):
    # One score per class: z = w^T x
    vec = np.matmul(wt, x)
    # Softmax; subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(vec - np.max(vec))
    return e / e.sum()

# wt: transpose of the weight matrix, y: predicted probabilities,
# xi: features of a single sample, yi: category of the sample (0, 1 or 2)
def update_2(wt, y, xi, yi):
    a = 0.01
    # Row of wt for the true class: gradient (y_k - 1) * x
    wt[yi] = wt[yi] - a * (y[yi] - 1) * xi
    # Rows of the other classes: gradient y_k * x
    for i in range(len(wt)):
        if i != yi:
            wt[i] = wt[i] - a * y[i] * xi
    return wt

# Uses the global xi, yi loaded in step 1
def model_train_2(epoch):
    # Random (4, 3) weight matrix: 4 features, 3 classes
    w = np.random.random(size=(4, 3))
    # Work with the transpose, one row per class
    wt = w.transpose()
    for i in range(epoch):
        # Each round visits every sample once (stochastic gradient descent)
        for j in range(len(xi)):
            y = predict_2(wt, xi[j])
            wt = update_2(wt, y, xi[j], yi[j])
    return wt.transpose()
```
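The per-sample loop can also be written as full-batch gradient descent, which averages (y_k − 1[k = label]) x over all samples in one matrix product. The sketch below is an illustration under several assumptions. It uses synthetic 2-D data instead of iris, it adds a constant feature column to stand in for the bias b, and `train_softmax_batch` and `predict_labels` are hypothetical helpers. `predict_labels` returns predictions as a list, which is the form `data_view()` accepts.

```python
import numpy as np

def softmax_rows(z):
    # Row-wise softmax with the max subtracted for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_batch(x, y, n_classes, epoch, a=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[1], n_classes))      # (features, classes)
    onehot = np.eye(n_classes)[y]                 # row s has a 1 in column y[s]
    for _ in range(epoch):
        p = softmax_rows(x @ w)                   # (samples, classes)
        # Average of the per-sample gradients (y_k - 1[k = label]) * x
        w -= a * x.T @ (p - onehot) / x.shape[0]
    return w

def predict_labels(x, w):
    # Collect the most probable class of every sample as a list
    return list(np.argmax(x @ w, axis=1))

# Synthetic data: three well-separated 2-D clusters, 50 samples each
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = np.repeat([0, 1, 2], 50)
x = centers[y] + rng.normal(scale=0.5, size=(150, 2))
x = np.hstack([x, np.ones((150, 1))])  # constant column acts as the bias b
w = train_softmax_batch(x, y, 3, epoch=2000)
acc = np.mean(np.array(predict_labels(x, w)) == y)
print(acc)
```

The batch form uses a larger learning rate than the per-sample version because each step averages over all samples instead of following a single one.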
4. Visualization
To visualize the experimental data: the features are 4-dimensional, so PCA first reduces them to 2 dimensions, and the samples are then plotted in different colors according to their labels.
To visualize the model's predictions: feed the features to the model, collect the predicted labels in a list, and pass that list to data_view() in the same way.
```python
# Import pandas for data handling and matplotlib for plotting
import pandas as pd
import matplotlib.pyplot as plt
# Import the numpy package
import numpy as np
# PCA dimensionality reduction
from sklearn.decomposition import PCA

# Load the PCA algorithm and keep 2 principal components after reduction
pca = PCA(n_components=2)
# Reduce the dimensionality of the samples
reduced_x = pca.fit_transform(xi)

# Draw a scatter plot of the samples, colored by the given labels
def data_view(target):
    color = ['red', 'green', 'blue']
    # Convert to an array in case a list is passed in
    target = np.array(target)
    # Draw each class as a separate batch
    for i in range(3):
        plt.scatter(reduced_x[target == i][:, 0], reduced_x[target == i][:, 1],
                    c=color[i], label=target_names[i])
    # Place the legend
    plt.legend(loc='best')
    # Add a title
    plt.suptitle("iris data")
    # Display the image
    plt.show()

data_view(yi)
```
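For intuition about what `PCA(n_components=2).fit_transform` computes, here is a NumPy sketch based on the singular value decomposition of the centered data. It should agree with scikit-learn up to the sign of each component, because the sign of a principal direction is arbitrary. The data below are random, not iris.

```python
import numpy as np

def pca_2d(x):
    # Center each feature; principal directions are defined on centered data
    xc = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    # Project onto the first two directions
    return xc @ vt[:2].T

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 0.5, 0.1])
reduced = pca_2d(x)
print(reduced.shape)  # (100, 2)
```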
5. Existing problems
The data were not split into training, validation and test sets.
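A minimal sketch of the missing split follows. It is written in NumPy so it needs no extra dependencies, and `split_train_val` is a hypothetical helper. scikit-learn's `train_test_split` with `stratify=y` does the same job. A test set could be carved out the same way before any tuning.

```python
import numpy as np

def split_train_val(x, y, val_ratio=0.2, seed=0):
    """Stratified split: each class gives about val_ratio of its samples to validation."""
    rng = np.random.default_rng(seed)
    val_idx = []
    for label in np.unique(y):
        # Shuffle this class's indices and take the first part for validation
        idx = rng.permutation(np.flatnonzero(y == label))
        val_idx.extend(idx[:int(round(len(idx) * val_ratio))])
    val_mask = np.zeros(len(y), dtype=bool)
    val_mask[val_idx] = True
    return x[~val_mask], x[val_mask], y[~val_mask], y[val_mask]

# Made-up data shaped like iris: 150 samples, 4 features, 50 per class
x = np.random.default_rng(1).random((150, 4))
y = np.repeat([0, 1, 2], 50)
x_train, x_val, y_train, y_val = split_train_val(x, y)
print(len(x_train), len(x_val))  # 120 30
```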