Table of Contents
Fitting
Underfitting
Overfitting
Correct Fit
Methods to Solve Overfitting: Regularization
Both linear regression models and logistic regression models can suffer from underfitting and overfitting.
Fitting
Explanation from Baidu:
Data fitting, also known as curve fitting (informally, "drawing a curve"), is a method of substituting existing data into a mathematical expression. Many scientific and engineering problems yield a certain amount of discrete data through sampling, experiments, and similar methods. Based on these data, we often want to obtain a continuous function (that is, a curve), or a denser discrete equation, that is consistent with the known data. This process is called fitting.
My personal understanding is that fitting builds a mathematical model from existing data. The model should account for the existing data as fully as possible, so that its predictions agree with the observed situation as closely as possible.
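As a minimal sketch of this idea (my own illustration, not part of the original notes), `np.polyfit` recovers a continuous curve from discrete, noisy samples:

```python
import numpy as np

# Noisy samples drawn around a known quadratic: y = 2x^2 - 3x + 1
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Fit a degree-2 polynomial: returns coefficients [a, b, c] of a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)  # close to [2, -3, 1]; the fitted curve summarizes the discrete data
```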
Underfitting
The established model matches the existing data poorly. As shown in the figure below, the decision boundary cannot separate the current data well.
Underfitting typically occurs when the training data has too few features.
Overfitting
The model matches the existing data too closely, so it cannot generalize to new data. This situation typically occurs when the training data has too many features.
Correct Fit
A correct fit lies between underfitting and overfitting: the model captures the overall trend of the data without memorizing its noise, as the sketch below illustrates.
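To make the three cases concrete, here is a minimal sketch (my own illustration, not from the original notes; the polynomial degrees are arbitrary choices) that fits the same noisy quadratic data with too little, about the right amount, and too much model capacity:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 20)
y = 2 * x**2 - 3 * x + 1 + rng.normal(scale=0.5, size=x.shape)

# Held-out points on the true curve, to measure generalization
x_new = np.linspace(-2, 2, 200)
y_new = 2 * x_new**2 - 3 * x_new + 1

for deg in (1, 2, 15):  # underfit, correct fit, overfit
    p = np.poly1d(np.polyfit(x, y, deg))  # degree-15 may warn: poorly conditioned
    train_err = np.mean((p(x) - y)**2)
    test_err = np.mean((p(x_new) - y_new)**2)
    print(f"degree {deg:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-1 line has high error everywhere (underfitting); the degree-15 fit drives the training error down but does worse on the held-out points, which is exactly the generalization failure described above.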
Methods to Solve Overfitting: Regularization
The way to solve overfitting is to regularize the model: shrink the parameters w_j of the non-essential features toward 0, then retrain to find the optimal model. This raises a question: how do we tell whether a feature is an essential one? Since that is hard to distinguish, all the features are regularized (by convention the bias $b$ is not). The regularization term added to the cost is:

$$\frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$
Regularized linear regression cost function, where $f_{\vec{w},b}(\vec{x}) = \vec{w}\cdot\vec{x} + b$:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$
Regularized logistic regression cost function, where $f_{\vec{w},b}(\vec{x}) = \dfrac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}}$:

$$J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^2$$
Gradient descent updates for linear and logistic regression (identical in form; only the model $f_{\vec{w},b}$ differs):

$$\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}w_j$$

$$\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$
Implementation code:
```python
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from plt_overfit import overfit_example, output

np.set_printoptions(precision=8)


def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
      z (ndarray): A scalar, numpy array of any size.
    Returns:
      g (ndarray): sigmoid(z), with the same shape as z
    """
    g = 1 / (1 + np.exp(-z))
    return g


def compute_cost_linear_reg(X, y, w, b, lambda_=1):
    """
    Computes the regularized cost over all examples

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
      lambda_ (scalar) : Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """
    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b            # (n,)(n,) = scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2        # scalar
    cost = cost / (2 * m)                       # scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += w[j]**2                     # scalar
    reg_cost = (lambda_ / (2 * m)) * reg_cost   # scalar

    total_cost = cost + reg_cost                # scalar
    return total_cost                           # scalar


np.random.seed(1)
X_tmp = np.random.rand(5, 6)
y_tmp = np.array([0, 1, 0, 1, 0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,) - 0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)
print("Regularized cost:", cost_tmp)


def compute_cost_logistic_reg(X, y, w, b, lambda_=1):
    """
    Computes the regularized cost over all examples

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
      lambda_ (scalar) : Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """
    m, n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b               # (n,)(n,) = scalar, see np.dot
        f_wb_i = sigmoid(z_i)                   # scalar
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)  # scalar
    cost = cost / m                             # scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += w[j]**2                     # scalar
    reg_cost = (lambda_ / (2 * m)) * reg_cost   # scalar

    total_cost = cost + reg_cost                # scalar
    return total_cost                           # scalar


np.random.seed(1)
X_tmp = np.random.rand(5, 6)
y_tmp = np.array([0, 1, 0, 1, 0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,) - 0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)
print("Regularized cost:", cost_tmp)


def compute_gradient_linear_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for regularized linear regression

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
      lambda_ (scalar) : Controls amount of regularization
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape                              # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    for j in range(n):                          # add the regularization term to each w gradient
        dj_dw[j] = dj_dw[j] + (lambda_ / m) * w[j]

    return dj_db, dj_dw


np.random.seed(1)
X_tmp = np.random.rand(5, 3)
y_tmp = np.array([0, 1, 0, 1, 0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp = compute_gradient_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)
print(f"dj_db: {dj_db_tmp}")
print(f"Regularized dj_dw:\n {dj_dw_tmp.tolist()}")


def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for regularized logistic regression

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
      lambda_ (scalar) : Controls amount of regularization
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape
    dj_dw = np.zeros((n,))                      # (n,)
    dj_db = 0.0                                 # scalar
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i], w) + b)   # (n,)(n,) = scalar
        err_i = f_wb_i - y[i]                   # scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i, j]  # scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw / m                           # (n,)
    dj_db = dj_db / m                           # scalar

    for j in range(n):                          # add the regularization term to each w gradient
        dj_dw[j] = dj_dw[j] + (lambda_ / m) * w[j]

    return dj_db, dj_dw


np.random.seed(1)
X_tmp = np.random.rand(5, 3)
y_tmp = np.array([0, 1, 0, 1, 0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)
print(f"dj_db: {dj_db_tmp}")
print(f"Regularized dj_dw:\n {dj_dw_tmp.tolist()}")

plt.close("all")
display(output)
ofit = overfit_example(True)
```
The logistic regression output is:
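The lab code above stops at the cost and gradient functions. As a minimal sketch of how they would plug into training (my own addition, not part of the course lab; `alpha` and `num_iters` are illustrative choices), gradient descent just steps the parameters against the regularized gradient:

```python
def gradient_descent_reg(X, y, w, b, alpha, num_iters, lambda_):
    # Repeatedly step w and b against the regularized logistic gradient
    for _ in range(num_iters):
        dj_db, dj_dw = compute_gradient_logistic_reg(X, y, w, b, lambda_)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

# Reuses X_tmp, y_tmp, w_tmp, b_tmp from the code above
w_out, b_out = gradient_descent_reg(X_tmp, y_tmp, w_tmp, b_tmp,
                                    alpha=0.1, num_iters=1000, lambda_=0.7)
print("trained w:", w_out, "trained b:", b_out)
```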