[Machine Learning][Part 6] Regularized Cost Function and Gradient

Table of Contents

Fitting

Underfitting

Overfitting

Correct Fit

Method to Solve Overfitting: Regularization


Both linear regression and logistic regression models can suffer from underfitting and overfitting.

Fitting

Explanation from Baidu:

Data fitting, also known as curve fitting (informally, "drawing a curve through the data"), is a method of representing existing data with a mathematical expression. Scientific and engineering problems often yield a set of discrete data points through sampling, experiments, and similar methods. Based on these data, we usually want a continuous function (that is, a curve), or a denser discrete equation, that is consistent with the known data. This process is called fitting.

My own understanding is that fitting builds a mathematical model from the existing data. The model should capture the existing data as fully as possible, so that its predictions stay as consistent as possible with what has already been observed.
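As a rough illustration (this snippet is not part of the lab code, and the data points are made up), fitting a straight line to a few noisy points with np.polyfit produces a simple model that agrees with the known data as closely as possible:

import numpy as np

# made-up sample points, roughly following y = x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

# least-squares fit of a degree-1 polynomial: y ≈ w*x + b
w, b = np.polyfit(x, y, deg=1)
print(f"fitted model: y = {w:.3f}*x + {b:.3f}")
print("model values at the known x:", np.polyval([w, b], x))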

Underfitting

The model matches the existing data poorly. As shown in the figure below, the decision boundary does not separate the current data well.

Underfitting typically occurs when the model is trained with too few features.

Overfitting

The model matches the existing data too closely, so it fails to generalize to new data. This typically happens when the model is trained with too many features relative to the amount of training data.

Correct Fit

A correct fit lies between underfitting and overfitting: the model captures the overall trend of the data without chasing its noise.
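A quick numeric sketch of the three regimes (again with made-up data, independent of the plt_overfit helper used later): fitting polynomials of increasing degree to noisy quadratic data, degree 1 underfits, a moderate degree fits well, and a very high degree drives the training error toward zero while behaving poorly away from the training points.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 10)
y_train = x_train**2 + 0.05 * rng.normal(size=x_train.size)   # noisy quadratic
x_test = np.linspace(-1, 1, 100)
y_test = x_test**2                                            # noise-free truth

for deg in (1, 2, 9):            # underfit, reasonable fit, overfit
    coef = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {deg}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")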

Method to Solve Overfitting: Regularization

Overfitting can be mitigated by regularizing the model: the parameters w_j of non-essential features are driven toward 0, and the model is then retrained to find the best fit. The difficulty is telling which features are the essential ones; since that is hard to determine in general, all features are regularized. The regularization formula is:
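$$\frac{\lambda}{2m}\sum_{j=1}^{n} w_j^{2}$$

where $\lambda$ (lambda_ in the code below) controls the amount of regularization, $m$ is the number of training examples, and $n$ is the number of features. Note that only the weights $w_j$ are regularized; the bias $b$ is not.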

Linear regression cost function:
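$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^{2} + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^{2}$$

where $f_{\vec{w},b}(\vec{x}^{(i)}) = \vec{w}\cdot\vec{x}^{(i)} + b$. This is what compute_cost_linear_reg implements below.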

Logistic regression cost function:
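$$J(\vec{w},b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-f_{\vec{w},b}(\vec{x}^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} w_j^{2}$$

where $f_{\vec{w},b}(\vec{x}^{(i)}) = \sigma(\vec{w}\cdot\vec{x}^{(i)} + b)$ and $\sigma$ is the sigmoid function. This is what compute_cost_logistic_reg implements below.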

Gradient descent function for linear regression and logistic regression:
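$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}w_j
\qquad
\frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

Each gradient descent step then updates all parameters simultaneously:

$$w_j := w_j - \alpha\frac{\partial J(\vec{w},b)}{\partial w_j} \qquad b := b - \alpha\frac{\partial J(\vec{w},b)}{\partial b}$$

The update rules are identical for linear and logistic regression; only the definition of $f_{\vec{w},b}$ differs, as seen in compute_gradient_linear_reg and compute_gradient_logistic_reg below.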

Implementation code:

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from plt_overfit import overfit_example, output

np.set_printoptions(precision=8)

def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar, numpy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z

    """
    g = 1/(1 + np.exp(-z))
    return g

def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar) : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """

    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b #(n,)(n,)=scalar, see np.dot
        cost = cost + (f_wb_i - y[i])**2 #scalar
    cost = cost / (2 * m) #scalar
 
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2) #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost #scalar
    
    total_cost = cost + reg_cost #scalar
    return total_cost #scalar


np.random.seed(1)
X_tmp = np.random.rand(5,6)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print("Regularized cost:", cost_tmp)



def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar) : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      total_cost (scalar): cost
    """

    m,n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b #(n,)(n,)=scalar, see np.dot
        f_wb_i = sigmoid(z_i) #scalar
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i) #scalar
             
    cost = cost/m #scalar

    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2) #scalar
    reg_cost = (lambda_/(2*m)) * reg_cost #scalar
    
    total_cost = cost + reg_cost #scalar
    return total_cost #scalar



np.random.seed(1)
X_tmp = np.random.rand(5,6)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1]).reshape(-1,)-0.5
b_tmp = 0.5
lambda_tmp = 0.7
cost_tmp = compute_cost_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print("Regularized cost:", cost_tmp)


def compute_gradient_linear_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar) : model parameter
      lambda_ (scalar): Controls amount of regularization
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b.
    """
    m,n = X.shape #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    
    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw

np.random.seed(1)
X_tmp = np.random.rand(5,3)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp = compute_gradient_linear_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print(f"dj_db: {dj_db_tmp}", )
print(f"Regularized dj_dw:\\
 {dj_dw_tmp.tolist()}", )
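With the regularized gradient in hand, gradient descent itself is the usual update loop. A minimal sketch (the gradient_descent helper here is illustrative, not part of the lab code):

def gradient_descent(X, y, w_in, b_in, grad_fn, alpha, num_iters, lambda_):
    """Repeat w := w - alpha*dj_dw, b := b - alpha*dj_db for num_iters steps."""
    w = np.copy(w_in)
    b = b_in
    for _ in range(num_iters):
        dj_db, dj_dw = grad_fn(X, y, w, b, lambda_)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b

w_out, b_out = gradient_descent(X_tmp, y_tmp, w_tmp, b_tmp,
                                compute_gradient_linear_reg,
                                alpha=0.1, num_iters=1000, lambda_=lambda_tmp)
print("w found by gradient descent:", w_out)
print("b found by gradient descent:", b_out)

Passing compute_gradient_logistic_reg instead of compute_gradient_linear_reg runs regularized logistic regression with the same loop.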



def compute_gradient_logistic_reg(X, y, w, b, lambda_):
    """
    Computes the gradient for logistic regression
 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)): target values
      w (ndarray (n,)): model parameters
      b (scalar) : model parameter
      lambda_ (scalar): Controls amount of regularization
    Returns:
      dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar) : The gradient of the cost w.r.t. the parameter b.
    """
    m,n = X.shape
    dj_dw = np.zeros((n,)) #(n,)
    dj_db = 0.0 #scalar

    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b) #(n,)(n,)=scalar
        err_i = f_wb_i - y[i] #scalar
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j] #scalar
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m #(n,)
    dj_db = dj_db/m #scalar

    for j in range(n):
        dj_dw[j] = dj_dw[j] + (lambda_/m) * w[j]

    return dj_db, dj_dw


np.random.seed(1)
X_tmp = np.random.rand(5,3)
y_tmp = np.array([0,1,0,1,0])
w_tmp = np.random.rand(X_tmp.shape[1])
b_tmp = 0.5
lambda_tmp = 0.7
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic_reg(X_tmp, y_tmp, w_tmp, b_tmp, lambda_tmp)

print(f"dj_db: {dj_db_tmp}", )
print(f"Regularized dj_dw:\\
 {dj_dw_tmp.tolist()}", )


plt.close("all")
display(output)
ofit = overfit_example(True)

The logistic regression output is: