week2_lab2 Gradient Descent for Linear Regression (Boston Housing dataset) (study notes)

In this exercise, you will learn how to:

  1. implement the gradient descent method
  2. implement the minibatch gradient descent method

We will use the Boston Housing data, similar to Week 1. We can import the dataset and preprocess it as follows. Note that we add a constant feature of 1 to x_input to get an n x (d + 1) matrix x_in.

import pandas as pd
import numpy as np
from sklearn import preprocessing

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
boston_data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

data = boston_data
x_input = data      # the data matrix of input features
y_target = target   # the vector of all outputs
# add a constant feature 1 to the dataset, so we do not need to treat the bias and the weights separately
x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1)
# normalize the data so that every sample is scaled to unit norm
x_in = preprocessing.normalize(x_in)

x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1): this line prepends an extra constant feature (all ones) to the front of the input feature matrix x_input, so that the bias term and the weights do not need to be handled separately. The specific explanation is as follows:

  1. np.ones([np.shape(x_input)[0], 1]) creates a column of ones with shape (n, 1), where n is the number of examples.
  2. x_input is the original input feature matrix with shape (n, d), where d is the number of features.
  3. np.concatenate(…, axis=1) concatenates these two matrices by column (axis=1) to form a new feature matrix x_in, whose first column is all ones (the constant term) followed by the original features, as shown in the sketch below.
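
As a quick sanity check, here is a minimal sketch (using a small made-up toy matrix rather than the Boston data) of how the shapes change when the column of ones is prepended:

import numpy as np

# toy input: 3 examples, 2 features (hypothetical values, for illustration only)
toy_x = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
toy_in = np.concatenate([np.ones([toy_x.shape[0], 1]), toy_x], axis=1)
print(toy_x.shape)   # (3, 2)  -> n x d
print(toy_in.shape)  # (3, 3)  -> n x (d + 1)
print(toy_in[:, 0])  # [1. 1. 1.], the constant feature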

Linear Model & Cost Function

def linearmat_2(w, X):
    '''
    A vectorization of linearmat_1 in the Week 1 lab.
    Input: w is a weight vector (including the bias), and X is a data matrix of shape n x (d + 1) (including the constant feature 1)
    Output: a vector containing the predictions of the linear model
    '''
    return np.dot(X, w)

def cost(w, X, y):
    '''
    Evaluate the cost function in a vectorized manner for
    inputs `X` and outputs `y`, at weights `w`.
    '''
    residual = y - linearmat_2(w, X)                 # residual between targets and predictions
    err = np.dot(residual, residual) / (2 * len(y))  # sum of squared residuals divided by 2n

    return err
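
Written out, this cost is the halved mean squared error, cost(w) = (1/(2n)) * sum_i (y_i - x_i . w)^2. A minimal toy check (the values below are hypothetical, chosen only so the predictions match the targets exactly):

# toy check of the cost function: with these weights the residual is zero, so the cost should be 0
toy_X = np.array([[1.0, 2.0],
                  [1.0, 4.0]])   # first column is the constant feature
toy_y = np.array([3.0, 5.0])
toy_w = np.array([1.0, 1.0])
print(cost(toy_w, toy_X, toy_y))  # 0.0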

Gradient Computation

Calculate the gradient of the cost function with respect to the weights.

# Vectorized gradient function
def gradfn(weights, X, y):
    '''
    Given `weights` - a current "guess" of what our weights should be
          `X` - matrix of shape (N, d + 1) of input features including the constant feature 1
          `y` - target y values
    Return the gradient of the cost with respect to each weight, evaluated at the current weights
    '''

    y_pred = np.dot(X, weights)         # predictions of the linear model
    error = y_pred - y                  # prediction error for each example
    return np.dot(X.T, error) / len(y)  # gradient: (1/N) * X^T (Xw - y)
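
A common way to gain confidence in a gradient implementation is a finite-difference check. The sketch below is an optional debugging step (not part of the original lab): it compares gradfn against numerical derivatives of cost on small random data.

# finite-difference gradient check (optional, on random data)
rng = np.random.default_rng(0)
Xc = rng.normal(size=(20, 4))
yc = rng.normal(size=20)
wc = rng.normal(size=4)
eps = 1e-6
numeric = np.zeros_like(wc)
for j in range(len(wc)):
    w_plus, w_minus = wc.copy(), wc.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    # centered-difference approximation of the j-th partial derivative
    numeric[j] = (cost(w_plus, Xc, yc) - cost(w_minus, Xc, yc)) / (2 * eps)
print(np.max(np.abs(numeric - gradfn(wc, Xc, yc))))  # should be close to 0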

Gradient Descent

Use the computed gradient to iteratively update the weights via gradient descent.

def solve_via_gradient_descent(X, y, print_every=100,
                               niter=5000, eta=1):
    '''
    Given `X` - matrix of shape (N,D) of input features
          `y` - target y values
          `print_every` - we report performance every 'print_every' iterations
          `niter` - the number of iterates allowed
          `eta` - learning rate
    Solves for linear regression weights with gradient descent.

    Return
        `w` - weights after `niter` iterations
        `idx_res` - the indices of iterations where we compute the cost
        `err_res` - the cost at iterations indicated by idx_res
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    for k in range(niter):
        #compute the gradient
        dw = gradfn(w, X, y)
        #gradient descent
        w = w - eta * dw
        # we report the progress every print_every iterations
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after %d iterations: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
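
A possible way to run the solver on the preprocessed Boston data (the learning rate and iteration count below are simply the function defaults; other values may also work):

# run full-batch gradient descent on the normalized data
w_gd, idx_gd, err_gd = solve_via_gradient_descent(x_in, y_target, print_every=100,
                                                  niter=5000, eta=1)
print(w_gd)  # the learned weights; the first entry corresponds to the bias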

Minibatch Gradient Descent

import random

def solve_via_minibatch(X, y, print_every=100,
                               niter=5000, eta=1, batch_size=50):
    '''
    Solves for linear regression weights with minibatch gradient descent.
    Given `X` - matrix of shape (N,D) of input features
          `y` - target y values
          `print_every` - we report performance every 'print_every' iterations
          `niter` - the number of iterates allowed
          `eta` - learning rate
          `batch_size` - the size of minibatch
    Return
        `w` - weights after `niter` iterations
        `idx_res` - the indices of iterations where we compute the cost
        `err_res` - the cost at iterations
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    tset = list(range(N))
    for k in range(niter):
        # update w by minibatch gradient descent:
        # sample a random minibatch of indices without replacement
        idx = random.sample(tset, batch_size)
        # select the corresponding batch of data
        sample_X = X[idx, :]
        sample_y = y[idx]
        dw = gradfn(w, sample_X, sample_y)
        w = w - eta * dw
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after %d iterations: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
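
Similarly, a possible call for minibatch gradient descent on the same data (batch_size=50 is just the default; smaller batches make each update cheaper but noisier):

# run minibatch gradient descent on the normalized data
w_mb, idx_mb, err_mb = solve_via_minibatch(x_in, y_target, print_every=100,
                                           niter=5000, eta=1, batch_size=50)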

Comparison between Minibatch Gradient Descent and Gradient Descent
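
One simple comparison is to plot the recorded costs against the iteration indices of both runs. A minimal sketch using matplotlib (assuming the two runs above, producing idx_gd/err_gd and idx_mb/err_mb, have been executed):

import matplotlib.pyplot as plt

plt.plot(idx_gd, err_gd, label='gradient descent')
plt.plot(idx_mb, err_mb, label='minibatch gradient descent')
plt.xlabel('iteration')
plt.ylabel('cost')
plt.legend()
plt.show()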