Week 2 Lab 2: Gradient Descent for Linear Regression (Boston Housing dataset) (study notes)
In this exercise, you will learn to:
- implement the gradient descent method
- implement the minibatch gradient descent method
We will use the Boston Housing data, as in Week 1. We can import the dataset and preprocess it as follows. Note that we add a constant feature 1 to x_input to get an n x (d + 1) matrix x_in.
```python
import pandas as pd
import numpy as np
from sklearn import preprocessing

data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
boston_data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

data = boston_data
x_input = data    # a data matrix
y_target = target # a vector for all outputs

# add a constant feature 1 to the dataset, so that the bias and the weights
# do not need to be considered separately
x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1)

# normalize each sample (row) to unit norm so the features are on a comparable scale
x_in = preprocessing.normalize(x_in)
```
The line `x_in = np.concatenate([np.ones([np.shape(x_input)[0], 1]), x_input], axis=1)` prepends a constant feature (1) to the input feature matrix x_input, so that the bias term and the weights do not need to be handled separately. In detail:
- np.ones([np.shape(x_input)[0], 1]) creates a column of ones with shape (n, 1), where n is the number of examples.
- x_input is the original input feature matrix with shape (n, d), where n is the number of examples and d is the number of features.
- np.concatenate(…, axis=1) concatenates the two matrices column-wise (axis=1) to form a new feature matrix x_in, whose first column is 1 (the constant term), followed by the original features; see the sketch after this list.
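As a minimal sketch of this step (the 3 x 2 matrix below is made up purely for illustration):
```python
import numpy as np

# a toy input matrix: n = 3 examples, d = 2 features (illustrative values)
x_toy = np.array([[2.0, 5.0],
                  [1.0, 3.0],
                  [4.0, 7.0]])

ones = np.ones([np.shape(x_toy)[0], 1])           # shape (3, 1)
x_toy_in = np.concatenate([ones, x_toy], axis=1)  # shape (3, 3), i.e. n x (d + 1)
print(x_toy_in)
# [[1. 2. 5.]
#  [1. 1. 3.]
#  [1. 4. 7.]]
```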
Linear Model & Cost Function
```python
def linearmat_2(w, X):
    '''
    A vectorization of linearmat_1 in the Week 1 lab.
    Input: `w` is a weight vector (including the bias), and `X` is a data
           matrix of shape n x (d + 1) (including the constant feature 1).
    Output: a vector containing the predictions of the linear model.
    '''
    return np.dot(X, w)
```
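In matrix form, this computes the predictions for all n examples at once:

$$\hat{\mathbf{y}} = X\mathbf{w}, \qquad \hat{y}_i = \mathbf{w}^\top \mathbf{x}_i \quad \text{for } i = 1, \dots, n.$$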
```python
def cost(w, X, y):
    '''
    Evaluate the cost function in a vectorized manner for
    inputs `X` and outputs `y`, at weights `w`.
    '''
    residual = y - linearmat_2(w, X)                  # get the residual
    err = np.dot(residual, residual) / (2 * len(y))   # compute the error
    return err
```
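Written out, this is the usual least-squares cost (the factor 1/2 simplifies the gradient):

$$E(\mathbf{w}) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 = \frac{1}{2n}\left\|\mathbf{y} - X\mathbf{w}\right\|_2^2,$$

where `np.dot(residual, residual)` computes the squared norm $\|\mathbf{y} - X\mathbf{w}\|_2^2$.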
Gradient Computation
Calculate the gradient of the cost function
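Differentiating the cost above with respect to $\mathbf{w}$ gives

$$\nabla E(\mathbf{w}) = \frac{1}{n}\, X^\top\left(X\mathbf{w} - \mathbf{y}\right),$$

which is exactly what the vectorized code below computes as `np.dot(X.T, error) / len(y)`.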
```python
# Vectorized gradient function
def gradfn(weights, X, y):
    '''
    Given `weights` - a current "guess" of what our weights should be
          `X`       - matrix of shape (N, d + 1) of input features,
                      including the constant feature 1
          `y`       - target y values
    Return the gradient of each weight evaluated at the current weights.
    '''
    y_pred = np.dot(X, weights)
    error = y_pred - y
    return np.dot(X.T, error) / len(y)
```
Gradient Descent
Use the computed gradient to run gradient descent: at each iteration, update the weights by taking a step of size eta in the negative gradient direction.
```python
def solve_via_gradient_descent(X, y, print_every=100, niter=5000, eta=1):
    '''
    Solves for linear regression weights with gradient descent.
    Given `X`           - matrix of shape (N, D) of input features
          `y`           - target y values
          `print_every` - we report performance every `print_every` iterations
          `niter`       - the number of iterations allowed
          `eta`         - learning rate
    Return `w`       - weights after `niter` iterations
           `idx_res` - the indices of iterations where we compute the cost
           `err_res` - the cost at the iterations indicated by `idx_res`
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    for k in range(niter):
        # compute the gradient
        dw = gradfn(w, X, y)
        # gradient descent step
        w = w - eta * dw
        # we report the progress every print_every iterations
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after %d iteration: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
```
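A minimal usage sketch, assuming the loading code above has defined x_in and y_target (the matplotlib plot is optional and just one way to inspect the recorded costs):
```python
w_gd, idx_gd, err_gd = solve_via_gradient_descent(x_in, y_target)
print('final cost:', cost(w_gd, x_in, y_target))

# plot the recorded cost against the iteration index
import matplotlib.pyplot as plt
plt.plot(idx_gd, err_gd)
plt.xlabel('iteration')
plt.ylabel('cost')
plt.show()
```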
Minibatch Gradient Descent
```python
import random

def solve_via_minibatch(X, y, print_every=100, niter=5000, eta=1, batch_size=50):
    '''
    Solves for linear regression weights with minibatch gradient descent.
    Given `X`           - matrix of shape (N, D) of input features
          `y`           - target y values
          `print_every` - we report performance every `print_every` iterations
          `niter`       - the number of iterations allowed
          `eta`         - learning rate
          `batch_size`  - the size of the minibatch
    Return `w`       - weights after `niter` iterations
           `idx_res` - the indices of iterations where we compute the cost
           `err_res` - the cost at those iterations
    '''
    N, D = np.shape(X)
    # initialize all the weights to zeros
    w = np.zeros([D])
    idx_res = []
    err_res = []
    tset = list(range(N))
    for k in range(niter):
        # update w by minibatch gradient descent: sample a batch of data
        # and take a gradient step computed on that batch only
        idx = random.sample(tset, batch_size)
        sample_X = X[idx, :]
        sample_y = y[idx]
        dw = gradfn(w, sample_X, sample_y)
        w = w - eta * dw
        if k % print_every == print_every - 1:
            t_cost = cost(w, X, y)
            print('error after %d iteration: %s' % (k, t_cost))
            idx_res.append(k)
            err_res.append(t_cost)
    return w, idx_res, err_res
```
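A minimal sketch comparing the two solvers, assuming w_gd, idx_gd, and err_gd from the gradient descent sketch above are available (batch_size=50 is simply the default):
```python
w_mb, idx_mb, err_mb = solve_via_minibatch(x_in, y_target, batch_size=50)

# compare the cost curves of the two solvers on the full dataset
import matplotlib.pyplot as plt
plt.plot(idx_gd, err_gd, label='gradient descent')
plt.plot(idx_mb, err_mb, label='minibatch gradient descent')
plt.xlabel('iteration')
plt.ylabel('cost on the full dataset')
plt.legend()
plt.show()
```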