Linear regression on the Boston housing dataset: why the loss becomes NaN, and drawing scatter plots to find the relationship between features and labels

Boston house price CSV file

Link: https://pan.baidu.com/s/1uz6oKs7IeEzHdJkfrpiayg?pwd=vufb Extraction code: vufb

Code

%matplotlib inline
import random
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt

Load the dataset from the CSV file

# Load the data; the first row of the file is not part of the table, so skip it
boston = pd.read_csv('../data/boston_house_prices.csv', skiprows=[0])
# There are 14 columns in total: the first thirteen are features and the last column is the price (MEDV)
boston
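As a quick sanity check, the shape and column names of the loaded DataFrame can be printed; a minimal sketch (the expected values assume the standard 506-row Boston housing CSV):

print(boston.shape)           # should be (506, 14) for the standard Boston housing data
print(list(boston.columns))   # the last column should be 'MEDV', the price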

Take the last column as the labels and all the preceding columns as the features

# The last column (MEDV) is used as the labels, and the first thirteen columns are used as the features.
# pop() removes the MEDV column from the DataFrame and returns it, leaving only the 13 feature columns in boston.
labels = boston.pop('MEDV')
features = boston

Draw a scatter plot of each feature against the house price. If the relationship looks roughly linear, the feature is correlated with the label, and features showing such a correlation are kept as the final features.

# Plot each feature against the house price
data_xTitle = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
# 5 rows x 3 columns = 15 subplots; only the first 13 are used
fig, a = plt.subplots(5, 3, figsize=(12, 15))
for m, name in enumerate(data_xTitle):
    ax = a[m // 3][m % 3]
    ax.scatter(features[name], labels, s=30, edgecolor='white')
    ax.set_title(name)
# Hide the two unused subplots in the last row
a[4][1].axis('off')
a[4][2].axis('off')
fig.tight_layout()
plt.show()
# The plots show that CRIM, RM and LSTAT have a roughly linear relationship with the price, so these three columns are selected as the features.

# CRIM, RM and LSTAT have a linear relationship with the price, so keep only these three feature columns.
features = features[['LSTAT','CRIM','RM']]
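To back up the visual impression with a number, the Pearson correlation of each column with the price can be computed with pandas. This is an optional check, not part of the original workflow; note that the boston DataFrame still holds all 13 feature columns at this point:

# Correlation of every feature with the price; values near +1 or -1 indicate a strong linear relationship
print(boston.corrwith(labels).sort_values())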

Convert the data to tensors

features = torch.tensor(np.array(features)).to(torch.float32)
labels = torch.tensor(np.array(labels)).to(torch.float32)
features.shape, labels.shape

(torch.Size([506, 3]), torch.Size([506]))

Define the linear regression model, the loss function, and the optimization function

# Define the linear regression model
def linreg(X, w, b):
    return torch.matmul(X, w) + b

# Define the loss function (squared loss)
def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

# Define the optimization function
def sgd(params, lr, batch_size):
    '''Mini-batch stochastic gradient descent'''
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

The data_iter function fetches the data in mini-batches

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # These samples are read randomly, in no specific order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]
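A quick way to check the iterator is to pull a single mini-batch and look at its shapes; a minimal sketch (it assumes the batch size of 10 that is set in the next cell):

# Fetch one mini-batch and inspect its shape
X_batch, y_batch = next(data_iter(10, features, labels))
print(X_batch.shape, y_batch.shape)  # expected: torch.Size([10, 3]) torch.Size([10])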

Initialize the parameters and hyperparameters

w = torch.normal(0, 0.01, size=(features.shape[1], 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.03
# lr = 0.0001
num_epochs = 100
net = linreg
loss = squared_loss
batch_size = 10

The shapes of w and b are:
torch.Size([3, 1])
torch.Size([1])

Start training

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)  # mini-batch loss on X and y
        # l has shape (batch_size, 1), not a scalar, so sum its elements
        # and use the sum to compute the gradients with respect to [w, b]
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

When the learning rate is set to 0.03, the loss immediately becomes NaN

epoch 1, loss nan
epoch 2, loss nan
epoch 3, loss nan
epoch 4, loss nan
epoch 5, loss nan
epoch 6, loss nan
epoch 7, loss nan
epoch 8, loss nan
epoch 9, loss nan
epoch 10, loss nan
epoch 11, loss nan
epoch 12, loss nan
epoch 13, loss nan
epoch 14, loss nan
epoch 15, loss nan
epoch 16, loss nan
epoch 17, loss nan
epoch 18, loss nan
epoch 19, loss nan
epoch 20, loss nan
epoch 21, loss nan
epoch 22, loss nan
epoch 23, loss nan
epoch 24, loss nan
epoch 25, loss nan
epoch 26, loss nan
epoch 27, loss nan
epoch 28, loss nan
epoch 29, loss nan
epoch 30, loss nan
epoch 31, loss nan
epoch 32, loss nan
epoch 33, loss nan
epoch 34, loss nan
epoch 35, loss nan
epoch 36, loss nan
epoch 37, loss nan
epoch 38, loss nan
epoch 39, loss nan
epoch 40, loss nan
epoch 41, loss nan
epoch 42, loss nan
epoch 43, loss nan
epoch 44, loss nan
epoch 45, loss nan
epoch 46, loss nan
epoch 47, loss nan
epoch 48, loss nan
epoch 49, loss nan
epoch 50, loss nan
epoch 51, loss nan
epoch 52, loss nan
epoch 53, loss nan
epoch 54, loss nan
epoch 55, loss nan
epoch 56, loss nan
epoch 57, loss nan
epoch 58, loss nan
epoch 59, loss nan
epoch 60, loss nan
epoch 61, loss nan
epoch 62, loss nan
epoch 63, loss nan
epoch 64, loss nan
epoch 65, loss nan
epoch 66, loss nan
epoch 67, loss nan
epoch 68, loss nan
epoch 69, loss nan
epoch 70, loss nan
epoch 71, loss nan
epoch 72, loss nan
epoch 73, loss nan
epoch 74, loss nan
epoch 75, loss nan
epoch 76, loss nan
epoch 77, loss nan
epoch 78, loss nan
epoch 79, loss nan
epoch 80, loss nan
epoch 81, loss nan
epoch 82, loss nan
epoch 83, loss nan
epoch 84, loss nan
epoch 85, loss nan
epoch 86, loss nan
epoch 87, loss nan
epoch 88, loss nan
epoch 89, loss nan
epoch 90, loss nan
epoch 91, loss nan
epoch 92, loss nan
epoch 93, loss nan
epoch 94, loss nan
epoch 95, loss nan
epoch 96, loss nan
epoch 97, loss nan
epoch 98, loss nan
epoch 99, loss nan
epoch 100, loss nan

When the learning rate is set to 0.0001, the loss stays finite and the model begins to converge

epoch 1, loss 141.555878
epoch 2, loss 115.449852
epoch 3, loss 101.026237
epoch 4, loss 90.287994
epoch 5, loss 81.646828
epoch 6, loss 74.384491
epoch 7, loss 68.148872
epoch 8, loss 62.699074
epoch 9, loss 57.872326
epoch 10, loss 53.601421
epoch 11, loss 49.778000
epoch 12, loss 46.333401
epoch 13, loss 43.253365
epoch 14, loss 40.471313
epoch 15, loss 37.963455
epoch 16, loss 35.711601
epoch 17, loss 33.679176
epoch 18, loss 31.841145
epoch 19, loss 30.203505
epoch 20, loss 28.699686
epoch 21, loss 27.352037
epoch 22, loss 26.142868
epoch 23, loss 25.045834
epoch 24, loss 24.059885
epoch 25, loss 23.171280
epoch 26, loss 22.369287
epoch 27, loss 21.646309
epoch 28, loss 20.998608
epoch 29, loss 20.407761
epoch 30, loss 19.874365
epoch 31, loss 19.396839
epoch 32, loss 18.967056
epoch 33, loss 18.576946
epoch 34, loss 18.234808
epoch 35, loss 17.904724
epoch 36, loss 17.623093
epoch 37, loss 17.360590
epoch 38, loss 17.126835
epoch 39, loss 16.916040
epoch 40, loss 16.727121
epoch 41, loss 16.555841
epoch 42, loss 16.401901
epoch 43, loss 16.264545
epoch 44, loss 16.145824
epoch 45, loss 16.026453
epoch 46, loss 15.927325
epoch 47, loss 15.830773
epoch 48, loss 15.748351
epoch 49, loss 15.672281
epoch 50, loss 15.606522
epoch 51, loss 15.546185
epoch 52, loss 15.490641
epoch 53, loss 15.458157
epoch 54, loss 15.395338
epoch 55, loss 15.359412
epoch 56, loss 15.331330
epoch 57, loss 15.284848
epoch 58, loss 15.264071
epoch 59, loss 15.238921
epoch 60, loss 15.206428
epoch 61, loss 15.184341
epoch 62, loss 15.190187
epoch 63, loss 15.144171
epoch 64, loss 15.127305
epoch 65, loss 15.115336
epoch 66, loss 15.111353
epoch 67, loss 15.098548
epoch 68, loss 15.077714
epoch 69, loss 15.075640
epoch 70, loss 15.072990
epoch 71, loss 15.051690
epoch 72, loss 15.046121
epoch 73, loss 15.038815
epoch 74, loss 15.038069
epoch 75, loss 15.027984
epoch 76, loss 15.028069
epoch 77, loss 15.030132
epoch 78, loss 15.015227
epoch 79, loss 15.014658
epoch 80, loss 15.010786
epoch 81, loss 15.005883
epoch 82, loss 15.007875
epoch 83, loss 15.003115
epoch 84, loss 15.015619
epoch 85, loss 14.996306
epoch 86, loss 15.008889
epoch 87, loss 14.993307
epoch 88, loss 14.997282
epoch 89, loss 14.990996
epoch 90, loss 14.991257
epoch 91, loss 14.997286
epoch 92, loss 14.989521
epoch 93, loss 14.987417
epoch 94, loss 14.989147
epoch 95, loss 14.989621
epoch 96, loss 14.984948
epoch 97, loss 14.984961
epoch 98, loss 14.984855
epoch 99, loss 14.983346
epoch 100, loss 14.999675
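With the smaller learning rate the model converges, so the learned parameters can be inspected and a few predictions compared against the true prices; a minimal sketch (the printed values will vary from run to run):

# Inspect the learned parameters and compare a few predictions with the true prices
with torch.no_grad():
    print('w:', w.reshape(-1), 'b:', b)
    print('predictions:', net(features[:5], w, b).reshape(-1))
    print('labels:     ', labels[:5])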

Supplement

Why is the loss NaN when the learning rate is 0.03?

This means the update step is too large for this loss surface: each step overshoots the optimum, the parameters grow in magnitude instead of shrinking toward the minimum, and the loss quickly overflows to inf and then NaN (the Boston features are not standardized, which makes the model sensitive to the learning rate). Reducing the learning rate fixes this: as the epochs increase, the loss decreases and the model converges.
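To see this mechanism in isolation, the following sketch runs gradient descent on a toy one-dimensional quadratic (not the Boston model; the function and the learning rates are chosen only for illustration). Once the step size crosses the stability threshold, every update overshoots, the iterate grows, and the loss eventually overflows to inf and then NaN, while a smaller step converges:

import torch

def run_gd(lr, steps=100):
    '''Gradient descent on f(x) = 5 * x**2, whose stability threshold is lr = 0.2.'''
    x = torch.tensor([1.0], requires_grad=True)
    for _ in range(steps):
        loss = 5 * x ** 2
        loss.backward()
        with torch.no_grad():
            x -= lr * x.grad   # same update rule as sgd() above, with batch_size = 1
            x.grad.zero_()
    return loss.item()

print(run_gd(0.5))    # far above the threshold: |x| grows every step and the loss ends up nan
print(run_gd(0.05))   # below the threshold: the loss shrinks toward 0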