Machine Learning: Linear Regression

Model Representation

    • 1. Problem description
    • 2. Instructions
    • 3. Data plotting
    • 4. Model function
    • 5. Forecast
    • Summary
    • Appendix

1. Problem description

A 1,000-square-foot (sqft) home sells for $300,000 and a 2,000-square-foot home sells for $500,000. These two points form our data set, or training set. Area is measured in units of 1000 square feet and price in units of $1000.

Size (1000 sqft)    Price (1000s of dollars)
1.0                 300
2.0                 500

We want to fit a linear regression model through these two points so that we can predict the prices of other houses. For example, what is the price of a house with a size of 1200 square feet?

First, import the required libraries:

import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
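The style file deeplearning.mplstyle (listed in the appendix) has to sit next to the script, otherwise the call above fails. As a minimal sketch, assuming you would rather keep Matplotlib's default style than stop when the file is missing, the helper below applies the style only if the file exists. The name use_style_if_present is hypothetical and is not part of Matplotlib.

```python
from pathlib import Path

import matplotlib.pyplot as plt


def use_style_if_present(style_path):
    """Apply a Matplotlib style file if it exists; return True when applied.

    Hypothetical helper for illustration; not part of Matplotlib.
    """
    if Path(style_path).is_file():
        plt.style.use(style_path)
        return True
    # File not found: keep whatever style is currently active.
    return False


applied = use_style_if_present('./deeplearning.mplstyle')
print(f"custom style applied: {applied}")
```

The plots in the rest of the article work either way; only their appearance changes.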

The following code creates the x_train and y_train variables. The data is stored in one-dimensional NumPy arrays.

# x_train is the input variable (size in 1000 square feet)
# y_train is the target (price in 1000s of dollars)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")

2. Instructions

Use m to denote the number of training samples. $(x^{(i)}, y^{(i)})$ denotes the i-th training sample. Since Python is zero-indexed, $(x^{(0)}, y^{(0)})$ is (1.0, 300.0) and $(x^{(1)}, y^{(1)})$ is (2.0, 500.0).
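The notation maps directly onto NumPy indexing. The following sketch, which redefines the two arrays so that it runs on its own, reads m from the array shape and prints each training sample by its zero-based index:

```python
import numpy as np

x_train = np.array([1.0, 2.0])      # size in 1000 sqft
y_train = np.array([300.0, 500.0])  # price in 1000s of dollars

# m is the number of training samples
m = x_train.shape[0]
print(f"Number of training samples: {m}")

# Access the i-th training sample (zero-based index)
for i in range(m):
    x_i = x_train[i]
    y_i = y_train[i]
    print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
```

len(x_train) would give the same value of m for a one-dimensional array.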

3. Data plotting

The two points are plotted using the scatter() function from the matplotlib library. The marker and c parameters display each point as a red cross (the default is a blue dot). Other matplotlib functions set the title and the axis labels.

# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.show()

4. Model function

The model function for linear regression (a function that maps from x to y) can be expressed as

$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$

The value of $f_{w,b}(x^{(i)})$ can be written explicitly for each data point:

for $x^{(0)}$, f_wb = w * x[0] + b
for $x^{(1)}$, f_wb = w * x[1] + b

For large numbers of data points this can become unwieldy and repetitive. Therefore, the output can be computed in a for loop, as shown in the function compute_model_output below.

def compute_model_output(x, w, b):
    """
    Computes the prediction of a linear model
    Args:
      x (ndarray (m,)): Data, m examples
      w,b (scalar)    : model parameters
    Returns
      f_wb (ndarray (m,)): model predictions
    """
    m = x.shape[0]
    f_wb = np.zeros(m)
    for i in range(m):
        # Prediction for the i-th example
        f_wb[i] = w * x[i] + b

    return f_wb
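The explicit loop mirrors equation (1) one sample at a time. As an alternative sketch, NumPy broadcasting can evaluate the same expression for all samples at once. The function name compute_model_output_vectorized is an illustrative assumption and does not appear in the original code.

```python
import numpy as np


def compute_model_output_vectorized(x, w, b):
    """Return w * x + b for every element of x, without an explicit loop."""
    # Broadcasting applies the scalars w and b to each entry of the array x.
    return w * np.asarray(x, dtype=float) + b


x_train = np.array([1.0, 2.0])
print(compute_model_output_vectorized(x_train, w=100, b=100))
```

Both versions return an array of shape (m,). The vectorized form is shorter and is usually faster on large arrays.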

Call the compute_model_output function and plot the output:

w = 100
b = 100

tmp_f_wb = compute_model_output(x_train, w, b)

# Plot our model prediction
plt.plot(x_train, tmp_f_wb, c='b', label='Our Prediction')

# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r', label='Actual Values')

# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.legend()
plt.show()
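The quality of the fit can also be checked numerically. The sketch below, which redefines the data so that it runs on its own, compares the predictions for w = 100 and b = 100 with the actual prices. The variable names predictions and residuals are illustrative.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

w, b = 100, 100
predictions = w * x_train + b
residuals = y_train - predictions  # positive means the model underestimates

for x, y, f in zip(x_train, y_train, predictions):
    print(f"size {x:.1f}: predicted {f:.0f}, actual {y:.0f}")
```

Nonzero residuals at both training points mean the line passes through neither of them.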


Clearly, $w = 100$ and $b = 100$ do not produce a line that fits the data.

Two points determine a line exactly, so the parameters follow from elementary algebra: the slope is $w = (500 - 300) / (2.0 - 1.0) = 200$, and substituting the first point gives $b = 300 - 200 \cdot 1.0 = 100$.
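The same values can be computed from the training data. This is a sketch for the two-point case only: it uses the slope formula for a line through two points and cross-checks the result with np.polyfit, which performs a least-squares polynomial fit and returns the coefficients from highest degree to lowest. The names w_fit and b_fit are illustrative.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Slope: change in price divided by change in size
w = (y_train[1] - y_train[0]) / (x_train[1] - x_train[0])
# Intercept: solve y = w * x + b for b using the first point
b = y_train[0] - w * x_train[0]
print(f"w = {w:.0f}, b = {b:.0f}")

# Cross-check with a degree-1 least-squares fit (slope first, then intercept)
w_fit, b_fit = np.polyfit(x_train, y_train, 1)
print(f"polyfit: w = {w_fit:.0f}, b = {b_fit:.0f}")
```

With more than two points that do not lie on one line, the slope formula no longer applies, while a least-squares fit still returns the best-fitting line.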

5. Forecast

Now that we have a model, we can use it to make predictions about house prices. Let’s predict the price of a 1200 square foot house. Since the unit of area is 1000 square feet, $x$ is 1.2.

w = 200
b = 100
x_i = 1.2
cost_1200sqft = w * x_i + b

print(f"${cost_1200sqft:.0f} thousand dollars")

The output result is: $340 thousand dollars
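The unit conversion is easy to forget when predicting for other sizes. As a small sketch, the hypothetical helper predict_price below (not part of the original code) takes a size in square feet, converts it to units of 1000 sqft, and applies the fitted parameters w = 200 and b = 100:

```python
def predict_price(size_sqft, w=200.0, b=100.0):
    """Return the predicted price in 1000s of dollars for a size in square feet."""
    x = size_sqft / 1000.0  # the model expects size in units of 1000 sqft
    return w * x + b


for size in (1000, 1200, 2000):
    print(f"{size} sqft: ${predict_price(size):.0f} thousand dollars")
```

For the two training sizes the function reproduces the known prices, since the line passes through both points.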

Summary

  • Linear regression builds a model of the relationship between features and targets.
    • In the example above, the feature is the house size and the target is the house price.
    • For simple linear regression, the model has two parameters, $w$ and $b$, whose values are fitted using the training data.
    • Once the parameters of the model are determined, the model can be used to make predictions on new data.

Appendix

deeplearning.mplstyle source code:

# see https://matplotlib.org/stable/tutorials/introductory/customizing.html
lines.linewidth: 4
lines.solid_capstyle: butt

legend.fancybox: true

# Verdana" for non-math text,
#Cambria Math

#Blue (Crayon-Aqua) 0096FF
#Dark Red C00000
#Orange (Apple Orange) FF9300
#Black 000000
#Magenta FF40FF
#Purple 7030A0

axes.prop_cycle: cycler('color', ['0096FF', 'FF9300', 'FF40FF', '7030A0', 'C00000'])
#axes.facecolor: f0f0f0 # gray
axes.facecolor: ffffff # white
axes.labelsize: large
axes.axisbelow: true
axes.grid: False
axes.edgecolor: f0f0f0
axes.linewidth: 3.0
axes.titlesize: x-large

patch.edgecolor: f0f0f0
patch.linewidth: 0.5

svg.fonttype: path

grid.linestyle: -
grid.linewidth: 1.0
grid.color: cbcbcb

xtick.major.size: 0
xtick.minor.size: 0
ytick.major.size: 0
ytick.minor.size: 0

savefig.edgecolor: f0f0f0
savefig.facecolor: f0f0f0

#figure.subplot.left: 0.08
#figure.subplot.right: 0.95
#figure.subplot.bottom: 0.07

#figure.facecolor: f0f0f0 # gray
figure.facecolor: ffffff # white

## ***************************************************************************
## * FONT *
## ***************************************************************************
## The font properties used by `text.Text`.
## See https://matplotlib.org/api/font_manager_api.html for more information
## on font properties. The 6 font properties used for font matching are
## given below with their default values.
##
## The font.family property can take either a concrete font name (not supported
## when rendering text with usetex), or one of the following five generic
## values:
## - 'serif' (e.g., Times),
## - 'sans-serif' (e.g., Helvetica),
## - 'cursive' (e.g., Zapf-Chancery),
## - 'fantasy' (e.g., Western), and
## - 'monospace' (e.g., Courier).
## Each of these values has a corresponding default list of font names
## (font.serif, etc.); the first available font in the list is used. Note that
## for font.serif, font.sans-serif, and font.monospace, the first element of
## the list (a DejaVu font) will always be used because DejaVu is shipped with
## Matplotlib and is thus guaranteed to be available; the other entries are
## left as examples of other possible values.
##
## The font.style property has three values: normal (or roman), italic
## or oblique. The oblique style will be used for italic, if it is not
## present.
##
## The font.variant property has two values: normal or small-caps. For
## TrueType fonts, which are scalable fonts, small-caps is equivalent
## to use a font size of 'smaller', or about 83%% of the current font
## size.
##
## The font.weight property has effectively 13 values: normal, bold,
## bolder, lighter, 100, 200, 300, ..., 900. Normal is the same as
## 400, and bold is 700. bolder and lighter are relative values with
## respect to the current weight.
##
## The font.stretch property has 11 values: ultra-condensed,
## extra-condensed, condensed, semi-condensed, normal, semi-expanded,
## expanded, extra-expanded, ultra-expanded, wider, and narrower. This
## property is not currently implemented.
##
## The font.size property is the default font size for text, given in points.
## 10 pt is the standard value.
##
## Note that font.size controls default text sizes. To configure
## special text sizes tick labels, axes, labels, title, etc., see the rc
## settings for axes and ticks. Special text sizes can be defined
## relative to font.size, using the following values: xx-small, x-small,
## small, medium, large, x-large, xx-large, larger, or smaller


font.family: sans-serif
font.style: normal
font.variant: normal
font.weight: normal
font.stretch: normal
font.size: 8.0

font.serif: DejaVu Serif, Bitstream Vera Serif, Computer Modern Roman, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
font.sans-serif: Verdana, DejaVu Sans, Bitstream Vera Sans, Computer Modern Sans Serif, Lucida Grande, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
font.cursive: Apple Chancery, Textile, Zapf Chancery, Sand, Script MT, Felipa, Comic Neue, Comic Sans MS, cursive
font.fantasy: Chicago, Charcoal, Impact, Western, Humor Sans, xkcd, fantasy
font.monospace: DejaVu Sans Mono, Bitstream Vera Sans Mono, Computer Modern Typewriter, Andale Mono, Nimbus Mono L, Courier New, Courier, Fixed, Terminal, monospace


## ***************************************************************************
## * TEXT *
## ***************************************************************************
## The text properties used by `text.Text`.
## See https://matplotlib.org/api/artist_api.html#module-matplotlib.text
## for more information on text properties
#text.color: black