Model Representation
- 1. Problem description
- 2. Instructions
- 3. Data plotting
- 4. Model function
- 5. Forecast
- Summary
- Appendix
1. Problem description
A 1,000-square-foot (sqft) home sells for $300,000 and a 2,000-square-foot home sells for $500,000. These two points form our training set. Sizes are expressed in units of 1000 sqft and prices in units of $1000.
Size (1000 sqft) | Price (1000s of dollars) |
---|---|
1.0 | 300 |
2.0 | 500 |
We want to fit a linear regression model through these two points so that we can predict the prices of other houses. For example, what is the price of a house with a size of 1200 square feet?
First, import the required libraries:

    import numpy as np
    import matplotlib.pyplot as plt
    plt.style.use('./deeplearning.mplstyle')

The following code creates the x_train and y_train variables. The data is stored in one-dimensional NumPy arrays.

    # x_train is the input variable (size in 1000 square feet)
    # y_train is the target (price in 1000s of dollars)
    x_train = np.array([1.0, 2.0])
    y_train = np.array([300.0, 500.0])
    print(f"x_train = {x_train}")
    print(f"y_train = {y_train}")

2. Instructions
Use $m$ to denote the number of training samples. $(x^{(i)}, y^{(i)})$ denotes the $i$-th training sample. Since Python is zero-indexed, $(x^{(0)}, y^{(0)})$ is (1.0, 300.0) and $(x^{(1)}, y^{(1)})$ is (2.0, 500.0).
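The notation can be illustrated with a minimal sketch. It assumes the same two-point training set as above; the variable names `m`, `x_i`, and `y_i` are chosen here for illustration only.

```python
import numpy as np

x_train = np.array([1.0, 2.0])      # sizes in 1000 sqft
y_train = np.array([300.0, 500.0])  # prices in 1000s of dollars

# m is the number of training samples (the length of the first axis)
m = x_train.shape[0]
print(f"Number of training samples: m = {m}")

# Access the i-th training sample; indices start at 0 in Python
for i in range(m):
    x_i, y_i = x_train[i], y_train[i]
    print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
```

Using `len(x_train)` would give the same value of `m` for a one-dimensional array.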
3. Data plotting
The two points are plotted using the scatter() function from the matplotlib library. The marker and c parameters display the points as red crosses (the default is blue dots). Other matplotlib functions set the title and axis labels.

    # Plot the data points
    plt.scatter(x_train, y_train, marker='x', c='r')
    # Set the title
    plt.title("Housing Prices")
    # Set the y-axis label
    plt.ylabel('Price (in 1000s of dollars)')
    # Set the x-axis label
    plt.xlabel('Size (1000 sqft)')
    plt.show()

4. Model function
The model function for linear regression (a function that maps from $x$ to $y$) can be expressed as

$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{1}$$
To calculate $f_{w,b}(x^{(i)})$, the value can be written out explicitly for each data point:
- for $x^{(0)}$, f_wb = w * x[0] + b
- for $x^{(1)}$, f_wb = w * x[1] + b
For large numbers of data points this becomes unwieldy and repetitive. Instead, the output can be computed in a for loop, as shown in the function compute_model_output below.

    def compute_model_output(x, w, b):
        """
        Computes the prediction of a linear model
        Args:
          x (ndarray (m,)): Data, m examples
          w,b (scalar)    : model parameters
        Returns
          f_wb (ndarray (m,)): model predictions
        """
        m = x.shape[0]
        f_wb = np.zeros(m)
        for i in range(m):
            f_wb[i] = w * x[i] + b
        return f_wb

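As an aside, the explicit loop is not the only option. The sketch below relies on NumPy broadcasting to evaluate the same expression for all samples at once. The function name `compute_model_output_vectorized` is hypothetical and does not appear in the original material.

```python
import numpy as np

def compute_model_output_vectorized(x, w, b):
    """Hypothetical loop-free variant: evaluates w * x + b for every element of x."""
    x = np.asarray(x, dtype=float)  # accept lists as well as arrays
    return w * x + b                # the scalars w and b broadcast over x

x_train = np.array([1.0, 2.0])
f_wb = compute_model_output_vectorized(x_train, w=100, b=100)
print(f_wb)
```

For two points the difference is negligible, but the broadcast form avoids a Python-level loop when there are many samples.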
Call the compute_model_output function and plot the output:

    w = 100
    b = 100
    tmp_f_wb = compute_model_output(x_train, w, b)
    # Plot our model prediction
    plt.plot(x_train, tmp_f_wb, c='b', label='Our Prediction')
    # Plot the data points
    plt.scatter(x_train, y_train, marker='x', c='r', label='Actual Values')
    # Set the title
    plt.title("Housing Prices")
    # Set the y-axis label
    plt.ylabel('Price (in 1000s of dollars)')
    # Set the x-axis label
    plt.xlabel('Size (1000 sqft)')
    plt.legend()
    plt.show()

Clearly, $w = 100$ and $b = 100$ do not produce a straight line that fits the data.
A line through two points is fully determined by its slope and intercept, so the correct values follow directly: $w = (500 - 300) / (2.0 - 1.0) = 200$ and $b = 300 - 200 \times 1.0 = 100$.
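The same values can be checked numerically. The sketch below assumes the two-point training set from section 1. It computes the slope and intercept of the line through both points, then compares the residuals of the first guess with those of the exact fit. The names `w_try`, `b_try`, and `residuals` are illustrative only.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Slope of the line through the two points: rise over run
w = (y_train[1] - y_train[0]) / (x_train[1] - x_train[0])
# Intercept: solve y = w * x + b for b at the first point
b = y_train[0] - w * x_train[0]
print(f"w = {w:.0f}, b = {b:.0f}")

# Residuals (actual minus predicted) for the first guess and for the exact fit
for w_try, b_try in [(100, 100), (w, b)]:
    residuals = y_train - (w_try * x_train + b_try)
    print(f"w = {w_try:.0f}, b = {b_try:.0f}: residuals = {residuals}")
```

This shortcut only works because two points define a line exactly. With more data points the parameters would have to be fitted rather than solved for directly.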
5. Forecast
Now that we have a model, we can use it to make predictions about house prices. Let’s predict the price of a 1200 square foot house. Since the unit of area is 1000 square feet, $x$ is 1.2.

    w = 200
    b = 100
    x_i = 1.2
    cost_1200sqft = w * x_i + b
    print(f"${cost_1200sqft:.0f} thousand dollars")

The output result is: $340 thousand dollars
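The same model can price several houses at once. The following is a minimal sketch assuming the parameters w = 200 and b = 100 found above. The helper `predict_price` and the list of sizes are hypothetical and chosen for illustration.

```python
import numpy as np

w, b = 200, 100  # parameters of the fitted line

def predict_price(size_sqft, w, b):
    """Hypothetical helper: size in square feet -> predicted price in 1000s of dollars."""
    x = np.asarray(size_sqft, dtype=float) / 1000.0  # convert to units of 1000 sqft
    return w * x + b

sizes_sqft = np.array([1000, 1200, 1500, 2000])
prices = predict_price(sizes_sqft, w, b)
for size, price in zip(sizes_sqft, prices):
    print(f"{size} sqft -> ${price:.0f} thousand dollars")
```

Converting from square feet inside the helper keeps the unit handling in one place, which avoids passing 1200 where the model expects 1.2.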
Summary
- Linear regression builds a model of the relationship between features and targets
- In the example above, the feature is the house size and the target is the house price.
- For simple linear regression, the model has two parameters, $w$ and $b$, whose values are fitted using the training data.
- Once the parameters of the model are determined, the model can be used to make predictions on new data.
Appendix
deeplearning.mplstyle source code:

    # see https://matplotlib.org/stable/tutorials/introductory/customizing.html
    lines.linewidth: 4
    lines.solid_capstyle: butt
    legend.fancybox: true
    # Verdana" for non-math text,
    #Cambria Math
    #Blue (Crayon-Aqua) 0096FF
    #Dark Red C00000
    #Orange (Apple Orange) FF9300
    #Black 000000
    #Magenta FF40FF
    #Purple 7030A0
    axes.prop_cycle: cycler('color', ['0096FF', 'FF9300', 'FF40FF', '7030A0', 'C00000'])
    #axes.facecolor: f0f0f0 # gray
    axes.facecolor: ffffff # white
    axes.labelsize: large
    axes.axisbelow: true
    axes.grid: False
    axes.edgecolor: f0f0f0
    axes.linewidth: 3.0
    axes.titlesize: x-large
    patch.edgecolor: f0f0f0
    patch.linewidth: 0.5
    svg.fonttype: path
    grid.linestyle: -
    grid.linewidth: 1.0
    grid.color: cbcbcb
    xtick.major.size: 0
    xtick.minor.size: 0
    ytick.major.size: 0
    ytick.minor.size: 0
    savefig.edgecolor: f0f0f0
    savefig.facecolor: f0f0f0
    #figure.subplot.left: 0.08
    #figure.subplot.right: 0.95
    #figure.subplot.bottom: 0.07
    #figure.facecolor: f0f0f0 # gray
    figure.facecolor: ffffff # white
    ## ***************************************************************************
    ## * FONT *
    ## ***************************************************************************
    ## The font properties used by `text.Text`.
    ## See https://matplotlib.org/api/font_manager_api.html for more information
    ## on font properties. The 6 font properties used for font matching are
    ## given below with their default values.
    ##
    ## The font.family property can take either a concrete font name (not supported
    ## when rendering text with usetex), or one of the following five generic
    ## values:
    ## - 'serif' (e.g., Times),
    ## - 'sans-serif' (e.g., Helvetica),
    ## - 'cursive' (e.g., Zapf-Chancery),
    ## - 'fantasy' (e.g., Western), and
    ## - 'monospace' (e.g., Courier).
    ## Each of these values has a corresponding default list of font names
    ## (font.serif, etc.); the first available font in the list is used. Note that
    ## for font.serif, font.sans-serif, and font.monospace, the first element of
    ## the list (a DejaVu font) will always be used because DejaVu is shipped with
    ## Matplotlib and is thus guaranteed to be available; the other entries are
    ## left as examples of other possible values.
    ##
    ## The font.style property has three values: normal (or roman), italic
    ## or oblique. The oblique style will be used for italic, if it is not
    ## present.
    ##
    ## The font.variant property has two values: normal or small-caps. For
    ## TrueType fonts, which are scalable fonts, small-caps is equivalent
    ## to use a font size of 'smaller', or about 83%% of the current font
    ## size.
    ##
    ## The font.weight property has effectively 13 values: normal, bold,
    ## bolder, lighter, 100, 200, 300, ..., 900. Normal is the same as
    ## 400, and bold is 700. bolder and lighter are relative values with
    ## respect to the current weight.
    ##
    ## The font.stretch property has 11 values: ultra-condensed,
    ## extra-condensed, condensed, semi-condensed, normal, semi-expanded,
    ## expanded, extra-expanded, ultra-expanded, wider, and narrower. This
    ## property is not currently implemented.
    ##
    ## The font.size property is the default font size for text, given in points.
    ## 10 pt is the standard value.
    ##
    ## Note that font.size controls default text sizes. To configure
    ## special text sizes tick labels, axes, labels, title, etc., see the rc
    ## settings for axes and ticks. Special text sizes can be defined
    ## relative to font.size, using the following values: xx-small, x-small,
    ## small, medium, large, x-large, xx-large, larger, or smaller
    font.family: sans-serif
    font.style: normal
    font.variant: normal
    font.weight: normal
    font.stretch: normal
    font.size: 8.0
    font.serif: DejaVu Serif, Bitstream Vera Serif, Computer Modern Roman, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
    font.sans-serif: Verdana, DejaVu Sans, Bitstream Vera Sans, Computer Modern Sans Serif, Lucida Grande, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
    font.cursive: Apple Chancery, Textile, Zapf Chancery, Sand, Script MT, Felipa, Comic Neue, Comic Sans MS, cursive
    font.fantasy: Chicago, Charcoal, Impact, Western, Humor Sans, xkcd, fantasy
    font.monospace: DejaVu Sans Mono, Bitstream Vera Sans Mono, Computer Modern Typewriter, Andale Mono, Nimbus Mono L, Courier New, Courier, Fixed, Terminal, monospace
    ## ***************************************************************************
    ## * TEXT *
    ## ***************************************************************************
    ## The text properties used by `text.Text`.
    ## See https://matplotlib.org/api/artist_api.html#module-matplotlib.text
    ## for more information on text properties
    #text.color: black