Operating environment: Matlab
Written by: Left Hand の Tomorrow
Featured column: “python”
Recommended column: “Algorithm Research”
#### Anti-counterfeiting watermark–Left hand の tomorrow ####
Hello everyone, I am Left Hand の Tomorrow! Long time no see.
Starting a new series today: Redefining the powerful series of matlab
Last updated: May 25, 2023, 286 original blog of Left Hand の Tomorrow
Updated in column: matlab
#### Anti-counterfeiting watermark–Left Hand の Tomorrow ####
Function Description
N = normalize(A)
returns the vector-wise z-scores of the data in A, centered at 0 with a standard deviation of 1.
- If A is a vector, then normalize operates on the entire vector A.
- If A is a matrix, then normalize operates on each column of A separately.
- If A is a multidimensional array, then normalize operates along the first dimension of A whose size is not equal to 1.
- If A is a table or timetable, then normalize operates on each variable of A separately.
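The rule for multidimensional arrays can be checked directly. The sketch below is only an illustration: the array A3 and its size are made up, and it assumes a MATLAB release that includes normalize.

```matlab
% Illustrative check of the default operating dimension (A3 is made-up data).
rng(0);                          % fixed seed so the run is repeatable
A3 = rand(1,3,4);                % size 1-by-3-by-4, so dimension 1 has size 1

Ndefault = normalize(A3);        % default: first dimension whose size is not 1
Ndim2    = normalize(A3,2);      % explicitly operate along dimension 2

sameResult = isequal(Ndefault,Ndim2);   % the two calls should agree

% Each 1-by-3 slice along dimension 2 should have mean 0 and std 1.
sliceMeans = mean(Ndefault,2);
sliceStds  = std(Ndefault,0,2);
```

Because the first dimension of A3 has size 1, the default call should behave like normalize(A3,2).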
N = normalize(A,dim)
specifies the dimension of A along which to operate. For example, normalize(A,2) normalizes each row.
N = normalize(___,method)
specifies the normalization method for either of the previous syntaxes. For example, normalize(A,'norm') normalizes the data in A by the Euclidean norm (2-norm).
N = normalize(___,method,methodtype)
specifies the type of normalization for the given method. For example, normalize(A,'norm',Inf) normalizes the data in A using the infinity norm.
method – Normalization method
Normalization method, specified as one of the following options:

| Method | Description |
|---|---|
| 'zscore' | Z-score with mean 0 and standard deviation 1. |
| 'norm' | 2-norm. |
| 'scale' | Scale by standard deviation. |
| 'range' | Rescale the range of the data to [0,1]. |
| 'center' | Center the data to have a mean of 0. |
| 'medianiqr' | Center and scale the data so that the median is 0 and the interquartile range is 1. |

To return the parameters that the function uses to normalize the data, specify the C and S output arguments.
methodtype – Method type
Method type, specified as an array, table, two-element row vector, or type name, depending on the specified method:

| Method | Method Type Option | Description |
|---|---|---|
| 'zscore' | 'std' (default) | Center and scale to have a mean of 0 and a standard deviation of 1. |
| 'zscore' | 'robust' | Center and scale so that the median is 0 and the median absolute deviation is 1. |
| 'norm' | positive numeric scalar (default 2) | p-norm. |
| 'norm' | Inf | Infinity norm. |
| 'scale' | 'std' (default) | Scale by standard deviation. |
| 'scale' | 'mad' | Scale by median absolute deviation. |
| 'scale' | 'first' | Scale by the first element of the data. |
| 'scale' | 'iqr' | Scale by interquartile range. |
| 'scale' | numeric array | Scale by the given values. |
| 'scale' | table | Scale using the variables in a table. Each table variable in the input data A is scaled by the value of the similarly named variable in the scaling table. |
| 'range' | two-element row vector (default [0 1]) | Rescale the range of the data to an interval of the form [a b], where a < b. |
| 'center' | 'mean' (default) | Center to have a mean of 0. |
| 'center' | 'median' | Center so that the median is 0. |
| 'center' | numeric array | Shift the center by the given values. The array must have a size compatible with input A. |
| 'center' | table | Shift the center using the variables in a table. Each table variable in the input data A is centered using the value of the similarly named variable in the centering table. |

To return the parameters that the function uses to normalize the data, specify the C and S output arguments.
N = normalize(___,'center',centertype,'scale',scaletype)
uses the 'center' and 'scale' methods at the same time. These are the only two methods that can be combined. If you do not specify centertype or scaletype, normalize uses the default method type for that method (centering to a mean of 0 and scaling by the standard deviation).
This syntax lets you combine any centering type with any scaling type, for example N = normalize(A,'center','median','scale','mad'). You can also use it to supply previously computed centering and scaling values C and S. For example, normalize one data set and save the parameters with [N1,C,S] = normalize(A1), then reuse those parameters on a different data set with N2 = normalize(A2,'center',C,'scale',S).
N = normalize(___,Name,Value)
specifies additional parameters for normalization using one or more name-value arguments. For example, when A is a table or timetable, normalize(A,'DataVariables',datavars) normalizes only the variables specified by datavars.
[N,C,S] = normalize(___)
also returns the centering and scaling values C and S. You can then normalize different input data with the same values by calling N = normalize(A2,'center',C,'scale',S).
N - Normalized values
Normalized values, returned as an array, table, or timetable.
N is the same size as A unless ReplaceValues is false. If ReplaceValues is false, the width of N is the width of the input data plus the number of data variables specified.
normalize generally operates on all variables of input tables and timetables, with the following exceptions:
- If DataVariables is specified, normalize operates only on the specified variables.
- If you use the syntax normalize(T,'center',C,'scale',S) to normalize a table or timetable T with previously computed parameters C and S, then normalize automatically uses the variable names in C and S to determine which data variables in T to operate on.
C - Centering values
Centering values, returned as an array or table.
When A is an array, normalize returns C and S as arrays such that N = (A - C) ./ S. Each value in C is the centering value used to perform the normalization along the specified dimension. For example, if A is a 10×10 matrix of data and normalize operates along the first dimension, then C is a 1×10 vector containing the centering value for each column of A.
When A is a table or timetable, normalize returns C and S as tables containing the centering and scaling values for each table variable that is normalized, that is, N.Var = (A.Var - C.Var) ./ S.Var. The table variable names of C and S match the corresponding table variables in the input. Each variable in C contains the centering value used to normalize the similarly named variable in A.
S - Scaling values
Scaling values, returned as an array or table.
When A is an array, normalize returns C and S as arrays such that N = (A - C) ./ S. Each value in S is the scaling value used to perform the normalization along the specified dimension. For example, if A is a 10×10 matrix of data and normalize operates along the first dimension, then S is a 1×10 vector containing the scaling value for each column of A.
When A is a table or timetable, normalize returns C and S as tables containing the centering and scaling values for each table variable that is normalized, that is, N.Var = (A.Var - C.Var) ./ S.Var. The table variable names of C and S match the corresponding table variables in the input. Each variable in S contains the scaling value used to normalize the similarly named variable in A.
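The relationship N = (A - C) ./ S for array input can be confirmed with a short experiment. This is a minimal sketch with made-up data, assuming a MATLAB release in which normalize returns the C and S outputs.

```matlab
% Sketch: confirm N = (A - C) ./ S for array input (A is made-up data).
rng(1);
A = 10*rand(10,10) + 5;          % 10-by-10 data matrix

[N,C,S] = normalize(A);          % default z-score along dimension 1

sizeC = size(C);                 % one centering value per column
sizeS = size(S);                 % one scaling value per column

% Rebuild N from the returned parameters using implicit expansion.
Nrebuilt = (A - C) ./ S;
maxErr = max(abs(N(:) - Nrebuilt(:)));
```

With a 10×10 input, C and S should each come back as 1×10 row vectors, and maxErr should be at the level of floating-point round-off.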
Z-score
The z-score measures the distance of a data point from the mean in units of the standard deviation. The normalized data set has a mean of 0 and a standard deviation of 1, and it retains the shape properties of the original data set (same skewness and kurtosis).
For a random variable X with mean μ and standard deviation σ, the z-score of a value x is $z = (x - \mu)/\sigma$. For sample data with mean $\bar{X}$ and standard deviation S, the z-score of a data point x is $z = (x - \bar{X})/S$.
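The sample formula can be reproduced by hand and compared with the built-in function. The data vector x below is made up for illustration.

```matlab
% Sketch: z-score by hand versus normalize (x is made-up sample data).
x = [2 4 4 4 5 5 7 9];

xbar = mean(x);                  % sample mean
s    = std(x);                   % sample standard deviation (N-1 normalization)
zManual = (x - xbar) ./ s;       % z = (x - mean) / std

zBuiltin = normalize(x);         % default method is 'zscore'
maxErr = max(abs(zManual - zBuiltin));

zMean = mean(zManual);           % close to 0
zStd  = std(zManual);            % close to 1
```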
P-norm
The general definition of the p-norm of a vector v with N elements is
$$\|v\|_p = \left[ \sum_{k=1}^{N} |v_k|^p \right]^{1/p},$$
where p is any positive real value, Inf, or -Inf. Some common values for p are 1, 2, and Inf.
- If p is 1, the resulting 1-norm is the sum of the absolute values of the vector elements.
- If p is 2, the resulting 2-norm is the magnitude or Euclidean length of the vector.
- If p is Inf, then $\|v\|_\infty = \max_k |v_k|$.
Rescale
Rescaling changes the distance between the minimum and maximum values in a data set by stretching or compressing the points along the number line. The z-scores of the data are preserved, so the shape of the distribution remains the same.
The equation to rescale the data X to an arbitrary interval [a b] is
$$X_{\text{rescaled}} = a + \left[ \frac{X - \min X}{\max X - \min X} \right] (b - a).$$
While both the normalize and rescale functions can rescale data to an arbitrary interval, rescale also allows input data to be clipped to specified minimum and maximum values.
Interquartile range
The interquartile range (IQR) of a data set describes the range of values in the middle 50% when the values are ordered. If the median of the data is Q2, the median of the lower half of the data is Q1, and the median of the upper half of the data is Q3, then IQR = Q3 - Q1.
When the data contains outliers (very large or very small values), IQR is often better than looking at the full range of the data because IQR excludes the largest 25% and smallest 25% of values in the data.
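The "median of each half" definition can be sketched in a few lines. The data below is made up and has an even number of elements so that the two halves are unambiguous. Built-in quantile routines may use a different interpolation convention, so their results can differ for other data lengths.

```matlab
% Sketch: IQR from the "median of each half" definition (x is made-up data
% with an even number of elements and one outlier).
x = sort([7 1 100 3 5 2 6 4]);   % sorted: 1 2 3 4 5 6 7 100
n = numel(x);

Q1 = median(x(1:n/2));           % median of the lower half
Q3 = median(x(n/2+1:end));       % median of the upper half
iqrManual = Q3 - Q1;             % interquartile range

fullRange = max(x) - min(x);     % dominated by the outlier 100
```

For this data the IQR is 4 while the full range is 99, which shows how little the single outlier affects the IQR.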
Median absolute deviation
The median absolute deviation (MAD) of a data set is the median of the absolute deviations from the median $\tilde{X}$ of the data: $\mathrm{MAD} = \mathrm{median}(|x - \tilde{X}|)$. The MAD therefore describes the variability of the data relative to the median.
When the data contains outliers (very large or very small values), the MAD is often preferable to the standard deviation, because the standard deviation squares the deviations from the mean and so gives outliers a disproportionate influence. By contrast, the deviations of a small number of outliers do not affect the value of the MAD.
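A small experiment makes the difference concrete. The sketch below uses made-up data with one outlier and assumes that the 'robust' z-score scales by the unscaled MAD exactly as defined above.

```matlab
% Sketch: MAD by hand and robust z-score (x is made-up data with one outlier).
x = [1 2 3 4 100];

xMed = median(x);                       % median of the data
madManual = median(abs(x - xMed));      % median absolute deviation
sd = std(x);                            % inflated by the outlier

zRobustManual = (x - xMed) ./ madManual;
zRobust = normalize(x,'zscore','robust');   % median 0, MAD 1
maxErr = max(abs(zRobust - zRobustManual));
```

Here the median is 3 and the MAD is 1, while the standard deviation is above 40 because of the single value 100.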
Example
Vector and matrix data
Normalize data in vectors and matrices by computing z-scores.
Create a vector v and compute its z-scores, which normalizes the data to have a mean of 0 and a standard deviation of 1.
v = 1:5;
N = normalize(v)
N = 1×5
   -1.2649   -0.6325         0    0.6325    1.2649
Create a matrix B and compute the z-scores for each column. Then, normalize each row.
B = magic(3)
B = 3×3
     8     1     6
     3     5     7
     4     9     2
N1 = normalize(B)
N1 = 3×3
    1.1339   -1.0000    0.3780
   -0.7559         0    0.7559
   -0.3780    1.0000   -1.1339
N2 = normalize(B,2)
N2 = 3×3
    0.8321   -1.1094    0.2774
   -1.0000         0    1.0000
   -0.2774    1.1094   -0.8321
Scale data
Scale the vector A by its standard deviation.
A = 1:5;
Ns = normalize(A,'scale')
Ns = 1×5
    0.6325    1.2649    1.8974    2.5298    3.1623
Rescale A so that its range is the interval [0,1].
Nr = normalize(A,'range')
Nr = 1×5
         0    0.2500    0.5000    0.7500    1.0000
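The same idea extends to any interval [a b] through the usual min-max formula a + (A - min(A)) / (max(A) - min(A)) * (b - a). The sketch below checks that formula against normalize; the interval endpoints a and b are example values chosen for illustration.

```matlab
% Sketch: rescale to an interval [a b] by hand (a and b are example values).
A = 1:5;
a = -1;  b = 1;

NrManual  = a + (A - min(A)) ./ (max(A) - min(A)) .* (b - a);
NrBuiltin = normalize(A,'range',[a b]);
maxErr = max(abs(NrManual - NrBuiltin));
```

For A = 1:5 and the interval [-1 1], the hand computation gives -1, -0.5, 0, 0.5, 1.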
Specify method type
Create the vector A and normalize it by its 1-norm.
A = 1:5;
Np = normalize(A,'norm',1)
Np = 1×5
    0.0667    0.1333    0.2000    0.2667    0.3333
Center the data in A so that its mean is 0.
Nc = normalize(A,'center','mean')
Nc = 1×5
    -2    -1     0     1     2
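Normalizing by a p-norm simply divides the data by that norm. As a rough check, the sketch below computes the 1-norm, 2-norm, and infinity-norm versions by hand and compares each with normalize.

```matlab
% Sketch: p-norm normalization by hand for p = 1, 2, and Inf.
A = 1:5;

N1manual   = A ./ sum(abs(A));          % 1-norm: sum of absolute values
N2manual   = A ./ sqrt(sum(abs(A).^2)); % 2-norm: Euclidean length
NinfManual = A ./ max(abs(A));          % infinity norm: largest magnitude

err1   = max(abs(N1manual   - normalize(A,'norm',1)));
err2   = max(abs(N2manual   - normalize(A,'norm')));
errInf = max(abs(NinfManual - normalize(A,'norm',Inf)));
```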
Table variables
Create a table that contains height information for five people.
LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
T = table(LastName,Height)
T = 5×2 table
    LastName     Height
    _________    ______
    'Sanchez'      71
    'Johnson'      69
    'Lee'          64
    'Diaz'         67
    'Brown'        64
Normalize the height data by the maximum height.
N = normalize(T,'norm',Inf,'DataVariables','Height')
N = 5×2 table
    LastName     Height
    _________    _______
    'Sanchez'          1
    'Johnson'    0.97183
    'Lee'        0.90141
    'Diaz'       0.94366
    'Brown'      0.90141
Normalize multiple data sets with the same parameters
Normalize a data set, return the computed parameter values, and reuse those parameters to apply the same normalization to another data set.
Create a timetable with two variables, Temperature and WindSpeed. Then create a second timetable with the same variables, but with samples collected one year later.
rng default
Time1 = (datetime(2019,1,1):days(1):datetime(2019,1,10))';
Temperature = randi([10 40],10,1);
WindSpeed = randi([0 20],10,1);
T1 = timetable(Temperature,WindSpeed,'RowTimes',Time1)
T1 = 10×2 timetable
       Time        Temperature    WindSpeed
    ___________    ___________    _________
    01-Jan-2019        35              3
    02-Jan-2019        38             20
    03-Jan-2019        13             20
    04-Jan-2019        38             10
    05-Jan-2019        29             16
    06-Jan-2019        13              2
    07-Jan-2019        18              8
    08-Jan-2019        26             19
    09-Jan-2019        39             16
    10-Jan-2019        39             20
Time2 = (datetime(2020,1,1):days(1):datetime(2020,1,10))';
Temperature = randi([10 40],10,1);
WindSpeed = randi([0 20],10,1);
T2 = timetable(Temperature,WindSpeed,'RowTimes',Time2)
T2 = 10×2 timetable
       Time        Temperature    WindSpeed
    ___________    ___________    _________
    01-Jan-2020        30             14
    02-Jan-2020        11              0
    03-Jan-2020        36              5
    04-Jan-2020        38              0
    05-Jan-2020        31              2
    06-Jan-2020        33             17
    07-Jan-2020        33             14
    08-Jan-2020        22              6
    09-Jan-2020        30             19
    10-Jan-2020        15              0
Normalize the first timetable. Specify three outputs: the normalized table, and the centering and scaling parameter values C and S that the function uses to perform the normalization.
[T1_norm,C,S] = normalize(T1)
T1_norm = 10×2 timetable
       Time        Temperature    WindSpeed
    ___________    ___________    _________
    01-Jan-2019      0.57687       -1.4636
    02-Jan-2019        0.856       0.92885
    03-Jan-2019      -1.4701       0.92885
    04-Jan-2019        0.856       -0.4785
    05-Jan-2019     0.018609       0.36591
    06-Jan-2019      -1.4701       -1.6044
    07-Jan-2019      -1.0049      -0.75997
    08-Jan-2019     -0.26052       0.78812
    09-Jan-2019      0.94905       0.36591
    10-Jan-2019      0.94905       0.92885
C = 1×2 table
    Temperature    WindSpeed
    ___________    _________
       28.8          13.4
S = 1×2 table
    Temperature    WindSpeed
    ___________    _________
      10.748        7.1056
Now normalize the second timetable T2 using the parameter values from the first normalization. This approach ensures that the data in T2 is centered and scaled in the same way as T1.
T2_norm = normalize(T2,"center",C,"scale",S)
T2_norm = 10×2 timetable
       Time        Temperature    WindSpeed
    ___________    ___________    _________
    01-Jan-2020      0.11165      0.084441
    02-Jan-2020      -1.6562       -1.8858
    03-Jan-2020      0.66992       -1.1822
    04-Jan-2020        0.856       -1.8858
    05-Jan-2020       0.2047       -1.6044
    06-Jan-2020      0.39078       0.50665
    07-Jan-2020      0.39078      0.084441
    08-Jan-2020      -0.6327       -1.0414
    09-Jan-2020      0.11165       0.78812
    10-Jan-2020       -1.284       -1.8858
By default, normalize operates on all variables in T2 that also exist in C and S. To normalize a subset of the variables in T2, use the DataVariables name-value argument to specify the variables to operate on. The subset of variables you specify must appear in C and S.
Specify WindSpeed as the data variable to operate on. normalize operates on that variable and returns Temperature unchanged.
T2_partial = normalize(T2,"center",C,"scale",S,"DataVariables","WindSpeed")
T2_partial = 10×2 timetable
       Time        Temperature    WindSpeed
    ___________    ___________    _________
    01-Jan-2020        30         0.084441
    02-Jan-2020        11          -1.8858
    03-Jan-2020        36          -1.1822
    04-Jan-2020        38          -1.8858
    05-Jan-2020        31          -1.6044
    06-Jan-2020        33          0.50665
    07-Jan-2020        33         0.084441
    08-Jan-2020        22          -1.0414
    09-Jan-2020        30          0.78812
    10-Jan-2020        15          -1.8858
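The same reuse pattern works for plain numeric arrays. The following is a minimal sketch, not part of the example above: Xtrain and Xtest are hypothetical data sets, and the code assumes a MATLAB release in which normalize returns C and S.

```matlab
% Sketch: reuse centering and scaling values on plain arrays
% (Xtrain and Xtest are hypothetical data sets).
rng(2);
Xtrain = 50 + 10*randn(100,3);   % data used to estimate the parameters
Xtest  = 50 + 10*randn(20,3);    % new data to be put on the same scale

[Ntrain,C,S] = normalize(Xtrain);                  % learn C and S
Ntest = normalize(Xtest,'center',C,'scale',S);     % apply them unchanged

NtestManual = (Xtest - C) ./ S;                    % same operation by hand
maxErr = max(abs(Ntest(:) - NtestManual(:)));
```

Because Xtest is shifted and scaled with the parameters from Xtrain, its columns will generally not have a mean of exactly 0 or a standard deviation of exactly 1.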