[Redefining MATLAB's Powerful Functions, Part 10] Normalizing data with the normalize function

Operating environment: Matlab

Written by: Left Hand の Tomorrow

Featured column: “python”

Recommended column: “Algorithm Research”

#### Anti-counterfeiting watermark–Left hand の tomorrow ####

Hello everyone, I am Left Hand の Tomorrow! Long time no see.

Today I am starting a new series: Redefining MATLAB's Powerful Functions.

Last updated: May 25, 2023. This is the 286th original blog post by Left Hand の Tomorrow.

Updated in column: matlab

#### Anti-counterfeiting watermark–Left Hand の Tomorrow ####

Function Description

N = normalize(A) returns the vector-wise z-scores of the data in A, centered at 0 with a standard deviation of 1.

  • If A is a vector, then normalize operates on the entire vector A.

  • If A is a matrix, then normalize operates on each column of A separately.

  • If A is a multidimensional array, then normalize operates along the first dimension of A whose size is not equal to 1.

  • If A is a table or timetable, then normalize operates on each variable of A separately.

N = normalize(A,dim) specifies the dimension of A along which to operate. For example, normalize(A,2) normalizes each row.
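The default-dimension rule is easiest to see on an array whose first dimension has size 1. The following is a minimal sketch with made-up data; the variable names are only illustrative.

```matlab
% 1-by-3-by-2 array: dimension 1 is singleton, so dimension 2 is the
% first dimension whose size is not 1.
A = reshape([1 2 3 10 20 40], [1 3 2]);

N_default = normalize(A);      % operates along dimension 2 here
N_dim2    = normalize(A, 2);   % same operation, stated explicitly
N_dim3    = normalize(A, 3);   % normalizes across the two pages instead

disp(isequal(N_default, N_dim2))   % displays 1 (true)
disp(size(N_dim3))                 % the size of A is preserved
```

Passing dim explicitly is a reasonable habit whenever the shape of the input can vary, because the default depends on which dimensions happen to be singleton.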

N = normalize(___,method) specifies the normalization method for either of the previous syntaxes. For example, normalize(A,'norm') normalizes the data in A by the Euclidean norm (2-norm).

N = normalize(___,method,methodtype) specifies the normalization type for the given method. For example, normalize(A,'norm',Inf) normalizes the data in A using the infinity norm.

method – normalization method

Normalization method, specified as one of the following options:

  • 'zscore': z-score with mean 0 and standard deviation 1.

  • 'norm': 2-norm.

  • 'scale': Scale by the standard deviation.

  • 'range': Rescale the range of the data to [0,1].

  • 'center': Center the data to have a mean of 0.

  • 'medianiqr': Center and scale the data so that the median is 0 and the interquartile range is 1.

To return the parameters that the function uses to normalize the data, specify the C and S output arguments.

methodtype – method type

Method type, specified as an array, table, two-element row vector, or type name, depending on the specified method. The options for each method are listed below.

'zscore'

  • 'std' (default): Center and scale to have a mean of 0 and a standard deviation of 1.

  • 'robust': Center and scale to have a median of 0 and a median absolute deviation of 1.

'norm'

  • Positive numeric scalar (default is 2): p-norm.

  • Inf: Infinity norm.

'scale'

  • 'std' (default): Scale by the standard deviation.

  • 'mad': Scale by the median absolute deviation.

  • 'first': Scale by the first element of the data.

  • 'iqr': Scale by the interquartile range.

  • Numeric array: Scale by the given values.

  • Table: Scale by the variables in a table. Each table variable in the input data A is scaled by the value of the similarly named variable in the scaling table.

'range'

  • Two-element row vector (default is [0 1]): Rescale the range of the data to an interval of the form [a b], where a < b.

'center'

  • 'mean' (default): Center to have a mean of 0.

  • 'median': Center to have a median of 0.

  • Numeric array: Shift the center by the given values. The array must have a size compatible with the input A.

  • Table: Shift the center by the variables in a table. Each table variable in the input data A is centered using the value of the similarly named variable in the centering table.

To return the parameters that the function uses to normalize the data, specify the C and S output arguments.

N = normalize(___,'center',centertype,'scale',scaletype) uses the 'center' and 'scale' methods at the same time. These are the only two methods that can be used together. If you do not specify centertype or scaletype, normalize uses the default method type for that method (centering to a mean of 0 and scaling by the standard deviation).

This syntax lets you apply both methods together with any combination of centering and scaling types. For example, N = normalize(A,'center','median','scale','mad'). You can also use this syntax to supply previously computed centering and scaling values C and S. For example, normalize one dataset with [N1,C,S] = normalize(A1) to save the parameters, then reuse them on a different dataset with N2 = normalize(A2,'center',C,'scale',S).
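As a small illustration of combining the two methods, the sketch below centers by the median and scales by the median absolute deviation, then compares the result with the 'zscore' method using the 'robust' type, which is described above as producing the same median-0, MAD-1 result. The data values are made up.

```matlab
A = [2 4 4 5 7 9 40];   % illustrative data with one large value

% Center by the median and scale by the median absolute deviation.
N_combo  = normalize(A, 'center', 'median', 'scale', 'mad');

% Robust z-score: also median 0 and median absolute deviation 1.
N_robust = normalize(A, 'zscore', 'robust');

% Largest difference between the two results (expected to be ~0).
disp(max(abs(N_combo - N_robust)))
```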

N = normalize(___,Name,Value) specifies additional parameters for normalization using one or more name-value arguments. For example, when A is a table or timetable, normalize(A,'DataVariables',datavars) normalizes only the variables specified by datavars.

[N,C,S] = normalize(___) also returns the centering and scaling values C and S. You can then reuse C and S to normalize different input data with N = normalize(A2,'center',C,'scale',S).
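A minimal sketch of this reuse pattern with plain numeric arrays follows. A1 and A2 are made-up matrices standing in for a reference dataset and a later dataset with the same columns.

```matlab
A1 = [1 10; 2 20; 3 30; 4 40];   % "reference" data (made up)
A2 = [5 50; 6 60];               % new data with the same columns

% Column-wise z-scores of A1; C and S are 1-by-2 row vectors here.
[N1, C, S] = normalize(A1);

% Apply the statistics of A1 (not those of A2) to the new data.
N2 = normalize(A2, 'center', C, 'scale', S);

% N2 should match (A2 - C) ./ S; the difference is expected to be ~0.
disp(max(abs(N2 - (A2 - C) ./ S), [], 'all'))
```

This is the usual way to keep a second dataset on the same scale as the first, for example when the second dataset is too small to estimate its own mean and standard deviation reliably.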

N - normalized value

Normalized values, returned as an array, table, or timetable.

N is the same size as A unless ReplaceValues is false. If the value of ReplaceValues is false, the width of N is the sum of the input data width and the specified number of data variables.
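The width rule can be checked with a small table. This is only a sketch: it assumes a MATLAB release in which normalize accepts the ReplaceValues name-value argument, and the names and heights are hypothetical sample data.

```matlab
Name   = {'Ana'; 'Ben'; 'Cy'};     % hypothetical sample data
Height = [71; 69; 64];
T = table(Name, Height);

% Keep the original Height and append its normalized version.
N = normalize(T, 'DataVariables', 'Height', 'ReplaceValues', false);

disp(width(T))                     % number of variables in the input
disp(width(N))                     % input width plus one data variable
disp(N.Properties.VariableNames)   % shows the name of the appended variable
```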

normalize normally operates on all variables of input tables and timetables, with the following exceptions:

  • If DataVariables is specified, normalize operates only on the specified variables.

  • If you use the syntax normalize(T,'center',C,'scale',S) to use previously calculated parameters C and S to normalize a table or timetable T, then normalize will automatically use C and S to determine the data variable in T on which to operate.

C - centering values

Centering values, returned as an array or table.

When A is an array, normalize returns C and S as arrays such that N = (A - C) ./ S. Each value in C is the centering value used to perform the normalization along the specified dimension. For example, if A is a 10×10 data matrix and normalize operates along the first dimension, then C is a 1×10 vector containing the centering value for each column of A.

When A is a table or timetable, normalize returns C and S as tables containing the centering and scaling values for each table variable that was normalized, such that N.Var = (A.Var - C.Var) ./ S.Var. The table variable names of C and S match the corresponding table variables in the input. Each variable in C contains the centering value used to normalize the similarly named variable in A.

S - scaling values

Scaling values, returned as an array or table.

When A is an array, normalize returns C and S as arrays such that N = (A - C) ./ S. Each value in S is the scaling value used to perform the normalization along the specified dimension. For example, if A is a 10×10 data matrix and normalize operates along the first dimension, then S is a 1×10 vector containing the scaling value for each column of A.

When A is a table or timetable, normalize returns C and S as tables containing the centering and scaling values for each table variable that was normalized, such that N.Var = (A.Var - C.Var) ./ S.Var. The table variable names of C and S match the corresponding table variables in the input. Each variable in S contains the scaling value used to normalize the similarly named variable in A.
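For table input, the relation N.Var = (A.Var - C.Var) ./ S.Var can be checked directly. The sketch below uses made-up height and weight values; manualHeight is just an illustrative name.

```matlab
Height = [71; 69; 64; 67; 64];
Weight = [176; 163; 131; 133; 119];   % illustrative values
T = table(Height, Weight);

% C and S come back as one-row tables with the same variable names as T.
[N, C, S] = normalize(T);

% Rebuild one normalized variable from the returned parameters.
manualHeight = (T.Height - C.Height) ./ S.Height;
disp(max(abs(N.Height - manualHeight)))   % expected to be ~0
```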

Z-score

The z-score measures the distance of a data point from the mean in units of standard deviation. The normalized dataset has a mean of 0 and a standard deviation of 1, and preserves the shape properties of the original dataset (same skewness and kurtosis).

For a random variable $X$ with mean $\mu$ and standard deviation $\sigma$, the z-score of a value $x$ is $z = (x - \mu)/\sigma$. For sample data with mean $\bar{X}$ and standard deviation $S$, the z-score of a data point $x$ is $z = (x - \bar{X})/S$.
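The sample version of the z-score can be reproduced by hand with mean and std. This is a minimal sketch, not a description of how normalize is implemented internally.

```matlab
v = 1:5;

% Sample mean and sample standard deviation (std divides by N-1 by default).
z_manual  = (v - mean(v)) ./ std(v);
z_builtin = normalize(v);

disp(z_manual)
disp(max(abs(z_manual - z_builtin)))   % expected to be ~0
```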

P-norm

The general definition of the p-norm of a vector $v$ with $N$ elements is

$$\|v\|_p = \left[ \sum_{i=1}^{N} |v(i)|^p \right]^{1/p},$$

where p is any positive real value, Inf, or -Inf. Some common values for p are 1, 2, and Inf.

  • If p is 1, the resulting 1-norm is the sum of the absolute values of the vector elements.

  • If p is 2, the resulting 2-norm is the magnitude or Euclidean length of the vector.

  • If p is Inf, then $\|v\|_\infty = \max_i(|v(i)|)$.
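For a vector, normalizing by the p-norm amounts to dividing every element by norm(v,p). The sketch below checks this for the three common values of p using MATLAB's built-in norm function.

```matlab
v = 1:5;

N1   = normalize(v, 'norm', 1);     % divide by sum(abs(v)) = 15
N2   = normalize(v, 'norm');        % default p = 2
NInf = normalize(v, 'norm', Inf);   % divide by max(abs(v)) = 5

% Compare against the built-in norm function for the same p.
disp(max(abs(N1   - v ./ norm(v, 1))))
disp(max(abs(N2   - v ./ norm(v, 2))))
disp(max(abs(NInf - v ./ norm(v, Inf))))
```

Each displayed difference is expected to be zero up to rounding.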

Rescale

Rescaling changes the distance between the minimum and maximum values in a data set by stretching or compressing points along the number line. The z-scores of the data are preserved, so the shape of the distribution remains the same.

The equation to rescale the data $X$ to an arbitrary interval [a b] is

$$X_{\text{rescaled}} = a + \frac{X - \min X}{\max X - \min X}\,(b - a).$$

While both the normalize and rescale functions can rescale data to an arbitrary interval, rescale also allows input data to be clipped to specified minimum and maximum values.
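A quick way to see the rescaling at work is to apply the min-max formula, a + (X - min X) / (max X - min X) * (b - a), by hand and compare it with the 'range' method. The data and the target interval below are chosen only for illustration.

```matlab
X = [3 7 11 19];
a = -1;  b = 1;    % target interval [a b], chosen for illustration

% Min-max rescaling written out explicitly.
X_manual  = a + (X - min(X)) ./ (max(X) - min(X)) .* (b - a);

% The same rescaling through normalize.
X_builtin = normalize(X, 'range', [a b]);

disp(X_manual)     % -1.0000 -0.5000 0 1.0000
```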

Interquartile range

The interquartile range (IQR) of a data set describes the range of values in the middle 50% when the values are ordered. If the median of the data is Q2, the median of the lower half of the data is Q1, and the median of the upper half of the data is Q3, then IQR = Q3 - Q1.

When the data contains outliers (very large or very small values), IQR is often better than looking at the full range of the data because IQR excludes the largest 25% and smallest 25% of values in the data.
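The effect of an outlier on the full range versus the IQR can be seen by asking normalize for the parameters it uses with the 'medianiqr' method. This is a sketch with made-up data that contains one deliberate outlier.

```matlab
A = [1 2 3 4 5 6 7 100];   % 100 is a deliberate outlier

% C is the centering value (median) and S the scaling value (IQR).
[N, C, S] = normalize(A, 'medianiqr');

disp(C)                 % centering value used by normalize
disp(S)                 % scaling value used by normalize
disp(max(A) - min(A))   % full range, dominated by the outlier
```

The scaling value stays close to the spread of the bulk of the data, while the full range is driven almost entirely by the single outlier.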

Median absolute deviation

The median absolute deviation (MAD) of a dataset is the median of the absolute deviations from the median $\tilde{X}$ of the data: $\text{MAD} = \operatorname{median}(|x - \tilde{X}|)$. Therefore, MAD describes the variability of the data relative to the median.

When the data contain outliers (very large or very small values), MAD is often preferable to using the standard deviation of the data, because the standard deviation squares the deviation from the mean, making the influence of the outliers disproportionate. Conversely, deviations from a small number of outliers do not affect the value of MAD.
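The definition translates into one line of MATLAB. The sketch below computes the MAD by hand for made-up data with one outlier and compares it with the scaling value that normalize reports for the 'mad' scale type, on the assumption that normalize uses this unscaled definition.

```matlab
x = [2 4 4 5 7 9 40];   % 40 is a deliberate outlier

% MAD computed directly from its definition.
mad_manual = median(abs(x - median(x)));

% Scaling value that normalize reports for the 'mad' scale type.
[~, ~, S_mad] = normalize(x, 'scale', 'mad');

disp(mad_manual)   % 2
disp(S_mad)        % expected to match mad_manual
disp(std(x))       % much larger, because of the outlier
```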

Example

Vector and matrix data

Normalize data in vectors and matrices by computing Z-scores.

Create a vector v and compute its z-scores, which normalizes the data to have a mean of 0 and a standard deviation of 1.

v = 1:5;
N = normalize(v)
N = 1×5

   -1.2649 -0.6325 0 0.6325 1.2649

Create a matrix B and calculate the Z-scores for each column. Then, normalize each row.

B = magic(3)
B = 3×3

     8 1 6
     3 5 7
     4 9 2

N1 = normalize(B)
N1 = 3×3

    1.1339 -1.0000 0.3780
   -0.7559 0 0.7559
   -0.3780 1.0000 -1.1339

N2 = normalize(B,2)
N2 = 3×3

    0.8321 -1.1094 0.2774
   -1.0000 0 1.0000
   -0.2774 1.1094 -0.8321

Scale data

Scale the vector A by its standard deviation.

A = 1:5;
Ns = normalize(A,'scale')
Ns = 1×5

    0.6325 1.2649 1.8974 2.5298 3.1623

Scale A so that its values lie in the range [0,1].

Nr = normalize(A,'range')
Nr = 1×5

         0 0.2500 0.5000 0.7500 1.0000

Specify method type

Create the vector A and normalize it by its 1-norm.

A = 1:5;
Np = normalize(A,'norm',1)
Np = 1×5

    0.0667 0.1333 0.2000 0.2667 0.3333

Center the data in A so that its mean is 0.

Nc = normalize(A,'center','mean')
Nc = 1×5

    -2 -1 0 1 2

Table variables

Create a table that contains height information for five people.

LastName = {'Sanchez';'Johnson';'Lee';'Diaz';'Brown'};
Height = [71;69;64;67;64];
T = table(LastName,Height)
T = 5×2 table
    LastName Height
    _________ ______

    'Sanchez' 71
    'Johnson' 69
    'Lee' 64
    'Diaz' 67
    'Brown' 64

Normalize the height data by the maximum height.

N = normalize(T,'norm',Inf,'DataVariables','Height')
N = 5×2 table
    LastName Height
    _________ _______

    'Sanchez' 1
    'Johnson' 0.97183
    'Lee' 0.90141
    'Diaz' 0.94366
    'Brown' 0.90141

Normalize multiple datasets with the same parameters

Normalize a dataset, return the computed parameter values, and reuse those parameters to apply the same normalization to another dataset.

Create a timetable with two variables, Temperature and WindSpeed. Then create a second timetable with the same variables, but with samples collected one year later.

rng default
Time1 = (datetime(2019,1,1):days(1):datetime(2019,1,10))';
Temperature = randi([10 40],10,1);
WindSpeed = randi([0 20],10,1);
T1 = timetable(Temperature,WindSpeed,'RowTimes',Time1)
T1 = 10×2 timetable
       Time Temperature WindSpeed
    ___________ ___________ _________

    01-Jan-2019 35 3
    02-Jan-2019 38 20
    03-Jan-2019 13 20
    04-Jan-2019 38 10
    05-Jan-2019 29 16
    06-Jan-2019 13 2
    07-Jan-2019 18 8
    08-Jan-2019 26 19
    09-Jan-2019 39 16
    10-Jan-2019 39 20

Time2 = (datetime(2020,1,1):days(1):datetime(2020,1,10))';
Temperature = randi([10 40],10,1);
WindSpeed = randi([0 20],10,1);
T2 = timetable(Temperature,WindSpeed,'RowTimes',Time2)
T2 = 10×2 timetable
       Time Temperature WindSpeed
    ___________ ___________ _________

    01-Jan-2020 30 14
    02-Jan-2020 11 0
    03-Jan-2020 36 5
    04-Jan-2020 38 0
    05-Jan-2020 31 2
    06-Jan-2020 33 17
    07-Jan-2020 33 14
    08-Jan-2020 22 6
    09-Jan-2020 30 19
    10-Jan-2020 15 0

Normalize the first timetable. Specify three outputs: the normalized table, and the center and scale parameter values C and S that the function uses to perform the normalization.

[T1_norm,C,S] = normalize(T1)
T1_norm = 10×2 timetable
       Time Temperature WindSpeed
    ___________ ___________ _________

    01-Jan-2019 0.57687 -1.4636
    02-Jan-2019 0.856 0.92885
    03-Jan-2019 -1.4701 0.92885
    04-Jan-2019 0.856 -0.4785
    05-Jan-2019 0.018609 0.36591
    06-Jan-2019 -1.4701 -1.6044
    07-Jan-2019 -1.0049 -0.75997
    08-Jan-2019 -0.26052 0.78812
    09-Jan-2019 0.94905 0.36591
    10-Jan-2019 0.94905 0.92885

C = 1×2 table
    Temperature WindSpeed
    ___________ _________

       28.8 13.4

S = 1×2 table
    Temperature WindSpeed
    ___________ _________

      10.748 7.1056

Now normalize the second timetable T2 using the parameter values from the first normalization. This ensures that the data in T2 is centered and scaled in the same way as T1.

T2_norm = normalize(T2,"center",C,"scale",S)
T2_norm = 10×2 timetable
       Time Temperature WindSpeed
    ___________ ___________ _________

    01-Jan-2020 0.11165 0.084441
    02-Jan-2020 -1.6562 -1.8858
    03-Jan-2020 0.66992 -1.1822
    04-Jan-2020 0.856 -1.8858
    05-Jan-2020 0.2047 -1.6044
    06-Jan-2020 0.39078 0.50665
    07-Jan-2020 0.39078 0.084441
    08-Jan-2020 -0.6327 -1.0414
    09-Jan-2020 0.11165 0.78812
    10-Jan-2020 -1.284 -1.8858

By default, normalize operates on all variables in T2 that also exist in C and S. To normalize only a subset of the variables in T2, use the DataVariables name-value argument to specify the variables to operate on. The variables you specify must also appear in C and S.

Specify WindSpeed as the data variable to operate on. normalize operates on the variable and returns Temperature unchanged.

T2_partial = normalize(T2,"center",C,"scale",S,"DataVariables","WindSpeed")
T2_partial = 10×2 timetable
       Time Temperature WindSpeed
    ___________ ___________ _________

    01-Jan-2020 30 0.084441
    02-Jan-2020 11 -1.8858
    03-Jan-2020 36 -1.1822
    04-Jan-2020 38 -1.8858
    05-Jan-2020 31 -1.6044
    06-Jan-2020 33 0.50665
    07-Jan-2020 33 0.084441
    08-Jan-2020 22 -1.0414
    09-Jan-2020 30 0.78812
    10-Jan-2020 15 -1.8858

