Principal Component Analysis (PCA): is the PCA score PC1, or PC1 + PC2?

I have seen PCA = ∑(PC1 + PC2) used in the literature, along with values described as the PCA score. How should the PCA score actually be calculated?

A review of the steps for PCA:

1. Determine the number of principal components

2. Extract principal components

3. Principal component rotation

4. Obtain the principal component score

Table of Contents

PC1 case

1. Determine the number of principal components

2. Extract principal components

3. Principal component rotation

4. Obtain the principal component score

PC1 + PC2 case

1. Determine the number of principal components

2. Extract principal components

3. Principal component rotation

4. Obtain the principal component score


PC1 Case

1. Determine the number of principal components

Data Format

rm(list = ls())
library(psych)
data <- USJudgeRatings  # 43 judges rated on 12 variables; column 1 (CONT) is dropped below
fa.parallel(data[,-1], fa="pc",
            n.iter=100,
            show.legend=FALSE,
            main="Scree plot with parallel analysis")

Judgment note: the Kaiser-Harris criterion recommends retaining principal components with eigenvalues greater than 1; a component with an eigenvalue below 1 explains less variance than a single original variable contains. Cattell's scree test plots the eigenvalues against the component numbers and looks for the bend where the curve flattens.
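As a quick manual check (a minimal sketch; fa.parallel() above already incorporates this criterion), the Kaiser-Harris rule can be verified directly from the eigenvalues of the correlation matrix:

ev <- eigen(cor(USJudgeRatings[,-1]))$values  # eigenvalues of the correlation matrix
ev
sum(ev > 1)  # number of components with eigenvalue > 1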

Interpretation of results: assessing the number of principal components to retain for the US judge ratings. The scree plot (the line with x symbols), the eigenvalue-greater-than-1 criterion (horizontal line), and the parallel analysis based on 100 simulations (dashed line) all indicate that retaining one principal component is sufficient.

2. Extract principal components

principal(r, nfactors=, rotate=, scores=)

r is a correlation matrix or a raw data matrix;
nfactors sets the number of principal components to extract (default 1);
rotate specifies the rotation method (the default is maximum-variance rotation, varimax);
scores sets whether principal component scores are computed (not computed by default).

Extract principal components

pc <- principal(USJudgeRatings[,-1], nfactors = 1)
pc
Principal Components Analysis
Call: principal(r = USJudgeRatings[, -1], nfactors = 1)
Standardized loadings (pattern matrix) based upon correlation matrix
      PC1 h2 u2 com
INTG 0.92 0.84 0.1565 1
DMNR 0.91 0.83 0.1663 1
DILG 0.97 0.94 0.0613 1
CFMG 0.96 0.93 0.0720 1
DECI 0.96 0.92 0.0763 1
PREP 0.98 0.97 0.0299 1
FAMI 0.98 0.95 0.0469 1
ORAL 1.00 0.99 0.0091 1
WRIT 0.99 0.98 0.0196 1
PHYS 0.89 0.80 0.2013 1
RTEN 0.99 0.97 0.0275 1

                 PC1
SS loadings 10.13
Proportion Var 0.92

Mean item complexity = 1
Test of the hypothesis that 1 component is sufficient.

The root mean square of the residuals (RMSR) is 0.04
 with the empirical chi square 6.21 with prob < 1

Interpretation of results: the PC1 column contains the component loadings, i.e., the correlations between the observed variables and the principal component. If more than one principal component were extracted, there would also be PC2, PC3, etc. columns. Component loadings are used to interpret the meaning of the components; here the first principal component (PC1) is highly correlated with every variable, making it a dimension that can serve as a general evaluation.
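A small sketch to verify this interpretation (pc1 is a hypothetical refit with scores enabled; scores are covered in step 4): the loadings equal the correlations between the variables and the component scores.

pc1 <- principal(USJudgeRatings[,-1], nfactors=1, scores=TRUE)  # refit with scores
round(cor(USJudgeRatings[,-1], pc1$scores), 2)  # matches the PC1 column above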

The h2 column shows the component communalities: the proportion of each variable's variance explained by the principal components. The u2 column shows the component uniquenesses: the proportion of variance not explained by the components (1 - h2). For example, 80% of the variance in physical ability (PHYS) is explained by the first principal component and 20% is not; PHYS is the variable least well represented by the first principal component.

The SS loadings row contains the eigenvalues associated with the principal components, i.e., the standardized variance accounted for by each component (here, 10.13 for the first component). Finally, the Proportion Var row shows the proportion of the whole data set's variance explained by each principal component. Here you can see that the first principal component explains 92% of the variance in the 11 variables.
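A minimal sketch to verify these quantities, using the pc object fitted above: with a single component, h2 is just the squared loading and u2 = 1 - h2.

l <- pc$loadings[,1]  # loadings on the first (only) component
round(cbind(h2 = l^2, u2 = 1 - l^2), 2)  # matches the h2 and u2 columns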

3. Principal component rotation

With only one component extracted, there is nothing to rotate, so we go straight to the scores.

4. Obtain the principal component score

In the US judge ratings example, we extracted a single principal component from the 11 rating variables in the original data.

pc <- principal(USJudgeRatings[,-1], nfactors=1, scores=TRUE)
head(pc$scores)
PC1
AARONSON,L.H. -0.1857981
ALEXANDER,J.M. 0.7469865
ARMENTANO,A.J. 0.0704772
BERDON,R.I. 1.1358765
BRACKEN,J.J. -2.1586211
BURNS,E.B. 0.7669406

When scores = TRUE, the principal component scores are stored in the scores element of the object returned by the principal() function.
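The scores can then be used like any other variable. As a small sketch, you could check how the judges' component scores relate to the lawyers' contact variable (CONT), the column excluded from the analysis:

cor(USJudgeRatings$CONT, pc$scores)  # correlation between contact and the general rating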

PC1 + PC2 Case

1. Determine the number of principal components

Data format

rm(list = ls())
library(psych)
# Example with the Harman23.cor data set, which is already a correlation matrix.
# Determine the number of principal components:
fa.parallel(Harman23.cor$cov, n.obs=302,
            fa="pc",
            n.iter=100,
            show.legend=FALSE,
            main="Scree plot with parallel analysis")
# Output:
Parallel analysis suggests that the number of factors = NA
and the number of components = 2

2. Extract principal components

pc <- principal(Harman23.cor$cov,
                nfactors=2,  # parallel analysis above suggests 2 components
                rotate="none")
pc
Principal Components Analysis
Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
                PC1 PC2 h2 u2 com
height 0.86 -0.37 0.88 0.123 1.4
arm.span 0.84 -0.44 0.90 0.097 1.5
forearm 0.81 -0.46 0.87 0.128 1.6
lower.leg 0.84 -0.40 0.86 0.139 1.4
weight 0.76 0.52 0.85 0.150 1.8
bitro.diameter 0.67 0.53 0.74 0.261 1.9
chest.girth 0.62 0.58 0.72 0.283 2.0
chest.width 0.67 0.42 0.62 0.375 1.7
                       PC1 PC2
SS loadings 4.67 1.77
Proportion Var 0.58 0.22
Cumulative Var 0.58 0.81
Proportion Explained 0.73 0.27
Cumulative Proportion 0.73 1.00
Mean item complexity = 1.7
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.05
Fit based upon off diagonal values = 0.99

The first principal component explains 58% of the variance in the body measurements, while the second explains 22%; together they explain 81% of the variance. In the h2 column, the two principal components together explain 88% of the variance in the height variable and 90% of the variance in the arm.span variable.
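As a quick sketch using the pc object above, h2 is the sum of squared loadings across the two components:

round(rowSums(pc$loadings[,1:2]^2), 2)  # matches the h2 column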

The loading matrix is used to interpret the meaning of the components. The first principal component is positively correlated with every body measure and appears to be a general measurement factor; the second principal component is negatively correlated with the first four variables (height, arm.span, forearm, and lower.leg) and positively correlated with the last four (weight, bitro.diameter, chest.girth, and chest.width), so it appears to be a length-versus-volume factor. Conceptually, though, such a contrast is not easy to work with, and when multiple components are extracted, rotating them makes the results more interpretable. We discuss rotation next.

3. Principal component rotation

Rotation is a family of mathematical techniques for making the component loading matrix more interpretable (denoising it, in a sense). Rotation methods come in two kinds: those that keep the selected components uncorrelated (orthogonal rotation) and those that allow them to become correlated (oblique rotation).
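For the oblique case, a minimal sketch: principal() also accepts oblique methods such as promax, which lets the components correlate.

principal(Harman23.cor$cov, nfactors=2, rotate="promax")  # oblique alternative

The example below uses the orthogonal varimax rotation.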

rc <- principal(Harman23.cor$cov,
                nfactors=2,
                rotate="varimax")
rc
Principal Components Analysis
Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                RC1 RC2 h2 u2 com
height 0.90 0.25 0.88 0.123 1.2
arm.span 0.93 0.19 0.90 0.097 1.1
forearm 0.92 0.16 0.87 0.128 1.1
lower.leg 0.90 0.22 0.86 0.139 1.1
weight 0.26 0.88 0.85 0.150 1.2
bitro.diameter 0.19 0.84 0.74 0.261 1.1
chest.girth 0.11 0.84 0.72 0.283 1.0
chest.width 0.26 0.75 0.62 0.375 1.2
                       RC1 RC2
SS loadings 3.52 2.92
Proportion Var 0.44 0.37
Cumulative Var 0.44 0.81
Proportion Explained 0.55 0.45
Cumulative Proportion 0.55 1.00
Mean item complexity = 1.1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.05
Fit based upon off diagonal values = 0.99

The column names change from PC to RC to indicate that the components have been rotated. Looking at the loadings in the RC1 column, you can see that the first component is explained mainly by the first four variables (the length variables); the loadings in the RC2 column show that the second component is explained mainly by variables 5 through 8 (the volume variables). Note that the two components are still uncorrelated, and their interpretation of the variables is unchanged, because the variable groupings did not change. Also, the cumulative variance explained by the two components after rotation is unchanged (81%); what changes is how the variance is divided between them (component 1 goes from 58% to 44%, component 2 from 22% to 37%). The variance explained by each component thus becomes more even, which means that, strictly speaking, they should now be called components rather than principal components (the maximal-variance property of individual principal components is no longer retained).

Our ultimate goal is to replace a larger set of correlated variables with a smaller set of derived variables, so we still need to obtain each observation's score on the components.

4. Obtain the principal component score

When a principal component analysis is based on a correlation matrix, the original data are not available, so principal component scores for each observation cannot be computed; you can, however, obtain the coefficients (weights) used to calculate them. In the body measurement data, we have the correlations among the measurement variables but not the individual measurements of the 305 girls.

rc <- principal(Harman23.cor$cov,
                nfactors = 2,
                rotate = "varimax")
round(unclass(rc$weights),2)
RC1 RC2
height 0.28 -0.05
arm.span 0.30 -0.08
forearm 0.30 -0.09
lower.leg 0.28 -0.06
weight -0.06 0.33
bitro.diameter -0.08 0.32
chest.girth -0.10 0.34
chest.width -0.04 0.27

The corresponding RC1 and RC2 principal component scores are then calculated from these weights, as sketched below.
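A minimal sketch of that calculation, assuming a hypothetical data frame X holding the raw measurements with the same eight columns in the same order (Harman23.cor itself ships only the correlation matrix, so X is not available here):

Z <- scale(X)               # standardize each variable (X is hypothetical)
scores <- Z %*% rc$weights  # one column of scores per rotated component
# Equivalently, for each observation:
# RC1 = 0.28*height + 0.30*arm.span + 0.30*forearm + 0.28*lower.leg
#     - 0.06*weight - 0.08*bitro.diameter - 0.10*chest.girth - 0.04*chest.width
# (all variables standardized)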

The principal() function and the prcomp() function both perform principal component analysis (PCA), but there are some differences between them.

  • principal() is part of the psych package, which also provides related methods such as exploratory factor analysis. Its output includes the variance explained by each component, the component loadings, and (when requested) the scores.
  • prcomp() is part of the base stats package and performs PCA only. Its output likewise includes the variance explained, the loadings (the rotation matrix), and the scores, similar to the output of principal().

Beyond the output, the two differ in how the decomposition is computed: prcomp() applies singular value decomposition (SVD) to the centered (and optionally scaled) data matrix, while principal() works from an eigendecomposition of the correlation (or covariance) matrix. (Estimators such as maximum likelihood and minimum residual belong to factor analysis, e.g. psych::fa(), not to principal().)
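To see the agreement in practice, a minimal sketch on the USJudgeRatings data used earlier:

library(psych)
pc <- principal(USJudgeRatings[,-1], nfactors=1, scores=TRUE)  # eigendecomposition-based
pr <- prcomp(USJudgeRatings[,-1], scale.=TRUE)                 # SVD-based
cor(pc$scores[,1], pr$x[,1])  # close to 1 or -1: same component up to sign and scaling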

Reference: R in Action (《R语言实战》)
