Q: I saw that a combined score such as PCA = PC1 + PC2 is used in the literature, and papers also report values described as the PCA score. How should the PCA score be calculated?
A review of the steps for PCA:
1. Determine the number of principal components
2. Extract principal components
3. Principal component rotation
4. Obtain the principal component score
Table of Contents
PC1 case
1. Determine the number of principal components
2. Extract principal components
3. Principal component rotation
4. Obtain the principal component score
PC1 + PC2 case (two components)
1. Determine the number of principal components
2. Extract principal components
3. Principal component rotation
4. Obtain the principal component score
PC1 Case
1. Determine the number of principal components
Data Format
rm(list = ls())
library(psych)
data <- USJudgeRatings
fa.parallel(USJudgeRatings[, -1], fa = "pc", n.iter = 100,
            show.legend = FALSE, main = "Scree plot with parallel analysis")
Note on judging the number: the Kaiser-Harris criterion recommends retaining principal components with eigenvalues greater than 1; a component with an eigenvalue less than 1 explains less variance than is contained in a single variable. Cattell's scree test plots the eigenvalues against the component numbers.
Interpretation: assessing how many principal components to retain for the US judge ratings. The scree plot (the line with x symbols), the eigenvalue-greater-than-1 criterion (the horizontal line), and the parallel analysis based on 100 simulations (the dashed line) all indicate that retaining one principal component is sufficient.
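The Kaiser criterion part of this judgment can be checked directly in base R. This is only a sketch of the eigenvalue-greater-than-1 rule; the fa.parallel() call above remains the fuller diagnostic.

```r
# Kaiser criterion check in base R: count eigenvalues of the
# correlation matrix that exceed 1 (here only the first one does)
ev <- eigen(cor(USJudgeRatings[, -1]))$values
ev_retained <- sum(ev > 1)  # number of components with eigenvalue > 1
ev_retained
```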
2. Extract principal components
principal(r, nfactors =, rotate =, scores =)
r is the correlation matrix or the raw data matrix; nfactors sets the number of principal components (default 1); rotate specifies the rotation method (the default is varimax, maximum-variance rotation); scores sets whether principal component scores are computed (not computed by default).
Extract principal components
pc <- principal(USJudgeRatings[, -1], nfactors = 1)
pc
Principal Components Analysis
Call: principal(r = USJudgeRatings[, -1], nfactors = 1)
Standardized loadings (pattern matrix) based upon correlation matrix
      PC1   h2     u2 com
INTG 0.92 0.84 0.1565   1
DMNR 0.91 0.83 0.1663   1
DILG 0.97 0.94 0.0613   1
CFMG 0.96 0.93 0.0720   1
DECI 0.96 0.92 0.0763   1
PREP 0.98 0.97 0.0299   1
FAMI 0.98 0.95 0.0469   1
ORAL 1.00 0.99 0.0091   1
WRIT 0.99 0.98 0.0196   1
PHYS 0.89 0.80 0.2013   1
RTEN 0.99 0.97 0.0275   1

                 PC1
SS loadings    10.13
Proportion Var  0.92

Mean item complexity = 1
Test of the hypothesis that 1 component is sufficient.
The root mean square of the residuals (RMSR) is 0.04
 with the empirical chi square 6.21 with prob < 1
Interpretation: the PC1 column contains the component loadings, that is, the correlations between the observed variables and the principal component. If more than one component were extracted, there would also be PC2, PC3, and so on. Component loadings are used to interpret the meaning of a principal component. Here the first principal component (PC1) is highly correlated with every variable, so it is a dimension that can serve as a general evaluation.
The h2 column gives the component communalities, the amount of each variable's variance explained by the principal components. The u2 column gives the component uniquenesses, the proportion of variance that the components cannot explain (1 - h2). For example, 80% of the variance in physical ability (PHYS) is explained by the first principal component and 20% is not; by comparison, PHYS is the variable least well represented by the first principal component. The SS loadings row contains the eigenvalues associated with the components, that is, the standardized variance associated with each component (in this example, the value for the first component is 10.13). Finally, the Proportion Var row shows how much of the whole data set each principal component explains. Here the first principal component explains 92% of the variance in the 11 variables.
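These quantities are all simple functions of the eigendecomposition of the correlation matrix. The base-R sketch below reproduces them by hand for the one-component case: the loadings are the first eigenvector scaled by the square root of the first eigenvalue, h2 is the squared loading, u2 is its complement, and SS loadings is the first eigenvalue itself.

```r
# Reproducing h2, u2, SS loadings and Proportion Var in base R
# (a sketch of what principal() reports for one component)
R <- cor(USJudgeRatings[, -1])
e <- eigen(R)
load <- e$vectors[, 1] * sqrt(e$values[1])  # loadings: correlation of each variable with PC1
h2 <- load^2                  # communality: variance explained per variable
u2 <- 1 - h2                  # uniqueness: variance left unexplained
ss <- sum(load^2)             # SS loadings = first eigenvalue (about 10.13)
prop_var <- ss / ncol(R)      # Proportion Var (about 0.92)
```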
3. Principal component rotation

Only one principal component was extracted here, so there is nothing to rotate; rotation becomes relevant in the two-component case below.
4. Obtain the principal component scores
In the US judge ratings example, we extracted one principal component from the 11 rating variables in the original data.
pc <- principal(USJudgeRatings[, -1], nfactors = 1, scores = TRUE)
head(pc$scores)
                      PC1
AARONSON,L.H.  -0.1857981
ALEXANDER,J.M.  0.7469865
ARMENTANO,A.J.  0.0704772
BERDON,R.I.     1.1358765
BRACKEN,J.J.   -2.1586211
BURNS,E.B.      0.7669406
When scores = TRUE, the principal component scores are stored in the scores element of the object returned by the principal() function.
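Because the raw data are available in this example, the same one-component scores can be reproduced (up to sign and scaling) with base R alone: standardize the data and project it onto the first eigenvector of the correlation matrix, or equivalently use prcomp(). Note that principal()'s scores are standardized to unit variance while prcomp()'s are not, so the two agree up to a sign flip and a constant factor.

```r
# Reproducing the first-component scores with base R (a sketch)
X <- as.matrix(USJudgeRatings[, -1])
p <- prcomp(X, scale. = TRUE)                    # SVD-based PCA on standardized data
pc1_manual <- scale(X) %*% eigen(cor(X))$vectors[, 1]
# prcomp's first score column and the manual projection agree up to sign
abs(cor(p$x[, 1], pc1_manual))
```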
PC1 + PC2 case (two components)
1. Determine the number of principal components
Data format
rm(list = ls())
library(psych)
# Example with the Harman23.cor data set. This data set itself is already a correlation matrix.
# Determine the number of principal components
fa.parallel(Harman23.cor$cov, n.obs = 302, fa = "pc", n.iter = 100,
            show.legend = FALSE, main = "Scree plot with parallel analysis")
# Output message: Parallel analysis suggests that the number of factors = NA
# and the number of components = 2
2. Extract principal components
pc <- principal(Harman23.cor$cov,
                nfactors = 2,    # two principal components, per the judgment above
                rotate = "none")
pc
Principal Components Analysis
Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
                PC1   PC2   h2    u2 com
height         0.86 -0.37 0.88 0.123 1.4
arm.span       0.84 -0.44 0.90 0.097 1.5
forearm        0.81 -0.46 0.87 0.128 1.6
lower.leg      0.84 -0.40 0.86 0.139 1.4
weight         0.76  0.52 0.85 0.150 1.8
bitro.diameter 0.67  0.53 0.74 0.261 1.9
chest.girth    0.62  0.58 0.72 0.283 2.0
chest.width    0.67  0.42 0.62 0.375 1.7

                       PC1  PC2
SS loadings           4.67 1.77
Proportion Var        0.58 0.22
Cumulative Var        0.58 0.81
Proportion Explained  0.73 0.27
Cumulative Proportion 0.73 1.00

Mean item complexity = 1.7
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.05
Fit based upon off diagonal values = 0.99
The first principal component explains 58% of the variance in the body measurements, the second explains 22%, and together they explain 81% of the variance. From the h2 column, the two principal components explain 88% of the variance in the height variable and 90% of the variance in the arm.span variable.
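These proportions can be verified directly from the correlation matrix with base R (Harman23.cor ships with R's datasets package): the first two eigenvalues are the SS loadings, and dividing by the number of variables (8) gives the variance proportions.

```r
# Checking the variance proportions from the eigendecomposition (a sketch)
e <- eigen(Harman23.cor$cov)       # Harman23.cor$cov is the 8 x 8 correlation matrix
round(e$values[1:2], 2)            # about 4.67 and 1.77 (the SS loadings above)
round(e$values[1:2] / 8, 2)        # about 0.58 and 0.22 (Proportion Var)
round(sum(e$values[1:2]) / 8, 2)   # about 0.81 (Cumulative Var)
```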
The loading matrix tells you what the components mean. The first principal component is positively correlated with every body measurement and appears to be a general size factor. The second principal component is negatively correlated with the first four variables (height, arm.span, forearm and lower.leg) and positively correlated with the last four (weight, bitro.diameter, chest.girth and chest.width), so it appears to be a length-versus-girth contrast. Such a construct is conceptually awkward, and when more than one component is extracted, rotating them usually makes the results more interpretable, which we discuss next.
3. Principal component rotation
Rotation is a family of mathematical methods for making the component loading matrix more interpretable (it "denoises" the components). Rotation methods fall into two classes: those that keep the selected components uncorrelated (orthogonal rotation) and those that allow them to become correlated (oblique rotation).
rc <- principal(Harman23.cor$cov, nfactors = 2, rotate = "varimax")
rc
Principal Components Analysis
Call: principal(r = Harman23.cor$cov, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                RC1  RC2   h2    u2 com
height         0.90 0.25 0.88 0.123 1.2
arm.span       0.93 0.19 0.90 0.097 1.1
forearm        0.92 0.16 0.87 0.128 1.1
lower.leg      0.90 0.22 0.86 0.139 1.1
weight         0.26 0.88 0.85 0.150 1.2
bitro.diameter 0.19 0.84 0.74 0.261 1.1
chest.girth    0.11 0.84 0.72 0.283 1.0
chest.width    0.26 0.75 0.62 0.375 1.2

                       RC1  RC2
SS loadings           3.52 2.92
Proportion Var        0.44 0.37
Cumulative Var        0.44 0.81
Proportion Explained  0.55 0.45
Cumulative Proportion 0.55 1.00

Mean item complexity = 1.1
Test of the hypothesis that 2 components are sufficient.
The root mean square of the residuals (RMSR) is 0.05
Fit based upon off diagonal values = 0.99
The column names have changed from PC to RC to indicate that the components have been rotated. Looking at the loadings in the RC1 column, you can see that the first component is explained mainly by the first four variables (the length variables). The RC2 loadings show that the second component is explained mainly by variables 5 through 8 (the girth variables). Note that the two components are still uncorrelated and that their interpretation of the variables is unchanged, because the grouping of the variables has not changed. Also, the cumulative variance explained by the two components is unchanged after rotation (81%); what changes is how that variance is divided between them (component 1 drops from 58% to 44%, component 2 rises from 22% to 37%). The variance explained by the individual components thus becomes more even; strictly speaking, they should now be called components rather than principal components, because the variance-maximizing property of the individual principal components is no longer retained.
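The key invariant here, that rotation redistributes variance among the components but preserves the total, can be illustrated in base R with stats::varimax applied to unrotated loadings built from the eigendecomposition. This is a sketch; the exact rotated SS loadings depend on the normalization used, so only the preserved total is guaranteed.

```r
# Orthogonal rotation redistributes variance but preserves the total
e <- eigen(Harman23.cor$cov)
L <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))  # unrotated two-component loadings
rot <- varimax(L)                 # varimax rotation from the stats package
colSums(unclass(rot$loadings)^2)  # SS loadings shift (roughly 3.52 and 2.92)
sum(unclass(rot$loadings)^2)      # total unchanged: ~6.44, i.e. 81% of 8
```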
Our ultimate goal is to replace a large set of correlated variables with a smaller set of derived variables, so you still need each observation's score on the components.
4. Obtain the principal component scores
When a principal component analysis is based on a correlation matrix, the original data are not available, so principal component scores for the individual observations cannot be obtained; you can, however, obtain the coefficients (weights) used to compute them. In the body measurement data, you have the correlations among the measurements but not the individual measurements of the 305 girls.
rc <- principal(Harman23.cor$cov, nfactors = 2, rotate = "varimax")
round(unclass(rc$weights), 2)
                 RC1   RC2
height          0.28 -0.05
arm.span        0.30 -0.08
forearm         0.30 -0.09
lower.leg       0.28 -0.06
weight         -0.06  0.33
bitro.diameter -0.08  0.32
chest.girth    -0.10  0.34
chest.width    -0.04  0.27
The corresponding PC1 and PC2 scores are then computed as weighted sums of the standardized variables, using these coefficients.
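Concretely, score = z %*% weights, where z holds the standardized variable values. Since no raw data exist for this data set, the sketch below uses a hypothetical standardized observation (a girl one standard deviation above the mean on the four length variables and exactly average on the girth variables); the weight matrix is copied from the rc$weights output above.

```r
# Scores as weighted sums of standardized variables (illustrative sketch;
# z is a HYPOTHETICAL observation, since the raw data are unavailable)
w <- matrix(c( 0.28, -0.05,
               0.30, -0.08,
               0.30, -0.09,
               0.28, -0.06,
              -0.06,  0.33,
              -0.08,  0.32,
              -0.10,  0.34,
              -0.04,  0.27),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("height", "arm.span", "forearm", "lower.leg",
                              "weight", "bitro.diameter", "chest.girth", "chest.width"),
                            c("RC1", "RC2")))
z <- c(1, 1, 1, 1, 0, 0, 0, 0)  # hypothetical: +1 SD on the length variables
drop(z %*% w)                   # RC1 = 1.16, RC2 = -0.28
```

As expected, such an observation scores high on the length component (RC1) and slightly below average on the girth component (RC2).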
Both the principal function and the prcomp function perform principal component analysis (PCA), but there are some differences between them.
- The principal function comes from the psych package, which is oriented toward factor-analytic workflows (the package also provides fa() for factor analysis). Its output reports, for each component, the proportion of variance explained, the component loadings, and, optionally, the scores or scoring weights.
- The prcomp function is in the base stats package and computes only PCA. It returns the standard deviations of the components, the rotation (loading) matrix, and the scores; the information is comparable to that of the principal function, though presented differently.
Beyond the output format, the two functions compute the decomposition differently: prcomp uses singular value decomposition (SVD) of the centered (and optionally scaled) data, while principal works from the eigendecomposition of a correlation (or covariance) matrix. (Maximum likelihood and minimum residual estimation belong to factor analysis, e.g. psych::fa(), not to principal().)
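Despite the different algorithms, the two routes agree on the eigenvalues. The base-R sketch below checks this on the judge data: the component variances from prcomp (squared standard deviations) match the eigenvalues of the correlation matrix.

```r
# prcomp (SVD of standardized data) vs. eigen (of the correlation matrix):
# the component variances are the same eigenvalues, computed two ways
X <- USJudgeRatings[, -1]
p <- prcomp(X, scale. = TRUE)
head(p$sdev^2, 2)            # component variances (the first is ~10.13)
ev <- eigen(cor(X))$values
all.equal(p$sdev^2, ev)      # agree to numerical precision
```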
Reference: R in Action (Robert I. Kabacoff), the chapter on principal components and factor analysis, which this walkthrough follows.