PCA learning diary

This is written only for myself. The sources are scattered, the content is disorganized, and the quality is poor; treat it as a diary. If any reader actually wants to learn this, I suggest not reading this post... go study the systematic material in other, better blogs.

A couple of days ago I wanted to plot my trained word vectors in 3D and realized I first had to reduce the dimensionality. The options were t-SNE and PCA. The former preserves more information but is hard to interpret, and we sociologists care a lot about "interpretation"; besides, I don't really know what goes on inside t-SNE, so I went with PCA, even though my three PCA dimensions only explain 14% of the variance, which is sad. On the projection itself, see this post: https://www.douban.com/note/740965280/
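To remind my future self what that step actually looks like, here is a minimal R sketch (emb is a hypothetical matrix of word vectors standing in for my real data, one row per word):

# emb: hypothetical word-vector matrix, one row per word, one column per embedding dimension
emb <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)

res <- prcomp(emb, scale = TRUE)     # PCA; scale = TRUE standardizes each column first
coords3d <- res$x[, 1:3]             # 3D coordinates: each word's scores on the first 3 PCs
summary(res)$importance[, 1:3]       # proportion / cumulative proportion of variance explained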

PCA

R has two functions for PCA. According to http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/118-principal-component-analysis-in-r-prcomp-vs-princomp/, they are prcomp() and princomp(), and the difference is:

General methods for principal component analysis

There are two general methods to perform PCA in R:

  • Spectral decomposition which examines the covariances / correlations between variables
  • Singular value decomposition which examines the covariances / correlations between individuals

The function princomp() uses the spectral decomposition approach. The functions prcomp() and PCA()[FactoMineR] use the singular value decomposition (SVD).

  • princomp(): uses spectral decomposition, which examines the covariances/correlations between variables
  • prcomp(): uses singular value decomposition (SVD), which examines the covariances/correlations between individuals/observations

Basic format of the prcomp() and princomp() functions

The simplified format of these two functions is:

prcomp(x, scale = FALSE)
princomp(x, cor = FALSE, scores = TRUE)
  1. Arguments for prcomp():
  • x: a numeric matrix or data frame
  • scale: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place
    1. ....actually this "unit variance" sounds like z-score scaling. Yes, unit-variance scaling and z-score scaling are the same thing: shift each variable to mean 0, then divide every value by its standard deviation. "The data for each variable (metabolite) is mean centered and then divided by the standard deviation of the variable. This way each variable will have zero mean and unit standard deviation." -------> wiki: https://en.wikipedia.org/wiki/Feature_scaling
    2. Why do scaling at all: the real reason is to give every dimension the same importance. Mathematically, SVD itself does not require standardizing or centering the matrix entries at all. But PCA is usually used to reduce high-dimensional data: it projects the original data onto some low-dimensional space while making the variance as large as possible. If one feature (one column of the matrix) has especially large values, it takes a large share of the overall error, so after projecting to the low-dimensional space, in order for the low-rank decomposition to approximate the original data, the projection will chase that largest feature and ignore the features with smaller values. Since we do not know each feature's importance before modeling, this can easily lose a lot of information. To be "fair" and avoid over-capturing features with large values, we first standardize every feature so they are all on the same scale, and only then run PCA. In addition, from a computational standpoint, standardizing before PCA has another benefit: PCA is often computed by numerical approximation rather than by solving for eigenvalues or singular values analytically, so when we use gradient descent or similar algorithms, standardizing the data first helps the optimization converge. (http://sofasofa.io/forum_main_post.php?postid=1000375)
    3. The sofasofa thread above also has interesting content that is not directly related to this note but is worth excerpting: PCA without standardization finds the eigenvectors of the covariance matrix; PCA after standardization finds the eigenvectors of the correlation matrix. As 清风 said in the first point, without standardization the eigenvectors lean toward the variable with the largest variance and deviate from the theoretical optimum. For example, take a 2D Gaussian with correlation matrix [1 0.4; 0.4 1], std(x1)=10, std(x2)=1. The theoretically best direction is the major axis of the ellipse; without standardization, the vector PCA finds deviates from the major axis. After standardization the deviation becomes very small.
    4. Someone also pointed out that PCA can actually be implemented in four ways: 1. the covariance matrix of the standardized data; 2. the correlation matrix of the standardized data; 3. the correlation matrix of the unstandardized data; 4. SVD of the standardized data. These four are equivalent (the R sketch after the argument lists below checks part of this numerically).
    5. On the relationship between PCA and SVD, see http://wap.sciencenet.cn/blog-2866696-1136451.html
    6. A supplementary note on the principle of PCA: http://wap.sciencenet.cn/blog-2866696-1136447.html (aside: sciencenet is really quite good, though the rule that "posting blog comments is forbidden between 23:00 and 7:00" left me baffled)
    7. On eigenvectors and eigenvalues: this question came up because I wanted to implement the SIF model for sentence embeddings, which says to "compute the first principal component u of the sentence-vector matrix and subtract from each sentence vector its projection onto u (similar to PCA)" (a short introduction: https://developer.aliyun.com/article/714547, original paper: https://openreview.net/pdf?id=SyK00v5xx). So what is the "first principal component"? Another GitHub reimplementation of the model uses u = pca.components_[0] to represent it. Then what is pca.components_[0]? See https://blog.csdn.net/sinat_31188625/article/details/72677088 // https://github.com/jx00109/sentence2vec/blob/master/s2v-python3.py for the Python implementation (can't upload the image). In short, it has as many dimensions as there were original variables. (The projection sketch near the end of this note shows the subtract-the-projection step.)
    8. So what exactly is the first principal component? According to stackexchange https://stats.stackexchange.com/questions/311908/what-is-pca-components-in-sk-learn:
      • Annoyingly there is no SKLearn documentation for this attribute, beyond the general description of the PCA method.
      • Here is a useful application of pca.components_ in a classic facial-recognition project (using data bundled with SKL, so you don't have to download anything extra). Working through this concise notebook is the best way to get a feel for the definition & application of pca.components_
      • From that project, and this answer over on StackOverflow, we can learn that pca.components_ is the set of all eigenvectors (aka loadings) for your projection space (one eigenvector for each principal component). Once you have the eigenvectors using pca.components_, here is how to get the eigenvalues.
      • For further info on the definitions & applications of eigenvectors vs loadings (including the equation that links all three concepts), see here.
      • For a 2nd project/notebook applying pca.components_ to (the same) facial recognition data, see here. It features a more traditional scree plot than the first project cited above 
    9. The key sentence above: pca.components_ is the set of eigenvectors, also known as loadings, in the projection space; each principal component corresponds to one eigenvector. (In R's prcomp() this is the rotation matrix; see the sketch after the argument lists below.)
  2. Arguments for princomp():
  • x: a numeric matrix or data frame
  • cor: a logical value. If TRUE, the data will be centered and scaled before the analysis
  • scores: a logical value. If TRUE, the coordinates on each principal component are calculated
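To convince myself that the two routes above really agree, here is a minimal R sketch of my own on the built-in USArrests data (the same dataset the STHDA tutorial uses; this is my check, not code from the tutorial). It also shows what the "first principal component" u from the SIF discussion corresponds to in R, namely the first column of rotation (the analogue of sklearn's pca.components_[0]):

data(USArrests)

# SVD route: PCA on standardized data
res <- prcomp(USArrests, scale = TRUE)

# Spectral-decomposition route: eigenvectors of the correlation matrix
ev <- eigen(cor(USArrests))

# Same directions, up to sign flips
round(abs(res$rotation) - abs(ev$vectors), 10)

# Eigenvalues equal the squared standard deviations of the components
ev$values
res$sdev^2

# The "first principal component" direction (what sklearn exposes as pca.components_[0])
u <- res$rotation[, 1]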

The elements of the outputs returned by the functions prcomp() and princomp() include:

prcomp() name | princomp() name | Description
sdev          | sdev            | the standard deviations of the principal components
rotation      | loadings        | the matrix of variable loadings (columns are eigenvectors)
center        | center          | the variable means (the means that were subtracted)
scale         | scale           | the variable standard deviations (the scaling applied to each variable)
x             | scores          | the coordinates of the individuals (observations) on the principal components
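A quick sketch of pulling these elements out of both functions (continuing with USArrests; the numbers from princomp() differ very slightly because it divides by n instead of n - 1):

res.pr <- prcomp(USArrests, scale = TRUE)
res.pc <- princomp(USArrests, cor = TRUE)

res.pr$sdev          # standard deviations of the principal components
res.pr$rotation      # variable loadings (columns are eigenvectors)
res.pr$center        # variable means that were subtracted
res.pr$scale         # variable standard deviations used for scaling
head(res.pr$x)       # coordinates of the observations on the components

res.pc$loadings      # princomp()'s name for the loadings matrix
head(res.pc$scores)  # princomp()'s name for the coordinates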

Finally, a few more useful resources: https://stats.stackexchange.com/questions/143905/loadings-vs-eigenvectors-in-pca-when-to-use-one-or-another

In PCA, you split the covariance (or correlation) matrix into a scale part (eigenvalues) and a direction part (eigenvectors). You may then endow the eigenvectors with the scale: loadings. Loadings thus become comparable in magnitude with the covariances/correlations observed between the variables, because what had been drawn out of the variables' covariation now returns back in the form of the covariation between the variables and the principal components. Actually, loadings are the covariances/correlations between the original variables and the unit-scaled components. This answer shows geometrically what loadings are and what the coefficients are that associate components with variables in PCA or factor analysis.

In short: the covariance (or correlation) matrix is split into a scale part (eigenvalues) and a direction part (eigenvectors); giving the eigenvectors that scale turns them into loadings, which are the covariances/correlations between the original variables and the unit-scaled components.

Loadings:

  1. Help you interpret principal components or factors, because they are the linear combination weights (coefficients) whereby unit-scaled components or factors define or "load" a variable.

    (Eigenvector is just a coefficient of orthogonal transformation or projection; it is devoid of "load" within its value. "Load" is (information about the amount of) variance, magnitude. PCs are extracted to explain the variance of the variables. Eigenvalues are the variances of (= explained by) the PCs. When we multiply an eigenvector by the square root of the eigenvalue we "load" the bare coefficient by the amount of variance. By that virtue we make the coefficient a measure of association, co-variability.)

  2. Loadings sometimes are "rotated" (e.g. varimax) afterwards to facilitate interpretability (see also);

  3. It is loadings which "restore" the original covariance/correlation matrix (see also this thread discussing nuances of PCA and FA in that respect);

  4. While in PCA you can compute values of components both from eigenvectors and loadings, in factor analysis you compute factor scores out of loadings.

  5. And, above all, loading matrix is informative: its vertical sums of squares are the eigenvalues, components' variances, and its horizontal sums of squares are portions of the variables' variances being "explained" by the components.

  6. Rescaled or standardized loading is the loading divided by the variable's st. deviation; it is the correlation. (If your PCA is correlation-based PCA, loading is equal to the rescaled one, because correlation-based PCA is the PCA on standardized variables.) Rescaled loading squared has the meaning of the contribution of a pr. component into a variable; if it is high (close to 1) the variable is well defined by that component alone.

An example of computations done in PCA and FA for you to see.

Eigenvectors are unit-scaled loadings; and they are the coefficients (the cosines) of orthogonal transformation (rotation) of variables into principal components or back. Therefore it is easy to compute the components' values (not standardized) with them. Besides that their usage is limited. Eigenvector value squared has the meaning of the contribution of a variable into a pr. component; if it is high (close to 1) the component is well defined by that variable alone.

Although eigenvectors and loadings are simply two different ways to normalize coordinates of the same points representing columns (variables) of the data on a biplot, it is not a good idea to mix the two terms. This answer explained why. See also.
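To make the eigenvector vs. loading distinction concrete for myself, a small R sketch (my own check, again with correlation-based PCA on USArrests): loading = eigenvector * sqrt(eigenvalue), and for prcomp() the square root of the eigenvalue is just sdev.

res <- prcomp(USArrests, scale = TRUE)

# loadings: scale each eigenvector column by the sqrt of its eigenvalue (= sdev)
loadings <- sweep(res$rotation, 2, res$sdev, "*")

# vertical sums of squares give back the eigenvalues (the components' variances)
colSums(loadings^2)
res$sdev^2

# for correlation-based PCA, loadings equal the correlations between variables and components
cor(USArrests, res$x)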

Computing the projection of a vector:

A self-written implementation: https://www.jianshu.com/p/b26c1eb2abb1
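And my own minimal R sketch of the two projection steps I actually need (hypothetical data; the second part follows the SIF recipe of removing each row's projection onto the first principal component, which is where this whole detour started):

# projection of vector a onto vector b: (a.b / b.b) * b
a <- c(3, 4, 0)
b <- c(1, 0, 0)
proj <- as.numeric(crossprod(a, b) / crossprod(b, b)) * b

# SIF-style step: remove each sentence vector's projection onto the first principal component
S <- matrix(rnorm(100 * 20), nrow = 100)   # hypothetical sentence embeddings, one per row
u <- prcomp(S)$rotation[, 1]               # first principal direction (unit length; prcomp centers S by default)
S_new <- S - (S %*% u) %*% t(u)            # subtract the projection on u from every row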

I want to watch Downton Abbey; the name alone sounds lovely, and it feels a lot like The Age of Innocence.

Take a look at this: Writing Your Research in English, Jay Wang, March 3, 2020, https://www.bilibili.com/video/av93800074/ . There is a bit of a barrier to watching it, but going through it carefully a few times helps with academic writing. Shared from douban; also at https://www.youtube.com/watch?v=QR7m9GR7Iic

Rambling me: I don't understand the theory and I can't write the code, but I do have plenty of sociological imagination.

Just recording the process of thinking and learning. There should be two places in here that contradict each other, but for now I'm too lazy to fix them.

Anyway, trying to record my own learning process, I realize my train of thought really jumps around: purely intuitive, hopping back and forth.......

And I keep getting distracted, looking at this and then at that. Even I am shocked by myself!!

This is written only for myself. The material is plentiful but the content is a mess. Readers are advised to study other, more systematic resources.

Reposted from blog.csdn.net/weixin_40895857/article/details/105143785