#acl DanieleMerico:read,write,delete,revert All:read

= PCA =

== In general on PCA ==

PCA is a dimensionality reduction technique that projects a multidimensional data-set (e.g. a microarray expression matrix) into a new space, where the dimensions are orthogonal and maximize some statistical index; in the case of PCA, the explained variance. Since the new space is "optimized", it is possible to retain only a limited number of dimensions (i.e. PCA components): the first components (from the 1st to the i-th) are the most informative ones. The ''eigenvalues'' are used to evaluate the information associated to a component.

PCA can be regarded as more selective and noise-cleaning than clustering: if only the first i PCA components are retained, some of the information present in the original data-set is discarded, hopefully the less relevant part. However, since PCA does not provide groups, it is necessary either to explore the new space manually (considering only 2 or 3 components at a time, in 2D or 3D plots), or to define some computational criterion to group the data. If the data are particularly rich, many PCA components may be necessary to account for all the relevant features.

To evaluate the information content of a single component, the corresponding eigenvalue is used; to evaluate the cumulative fraction of information explained by the first components, their eigenvalues are summed and divided by the total sum of all eigenvalues.

== PCA with ''princomp'' ==
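As a concrete illustration of the eigenvalue computation described above, here is a minimal sketch using R's ''princomp'' function; the matrix `expr` and its dimensions are made up for illustration, with `rnorm()` standing in for real expression data.

{{{
# Hypothetical data: 100 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(1000), nrow = 100, ncol = 10)

# princomp() treats rows as observations and columns as variables;
# here each gene is an observation and each sample a variable
pc <- princomp(expr)

# Eigenvalues = squared standard deviations of the components
eig <- pc$sdev^2

# Fraction of variance explained by each component,
# and cumulative fraction explained by the first i components
frac     <- eig / sum(eig)
cum_frac <- cumsum(frac)

# e.g. cumulative variance explained by the first 3 components
cum_frac[3]
}}}

The cumulative fractions in `cum_frac` are what one inspects to decide how many components to keep.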