Principal Components Analysis (PCA)

PCA is an Unsupervised Learning algorithm: a multivariate statistical procedure that reduces the dimensionality of large data tables (Dimensionality Reduction), re-expressing them with a smaller set of “summary indices” that can be more easily visualized and analyzed.
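
For concreteness, a minimal sketch of this idea, assuming scikit-learn is available (the dataset, shapes, and component count are illustrative choices, not from the original):

```python
# A minimal sketch of dimensionality reduction with PCA on a synthetic
# dataset; the component count and shapes are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))     # 200 samples, 10 features

pca = PCA(n_components=3)          # keep 3 "summary indices"
X_reduced = pca.fit_transform(X)   # shape: (200, 3)

print(X_reduced.shape)
print(pca.explained_variance_ratio_)   # variance explained per component
```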

Tip

PCA, similar to Clustering, can be used to group data points together based on proximity. Unlike clustering, though, PCA does not partition items into discrete groups; groupings emerge from closeness in the projected space.


Goal

  • Extract the important information from the data and express it as a set of summary indices called principal components.
  • Help identify correlations between variables, i.e., find the most meaningful basis for re-expressing the data (see the sketch below).
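
A small NumPy sketch of the second goal on synthetic data: the principal components are the eigenvectors of the covariance matrix, ordered by eigenvalue, and they form the new basis for re-expressing the data.

```python
# Sketch: the principal components are the eigenvectors of the covariance
# matrix, ordered by eigenvalue -- a new basis for re-expressing the data.
# Data is synthetic; NumPy only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=500)])

Xc = X - X.mean(axis=0)                   # center the data
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

order = np.argsort(eigvals)[::-1]         # sort by descending variance
components = eigvecs[:, order]            # columns = principal axes
scores = Xc @ components                  # data in the new basis
print(eigvals[order])                     # variance along each axis
```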

Applications:

  • Pattern Recognition
  • Outlier Detection: observations that project strongly onto low-variance components often indicate Outliers, which can be dropped from the training process.
  • Signal Processing & Noise Reduction: depending on the data, it may be possible to concentrate the informative signal into a smaller number of components while isolating the noise, thereby improving the signal-to-noise ratio.
  • Cluster Analysis
  • Feature learning: when features are highly redundant (i.e., exhibit Multicollinearity), PCA can partition the redundancy into one or more near-zero-variance components that carry little or no information and can be dropped from the training process.
  • Decorrelation: PCA transforms correlated features into uncorrelated components, which is useful when training machine learning models with algorithms that struggle with correlated features (see the sketch below).
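
A minimal sketch of the decorrelation point, assuming scikit-learn and synthetic correlated features: after the PCA transform, the covariance matrix of the component scores comes out (numerically) diagonal.

```python
# Sketch: after a PCA transform, the covariance matrix of the component
# scores is (numerically) diagonal, i.e. the features are decorrelated.
# Synthetic data; scikit-learn's PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
X = np.column_stack([x,
                     0.9 * x + rng.normal(scale=0.3, size=1000),
                     rng.normal(size=1000)])

print(np.round(np.cov(X, rowvar=False), 2))   # non-zero off-diagonals
Z = PCA().fit_transform(X)                    # component scores
print(np.round(np.cov(Z, rowvar=False), 2))   # ~diagonal
```
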
Tip

PCA can be used to visualize clusters: if the groupings produced by a clustering algorithm such as K-Means match the groupings visible in the PCA projection, the clustering can be considered accurate.
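
A hedged sketch of this tip, assuming scikit-learn and matplotlib are available: project the data onto two principal components and color the projection by K-Means labels; if the colors separate cleanly, the clustering agrees with the structure PCA reveals.

```python
# Sketch: cross-checking K-Means clusters against the 2-D PCA projection.
# Uses a synthetic blob dataset; scikit-learn and matplotlib.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=0)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_2d = PCA(n_components=2).fit_transform(X)

# Well-separated colors in PCA space suggest the clustering matches
# the structure PCA reveals.
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()
```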


Other types of PCA:

  • cPCA (contrastive PCA) is used to capture the variability associated with a specific condition or cause of interest, rather than variation coming from other sources.
  • Robust PCA can handle corrupted observations without a significant reduction in accuracy, unlike traditional PCA, which is only suited to small noise and assumes uncorrupted observations.

Notes:

  • PCA is closely related to Singular Value Decomposition (SVD): the right singular vectors of the centered data matrix are the eigenvectors of its covariance matrix, and the eigenvalues of the covariance matrix are the squared singular values divided by n − 1 (see the sketch after this list).
  • Calculating components exposes the variational structure of the data, which is useful for Feature Engineering.
  • PCA only works with numeric features, like continuous quantities or counts.
  • PCA is sensitive to scale. Standardizing data before performing PCA can improve the results.
  • A common practice for improving PCA results is to identify and remove Outliers before performing PCA.
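
A small NumPy sketch verifying the SVD relationship from the first note above, on synthetic data:

```python
# Sketch: verifying the PCA <-> SVD relationship on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                # center first
n = Xc.shape[0]

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]

# Covariance eigenvalues equal squared singular values / (n - 1).
print(np.allclose(s**2 / (n - 1), eigvals[order]))           # True
# Right singular vectors equal covariance eigenvectors, up to sign.
print(np.allclose(np.abs(Vt), np.abs(eigvecs[:, order].T)))  # True
```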