Difference between revisions of "Principal Component Analysis (PCA)"

* [[Dimensional Reduction Algorithms]]
* [[T-Distributed Stochastic Neighbor Embedding (t-SNE)]]
* [[Causation vs. Correlation]] - Multivariate Additive Noise Model (MANM)
** [http://www.cs.helsinki.fi/u/ahyvarin/whatisica.shtml Independent Component Analysis (ICA) | University of Helsinki]
** [http://www.cs.helsinki.fi/u/ahyvarin/papers/JMLR06.pdf Linear Non-Gaussian Acyclic Model (ICA-LiNGAM) | S. Shimizu, P. Hoyer, A. Hyvarinen, and A. Kerminen - University of Helsinki]
** [http://archive.org/details/arxiv-1104.2808/page/n15 Greedy DAG Search (GDS) | Alain Hauser and Peter Bühlmann]
** [http://auai.org/uai2017/proceedings/papers/250.pdf Feature-to-Feature Regression for a Two-Step Conditional Independence Test | Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic]
  
 

Revision as of 11:03, 22 June 2019

Principal Component Analysis (PCA) is a data reduction technique that projects multidimensional data sets onto 2 or 3 dimensions for plotting and visual variance analysis.

  1. Center (and optionally standardize) the data
  2. First principal component axis
    1. Passes through the centroid of the data cloud
    2. Minimizes the perpendicular distance of each point to the line, which is equivalent to running along the direction of maximum variation in the data cloud
  3. Second principal component axis
    1. Orthogonal to the first principal component
    2. Along the direction of maximum remaining variation in the data
  4. The first PCA axis becomes the x-axis and the second PCA axis the y-axis
  5. Continue the process until the necessary number of principal components is obtained
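
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca`, its parameters `n_components` and `standardize`, and the synthetic data are all assumptions made for the example. Instead of finding one axis at a time, the sketch uses a singular value decomposition, which returns all orthogonal axes at once, already ordered by decreasing variance.

```python
import numpy as np

def pca(X, n_components=2, standardize=True):
    """Hypothetical helper: project the rows of X onto the first principal axes."""
    X = np.asarray(X, dtype=float)
    # Step 1: center each column; optionally scale it to unit variance.
    Xc = X - X.mean(axis=0)
    if standardize:
        std = Xc.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # leave constant columns unscaled
        Xc = Xc / std
    # Steps 2, 3 and 5: the rows of Vt are orthogonal unit axes through the
    # centroid, ordered from largest to smallest variance along the axis.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]
    # Step 4: coordinates on the new axes (column 0 is x, column 1 is y).
    scores = Xc @ axes.T
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return scores, axes, explained_variance

# Synthetic example data (an assumption, not taken from the page).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
data[:, 1] = 2.0 * data[:, 0] + 0.1 * data[:, 1]  # make two columns correlated
scores, axes, var = pca(data, n_components=2)
```

Here `scores[:, 0]` and `scores[:, 1]` are the coordinates one would place on the x-axis and y-axis of a scatter plot, and `var` gives the variance captured along each of the two axes.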


Figure: PCA basics scatter plot (principal-component-analysis-basics-scatter-plot-data-mining-1.png)


NumXL