Dimensional Reduction
 
* [[Isomap]]
* [[Softmax]]
* [http://files.knime.com/sites/default/files/inline-images/knime_seventechniquesdatadimreduction.pdf Seven Techniques for Dimensionality Reduction | KNIME]
* [http://github.com/JonTupitza/Data-Science-Process/blob/master/06-Dimensionality-Reduction.ipynb Dimensionality Reduction Techniques Jupyter Notebook] | [http://github.com/jontupitza Jon Tupitza]
* [[(Deep) Convolutional Neural Network (DCNN/CNN)]]
* [http://en.wikipedia.org/wiki/Factor_analysis Factor analysis]
* [http://en.wikipedia.org/wiki/Feature_extraction Feature extraction]
* [http://en.wikipedia.org/wiki/Feature_selection Feature selection]
* [http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Locally-linear_embedding Nonlinear dimensionality reduction | Wikipedia]
* [[Local Linear Embedding (LLE) | Embedding functions]]

To identify the most important [[Feature Exploration/Learning | Features]] to address:

* reduce the amount of computing resources required
* 2D & 3D intuition often fails in higher dimensions
* distances tend to become relatively the 'same' as the number of dimensions increases (see the sketch below)
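
The last point, the 'curse of dimensionality', can be made visible with a minimal [[Python]] sketch (NumPy only; the point counts and dimensions are arbitrary choices for illustration):

<syntaxhighlight lang="python">
# As dimensionality grows, the contrast between the nearest and farthest
# distances shrinks relative to their magnitude: distances become
# relatively the 'same'.
import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.random((1000, d))                    # 1000 random points in [0,1]^d
    dist = np.linalg.norm(X[1:] - X[0], axis=1)  # distances to the first point
    print(f"d={d:5d}  (max-min)/mean = {(dist.max() - dist.min()) / dist.mean():.3f}")
</syntaxhighlight>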
  
 
* Algorithms:
** [[Principal Component Analysis (PCA)]] is an unsupervised linear transformation technique that helps identify patterns in data based on the correlations between features. PCA aims to find the directions of maximum variance in high-dimensional data and project the data onto a lower-dimensional feature space (see the sketch after this list).
** [http://en.wikipedia.org/wiki/Independent_component_analysis Independent Component Analysis (ICA)]
** [http://en.wikipedia.org/wiki/Canonical_correlation Canonical Correlation Analysis (CCA)]
** [http://en.wikipedia.org/wiki/Linear_discriminant_analysis Linear Discriminant Analysis (LDA)] is a supervised linear transformation technique that finds the feature subspace which optimizes class separability (also shown in the sketch after this list).
** [http://en.wikipedia.org/wiki/Multidimensional_scaling Multidimensional Scaling (MDS)]
** [http://en.wikipedia.org/wiki/Non-negative_matrix_factorization Non-Negative Matrix Factorization (NMF)]
** [[Local Linear Embedding (LLE)]]
** [[T-Distributed Stochastic Neighbor Embedding (t-SNE)]] ...similar objects are modeled by nearby points
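
As a concrete illustration of the PCA and LDA items above, here is a minimal sketch (assuming scikit-learn and its bundled Iris data; the two-component setting is an arbitrary choice):

<syntaxhighlight lang="python">
# PCA (unsupervised) finds the directions of maximum variance; LDA
# (supervised) uses the class labels to find the subspace that best
# separates the classes.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)               # 150 samples, 4 features, 3 classes

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)                    # labels are not used
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)                 # labels y are required
print("PCA/LDA output shapes:", X_pca.shape, X_lda.shape)
</syntaxhighlight>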
 
Some datasets contain so many variables that they become very hard to handle. Nowadays, systems collect data at a very detailed level because storage and computing resources are plentiful, so a dataset may contain thousands of variables, many of them unnecessary. It then becomes almost impossible to identify by hand the variables that have the most impact on a prediction. Dimensional reduction algorithms are used in these situations; they can utilize other algorithms, such as Random Forest or Decision Tree, to identify the most important variables. [http://towardsdatascience.com/10-machine-learning-algorithms-you-need-to-know-77fb0055fe0 10 Machine Learning Algorithms You need to Know | Sidath Asir @ Medium]
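
For example, a tree ensemble's importance scores can be used to drop uninformative variables before modeling (a minimal sketch, assuming scikit-learn and synthetic stand-in data):

<syntaxhighlight lang="python">
# Rank features by Random Forest importance, then keep only those whose
# importance exceeds the mean importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 25 features, only 5 of which are informative
X, y = make_classification(n_samples=1000, n_features=25,
                           n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
selector = SelectFromModel(forest, threshold="mean").fit(X, y)
X_reduced = selector.transform(X)
print("Reduced from", X.shape[1], "to", X_reduced.shape[1], "features")
</syntaxhighlight>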
  
 
<youtube>YPJQydzTLwQ</youtube>

<youtube>4NlvatkpV3s</youtube>
 
* [[Privacy]]
 
* [[Manifold Hypothesis]]
** [http://arxiv.org/pdf/1802.03426.pdf Uniform Manifold Approximation and Projection (UMAP) | L. McInnes, J. Healy, and J. Melville] ... a dimension reduction technique that can be used for visualisation similarly to [[T-Distributed Stochastic Neighbor Embedding (t-SNE) | t-SNE]], but also for general non-linear dimension reduction (see the sketch below).
*** [http://github.com/lmcinnes/umap UMAP] ... [[Python]] version
*** [http://github.com/pair-code/umap-js UMAP-JS] ... [[Javascript]] version
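
A minimal usage sketch (assuming the umap-learn package from the [[Python]] repository above; the digits data and parameter values are illustrative):

<syntaxhighlight lang="python">
# Embed 64-dimensional digit images into 2-D for visualisation; swapping in
# sklearn.manifold.TSNE here gives the comparable t-SNE embedding.
import umap
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)                  # 1797 samples, 64 features
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
embedding = reducer.fit_transform(X)
print(embedding.shape)                               # (1797, 2)
</syntaxhighlight>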
  
 
<youtube>6BPl81wGGP8</youtube>
 

== Projection ==