Difference between revisions of "Dimensional Reduction"

Line 15: Line 15:
 
* [http://github.com/JonTupitza/Data-Science-Process/blob/master/06-Dimensionality-Reduction.ipynb Dimensionality Reduction Techniques Jupyter Notebook] | [http://github.com/jontupitza Jon Tupitza]
 
 
* [[Local Linear Embedding (LLE) | Embedding functions]]
 
 +
* [[(Deep) Convolutional Neural Network (DCNN/CNN)]]
 +
* [http://en.wikipedia.org/wiki/Factor_analysis Factor analysis]
 +
* [http://en.wikipedia.org/wiki/Feature_extraction Feature extraction]
 +
* [http://en.wikipedia.org/wiki/Feature_selection Feature selection]
 +
* [http://files.knime.com/sites/default/files/inline-images/knime_seventechniquesdatadimreduction.pdf Seven Techniques for Dimensionality Reduction | KNIME]
 +
* [http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Locally-linear_embedding Nonlinear dimensionality reduction | Wikipedia]
  
 
To identify the most important [[Feature Exploration/Learning | Features]] to address:
 
Line 39: Line 45:
 
*** [http://github.com/pair-code/umap-js UMAP-JS] ...[[Javascript]] version
 
  
Related:
 
* [[(Deep) Convolutional Neural Network (DCNN/CNN)]]
 
* [http://en.wikipedia.org/wiki/Factor_analysis Factor analysis]
 
* [http://en.wikipedia.org/wiki/Feature_extraction Feature extraction]
 
* [http://en.wikipedia.org/wiki/Feature_selection Feature selection]
 
* [http://files.knime.com/sites/default/files/inline-images/knime_seventechniquesdatadimreduction.pdf Seven Techniques for Dimensionality Reduction | KNIME]
 
* [http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction#Locally-linear_embedding Nonlinear dimensionality reduction | Wikipedia]
 
  
 
Some datasets contain so many variables that they become very hard to handle. Data collection in today's systems often happens at a very detailed level, because more than enough resources are available. Such datasets may contain thousands of variables, many of them unnecessary, and it becomes almost impossible to identify the variables that have the most impact on a prediction. Dimensional reduction algorithms are used in these situations. They can draw on other algorithms, such as Random Forest or Decision Tree, to identify the most important variables. [http://towardsdatascience.com/10-machine-learning-algorithms-you-need-to-know-77fb0055fe0 10 Machine Learning Algorithms You need to Know | Sidath Asir @ Medium]
 

Revision as of 08:43, 3 September 2020


To identify the most important Features to address:

  • reduce the amount of computing resources required
  • 2D & 3D intuition often fails in higher dimensions
  • distances between points tend to become nearly equal as the number of dimensions increases
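
The last point can be checked numerically. The following is a minimal sketch, not taken from the original page: it assumes points drawn uniformly from a unit hypercube, and the helper name `relative_contrast` is hypothetical. It measures how far apart the nearest and farthest neighbours of one query point are, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """Return (max - min) / min over distances from one query point to the rest.

    Assumption for illustration: points are uniform in the unit hypercube.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


# A smaller value means the nearest and farthest points are almost equally far away.
for d in (2, 10, 100, 1000):
    print(f"{d:>5} dims: relative contrast = {relative_contrast(d):.3f}")
```

Under these assumptions the printed contrast shrinks as the number of dimensions grows, which is one reason distance-based methods can benefit from reducing dimensions first.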


Some datasets contain so many variables that they become very hard to handle. Data collection in today's systems often happens at a very detailed level, because more than enough resources are available. Such datasets may contain thousands of variables, many of them unnecessary, and it becomes almost impossible to identify the variables that have the most impact on a prediction. Dimensional reduction algorithms are used in these situations. They can draw on other algorithms, such as Random Forest or Decision Tree, to identify the most important variables. 10 Machine Learning Algorithms You need to Know | Sidath Asir @ Medium
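
As an illustration of using a Random Forest to find the most important variables, here is a minimal sketch that assumes scikit-learn is installed. The synthetic dataset, the parameter values, and the choice to keep three variables are assumptions made for the example, not something the page prescribes. Strictly speaking this is feature selection by importance ranking, which is one of several approaches to reducing dimensions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumed for illustration): 20 variables, of which only the
# first 3 carry signal; shuffle=False keeps the informative columns first.
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank variables by impurity-based importance, highest first.
importances = forest.feature_importances_
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)

# Keep only the top-ranked variables (3 is an arbitrary choice here).
top_k = [index for index, _ in ranked[:3]]
X_reduced = X[:, top_k]

for index, score in ranked[:5]:
    print(f"variable {index}: importance {score:.3f}")
```

On real data the number of variables to keep would normally be chosen by validation rather than fixed in advance.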