- Embedding ... Fine-tuning ... RAG ... Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensional Reduction. ...find outliers
- Math for Intelligence ... Finding Paul Revere ... Social Network Analysis (SNA) ... Dot Product ... Kernel Trick
- Hyperdimensional Computing (HDC)
- Pooling / Sub-sampling: Max, Mean
- Backpropagation ... FFNN ... Forward-Forward ... Activation Functions ...Softmax ... Loss ... Boosting ... Gradient Descent ... Hyperparameter ... Manifold Hypothesis ... PCA
- Seven Techniques for Dimensionality Reduction | KNIME
- Dimensionality Reduction Techniques Jupyter Notebook | Jon Tupitza
- (Deep) Convolutional Neural Network (DCNN/CNN)
- Factor analysis
- Feature extraction
- Feature selection
- Nonlinear dimensionality reduction | Wikipedia
To identify the most important Features to address:
- reduce the amount of computing resources required
- 2D & 3D intuition often fails in higher dimensions
- distances tend to become relatively the 'same' as the number of dimensions increases
- Principal Component Analysis (PCA) is an unsupervised linear transformation technique helps us identify patterns in data based of the correlation between the features. PCA aims to find the directions of the maximum variance in high dimensional data and project it onto a lower dimensional feature space.
- Independent Component Analysis (ICA)
- Canonical Correlation Analysis (CCA)
- Linear Discriminant Analysis (LDA) is a supervised linear transformation technique is to find the feature subspace that optimizes class separability.
- Multidimensional Scaling (MDS)
- Non-Negative Matrix Factorization (NMF)
- Partial Least Squares Regression (PLSR)
- Principal Component Regression (PCR)
- Projection Pursuit
- Sammon Mapping/Projection
- Local Linear Embedding (LLE) creates an embedding of the dataset and tries to preserve the relationships between neighborhoods in the dataset. LLE can be thought of as a series of local PCAs that are globally compared to find the best non-linear embedding.
- Isomap Embedding is a non-linear dimensionality reduction technique that creates an embedding of the dataset and tries to preserve the relationships in the dataset. Isomap looks for a lower-dimensional embedding which maintains distances between all points.
- T-Distributed Stochastic Neighbor Embedding (t-SNE) ...similar objects are modeled by nearby points
- Singular Value Decomposition (SVD) is a linear dimensionality reduction technique.
Dimensional Reduction techniques for reducing the number of input variables in training data - captures the “essence” of the data
Some datasets may contain many variables that may cause very hard to handle. Especially nowadays data collecting in systems occur at very detailed level due to the existence of more than enough resources. In such cases, the data sets may contain thousands of variables and most of them can be unnecessary as well. In this case, it is almost impossible to identify the variables which have the most impact on our prediction. Dimensional Reduction Algorithms are used in this kind of situations. It utilizes other algorithms like Random Forest, Decision Tree to identify the most important variables. 10 Machine Learning Algorithms You need to Know | Sidath Asir @ Medium
- Autoencoder (AE) / Encoder-Decoder
- Manifold Hypothesis
- Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods | Michael Thrun & Alfred Ultsch - ScienceDirect
Product Quantization (PQ)
Product quantization (PQ) is a technique used for vector compression and is very effective in compressing high-dimensional vectors for nearest neighbor search. The idea behind PQ is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. This technique is used in Approximate Nearest Neighbor Search (ANNs) and is a vital part of many vector quantization techniques.
Here are some key points about product quantization:
- PQ divides and splits vectors into segments and quantizes each segment of the vectors separately.
- Each vector in the database is converted to a short code, known as a PQ code, which is a representation that is extremely memory-efficient for the approximate nearest neighbor search.
- PQ methods decompose the embedding manifold into a Cartesian product of M disjoint partitions and quantize each partition into K clusters.
- PQ is highly scalable and can be used for large-scale searches.
- PQ is used in many vector search libraries, including Faiss, which contains different index types, including one with product quantization (IVF-PQ).