Clustering
- ...cluster - AI Solver
- Embedding: Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensional Reduction ... ...find outliers
- Singular Value Decomposition (SVD)
- Principal Component Analysis (PCA)
- K-Means
- Fuzzy C-Means (FCM)
- K-Modes
- Association Rule Learning
- Mean-Shift Clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
- Restricted Boltzmann Machine (RBM)
- Variational Autoencoder (VAE)
- Biclustering
- Multidimensional Scaling (MDS)
- Hierarchical; to include clustering
Similarity Measures for Clusters:
- Compare the numbers of identical and unique item pairs appearing in two cluster sets.
- This is achieved by counting the item pairs found together in both clustering sets (a), as well as the pairs appearing together only in the first (b) or only in the second (c) set.
- From these counts a similarity coefficient, such as the Jaccard index, can be computed. The Jaccard index is defined as the size of the intersection divided by the size of the union of two sample sets: a/(a+b+c).
- For partitioning results, the Jaccard index measures how frequently pairs of items are joined together in both clustering data sets, relative to how often pairs are joined in only one of them.
- Related coefficients are the Rand Index and the Adjusted Rand Index. These indices also consider the number of pairs (d) that are not joined together in either set; the Rand Index, for example, is (a+d)/(a+b+c+d).
Clustering Algorithms | Data Analysis in Genome Biology
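The pair-counting comparison described above can be sketched in Python. This is a minimal, brute-force sketch (O(n²) pairs), assuming each clustering is given as a list of labels over the same items in the same order; the function names are illustrative and not from the source. It counts a, b, c and d over all unordered item pairs and derives the Jaccard index, the Rand Index, and the Adjusted Rand Index, using a standard pair-counting form of the ARI: 2(ad - bc) / ((a+b)(b+d) + (a+c)(c+d)).

```python
from itertools import combinations


def pair_counts(labels1, labels2):
    """Count item pairs: a = together in both, b = together only in labels1,
    c = together only in labels2, d = separated in both."""
    if len(labels1) != len(labels2):
        raise ValueError("both clusterings must label the same items")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard_index(labels1, labels2):
    a, b, c, _ = pair_counts(labels1, labels2)
    # Assumed convention: if no pair is joined in either clustering, treat them as identical.
    return a / (a + b + c) if a + b + c else 1.0


def rand_index(labels1, labels2):
    a, b, c, d = pair_counts(labels1, labels2)
    total = a + b + c + d
    return (a + d) / total if total else 1.0


def adjusted_rand_index(labels1, labels2):
    a, b, c, d = pair_counts(labels1, labels2)
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    # The denominator is zero only for identical (or fewer than two) items' clusterings.
    return 2 * (a * d - b * c) / denom if denom else 1.0


if __name__ == "__main__":
    x = [0, 0, 1, 1]
    y = [0, 1, 0, 1]
    print(pair_counts(x, y))          # (0, 2, 2, 2)
    print(jaccard_index(x, y))        # 0.0
    print(rand_index(x, y))           # 0.3333333333333333
    print(adjusted_rand_index(x, y))  # -0.5
```

Unlike the plain Rand Index, the adjusted version corrects for chance agreement, so it can become negative when two partitions agree less than random labelings would, as in the usage example.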
OPTICS: ordering points to identify the clustering structure
Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only ‘traditional’ clustering information (e.g. representative points, arbitrary shaped clusters), but also the intrinsic clustering structure. For medium sized data sets, the cluster-ordering can be represented graphically and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure offering additional insights into the distribution and correlation of the data.
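The cluster-ordering described in the abstract can be illustrated with a small brute-force sketch. This is not the authors' implementation: it assumes Euclidean distance, computes all pairwise distances in O(n²) time instead of using an index structure, uses a binary heap as the seed list, and assumes a point counts as part of its own `eps`-neighborhood. The names `optics_order`, `eps` and `min_pts` are illustrative. For each point it records a core distance (the distance to its `min_pts`-th closest point within `eps`) and a reachability distance max(core(p), d(p, o)), and it always expands the unprocessed point with the smallest reachability next.

```python
import heapq
import math

INF = math.inf


def optics_order(points, eps, min_pts):
    """Return (order, reachability, core_distance) for a list of coordinate tuples."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Core distance: distance to the min_pts-th closest point within eps
    # (the point itself counts as its first neighbour); INF if p is not a core point.
    core = []
    for i in range(n):
        within = sorted(d for d in dist[i] if d <= eps)
        core.append(within[min_pts - 1] if len(within) >= min_pts else INF)

    reach = [INF] * n
    processed = [False] * n
    order = []

    def update(p, seeds):
        # Lower the reachability of every unprocessed eps-neighbour of core point p.
        for o in range(n):
            if not processed[o] and dist[p][o] <= eps:
                new_reach = max(core[p], dist[p][o])
                if new_reach < reach[o]:
                    reach[o] = new_reach
                    heapq.heappush(seeds, (new_reach, o))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        if core[start] == INF:
            continue  # not a core point: nothing to expand from here
        seeds = []  # priority queue keyed by current reachability distance
        update(start, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if processed[q]:
                continue  # outdated entry: q was already expanded via a smaller reachability
            processed[q] = True
            order.append(q)
            if core[q] != INF:
                update(q, seeds)
    return order, reach, core


if __name__ == "__main__":
    pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
    order, reach, core = optics_order(pts, eps=1.0, min_pts=2)
    print(order)  # [0, 1, 2, 3, 4, 5]
    print(reach)  # INF at positions 0 and 3 marks the start of each dense group
```

Plotting `reach` in the order given by `order` yields the kind of graphical representation the abstract mentions: dense regions appear as valleys separated by high reachability values.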
In this paper, we proposed a cluster analysis method based on the OPTICS algorithm. OPTICS computes an augmented cluster-ordering of the database objects. The main advantage of our approach, when compared to the clustering algorithms proposed in the literature, is that we do not limit ourselves to one global parameter setting. Instead, the augmented cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings and thus is a versatile basis for both automatic and interactive cluster analysis. We demonstrated how to use it as a stand-alone tool to get insight into the distribution of a data set. Depending on the size of the database, we either represent the cluster-ordering graphically (for small data sets) or use an appropriate visualization technique (for large data sets). Both techniques are suitable for interactively exploring the clustering structure, offering additional insights into the distribution and correlation of the data. We also presented an efficient and effective algorithm to automatically extract not only ‘traditional’ clustering information but also the intrinsic, hierarchical clustering structure. There are several opportunities for future research. For very high-dimensional spaces, no index structures exist to efficiently support the hypersphere range queries needed by the OPTICS algorithm. Therefore it is infeasible to apply it in its current form to a database containing several million high-dimensional objects. Consequently, the most interesting question is whether we can modify OPTICS so that we can trade off a limited amount of accuracy for a large gain in efficiency. Incrementally managing a cluster-ordering when updates on the database occur is another interesting challenge.
Although there are techniques to update a ‘flat’ density-based decomposition [EKS+ 98] incrementally, it is not obvious how to extend these ideas to a density-based cluster-ordering of a data set.
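The claim that one cluster-ordering encodes density-based clusterings for a broad range of parameter settings can be illustrated with a simple single-pass extraction rule. This is a minimal sketch, assuming the inputs are the ordering, reachability distances and core distances from an OPTICS run with radius `eps`, stored as Python lists indexed by point; it produces flat labels for any `eps_prime` no larger than that `eps`. The function name `extract_flat_clustering` and the use of -1 for noise are illustrative choices, and the result may differ from running DBSCAN directly for a few border points.

```python
import math


def extract_flat_clustering(order, reach, core, eps_prime):
    """Assign flat cluster labels (-1 = noise) by scanning an OPTICS cluster-ordering.

    Assumes order/reach/core come from an OPTICS run whose radius eps >= eps_prime.
    """
    labels = [-1] * len(order)
    cluster_id = -1
    for p in order:
        if reach[p] > eps_prime:
            # p is not reachable within eps_prime from the points processed before it.
            if core[p] <= eps_prime:
                cluster_id += 1          # p is a core point at eps_prime: start a new cluster
                labels[p] = cluster_id
            # otherwise p keeps the noise label -1
        else:
            labels[p] = cluster_id       # p continues the current cluster
    return labels


if __name__ == "__main__":
    inf = math.inf
    # Hand-written illustrative values for two small, well-separated groups.
    order = [0, 1, 2, 3, 4, 5]
    reach = [inf, 0.1, 0.1, inf, 0.1, 0.1]
    core = [0.1] * 6
    print(extract_flat_clustering(order, reach, core, eps_prime=0.5))   # [0, 0, 0, 1, 1, 1]
    print(extract_flat_clustering(order, reach, core, eps_prime=0.05))  # [-1, -1, -1, -1, -1, -1]
```

Because the scan only compares stored distances against `eps_prime`, trying a new density threshold costs one linear pass over the ordering instead of a full re-clustering.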