Difference between revisions of "...cluster"

(17 intermediate revisions by the same user not shown)
[[AI Solver]]
* [[Capabilities]]
* ...cluster
** If text only, then try...
*** for text categories, try [[K-Modes]]
*** for discovering semantics and syntax, try [[Natural Language Processing (NLP)]] algorithms such as [[Topic Model/Mapping]]
** If finding transaction data relationships, then try [[Association Rule Learning]]
** If you know how many groups/classes there are...
*** Yes
**** ...using numeric values to find categories, then try the [[K-Means]] algorithm
**** ...need to be less sensitive to data scaling, then try [[Mixture Models; Gaussian]]
*** No
**** ...size of the clusters, then try the [[Mean-Shift Clustering]] algorithm
**** ...size of the clusters may vary, then try [[Density-Based Spatial Clustering of Applications with Noise (DBSCAN)]]
**** ...clusters aren't necessarily circular, and points are allowed to be in overlapping clusters, then try [[Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)]]
**** ...the distance metric shouldn't be key, then try [[Hierarchical Clustering; Agglomerative (HAC) & Divisive (HDC)]]
**** ...finding categorical values, then try [[K-Modes]] clustering
  
 
  
If you know how many groups/classes there are...
* ...yes, then try the [[K-Means]] algorithm
* [[...no, I do not know the amount of groups/classes]]
___________________________________________________
* [[Clustering]]
* [https://www.clusteranalysis4marketing.com Quick Cluster Analysis for Excel]
** [[Excel]] ... [[LangChain#Documents|Documents]] ... [[Database|Database; Vector & Relational]] ... [[Graph]] ... [[LlamaIndex]]
* [https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html Choosing the Right Clustering Algorithm for your Dataset | Josh Thompson - KDnuggets]
* [https://www.kdnuggets.com/2018/06/5-clustering-algorithms-data-scientists-need-know.html The 5 Clustering Algorithms Data Scientists Need to Know | George Seif - KDnuggets]
* [https://medium.com/@srnghn/machine-learning-trying-to-discover-structure-in-your-data-2fbbc4f819ae Machine Learning: Trying to discover structure in your data | Stacey Ronaghan - Medium]
Given a set of data points, a clustering algorithm assigns each point to a group. Ideally, points in the same group have similar properties or features, while points in different groups have highly dissimilar ones. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields. Applying a clustering algorithm and seeing which groups the data points fall into can yield valuable insights about the data. [https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68 The 5 Clustering Algorithms Data Scientists Need to Know]
 
=== [[K-Means]] Clustering ===
https://cdn-images-1.medium.com/max/800/1*KrcZK0xYgTa4qFrVr0fO2w.gif
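
As a rough illustration of the K-Means idea named in this section, the sketch below implements Lloyd's algorithm in plain Python: assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until nothing changes. The function name <code>kmeans</code>, the <code>init</code> parameter, and the sample points are assumptions made for this example and do not come from the linked article.

```python
import math
import random

def kmeans(points, k, init=None, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    # Assumption: start from caller-supplied centers, else k random data points.
    centroids = list(init) if init else random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's center in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, labels

# Illustrative data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2, init=[points[0], points[1]])
print(labels)
```

Note that the number of clusters <code>k</code> has to be chosen up front, which is why the decision list above suggests K-Means when the number of groups is known. The result can also depend on the starting centroids.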
 
=== [[Mean-Shift Clustering]] ===
https://cdn-images-1.medium.com/max/800/1*vyz94J_76dsVToaa4VG1Zg.gif
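
A minimal sketch of Mean-Shift, assuming a flat (uniform) kernel: every point is repeatedly moved to the mean of the data points inside a window of radius <code>bandwidth</code> until it stops moving, and points that end up at nearly the same location are grouped together. The names <code>mean_shift</code> and <code>bandwidth</code>, the merge threshold of half the bandwidth, and the sample data are illustrative choices, not taken from the source.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (modes, labels)."""
    def shift(p):
        # Move p to the mean of all data points inside its window.
        window = [q for q in points if math.dist(p, q) <= bandwidth]
        return tuple(sum(axis) / len(window) for axis in zip(*window))

    converged = []
    for p in points:
        for _ in range(max_iter):
            nxt = shift(p)
            moved = math.dist(p, nxt)
            p = nxt
            if moved < tol:
                break
        converged.append(p)

    # Points whose windows converged to (nearly) the same place share a cluster.
    modes, labels = [], []
    for p in converged:
        for i, m in enumerate(modes):
            if math.dist(p, m) < bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(p)
            labels.append(len(modes) - 1)
    return modes, labels

# Illustrative data: two groups; the number of clusters is not given.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
modes, labels = mean_shift(points, bandwidth=2.0)
print(len(modes), labels)
```

Unlike K-Means, the number of clusters is discovered rather than supplied; the trade-off is that the window size has to be chosen instead.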
 
=== [[Density-Based Spatial Clustering of Applications with Noise (DBSCAN)]] ===
https://cdn-images-1.medium.com/max/800/1*tc8UF-h0nQqUfLC8-0uInQ.gif
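
The following is a small sketch of DBSCAN in plain Python, assuming Euclidean distance. A point with at least <code>min_pts</code> neighbors (itself included) within radius <code>eps</code> is a core point; clusters grow outward from core points, and anything unreachable is marked as noise. The function name, parameter names, and data are assumptions made for this example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Returns one label per point; NOISE (-1) marks outliers."""
    def neighbors(i):
        # Indices of all points within eps of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Illustrative data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3),
          (4.5, 20.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

The isolated point is reported as noise rather than being forced into a cluster, which is one practical difference from K-Means and Mean-Shift.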
 
=== [[Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)]] ===
https://cdn-images-1.medium.com/max/800/1*OyXgise21a23D5JCss8Tlg.gif
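
To make the EM idea concrete, here is a deliberately simplified sketch that fits a Gaussian mixture to one-dimensional data. The E-step computes how responsible each component is for each point (a soft assignment, so a point can belong partly to several clusters), and the M-step re-estimates each component's weight, mean, and variance from those responsibilities. Restricting to one dimension, the name <code>em_gmm_1d</code>, the initial guesses, and the variance floor <code>min_var</code> are all assumptions of this example.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, iterations=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture. `means` holds the initial guesses."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(k):
            r_c = [r[c] for r in resp]
            n_c = sum(r_c)
            weights[c] = n_c / len(data)
            means[c] = sum(r * x for r, x in zip(r_c, data)) / n_c
            spread = sum(r * (x - means[c]) ** 2 for r, x in zip(r_c, data)) / n_c
            variances[c] = max(min_var, spread)  # floor avoids a collapsed component
    return weights, means, variances, resp

# Illustrative data: two groups on a line, with rough initial guesses.
data = [0.9, 1.0, 1.1, 7.9, 8.0, 8.1]
weights, means, variances, resp = em_gmm_1d(data, means=[0.0, 10.0])
hard_labels = [row.index(max(row)) for row in resp]
print(hard_labels)
```

Because each component has its own variance, clusters of different spread can be modeled; with full covariance matrices in higher dimensions the same scheme handles elongated, non-circular clusters.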
 
=== [[Hierarchical Clustering; Agglomerative (HAC) & Divisive (HDC)]] ===
https://cdn-images-1.medium.com/max/800/1*ET8kCcPpr893vNZFs8j4xg.gif
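
The sketch below covers only the agglomerative (bottom-up) variant: every point starts as its own cluster and the two closest clusters are merged repeatedly until the requested number remains. It assumes single linkage (distance between the two closest members) and Euclidean distance; the function name <code>agglomerative</code> and the data are illustrative. The divisive variant, which works top-down by splitting, is not shown.

```python
import math
from itertools import combinations

def agglomerative(points, n_clusters):
    """Single-linkage HAC. Returns a list of clusters of point indices."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]                  # b > a, so index a stays valid
    return clusters

# Illustrative data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
clusters = agglomerative(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Recording the sequence of merges instead of stopping at a fixed count would give the full hierarchy (dendrogram), which can then be cut at any level.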
 
<youtube>Yn3VV9emiCs</youtube>

Latest revision as of 06:38, 17 August 2023
