Difference between revisions of "...cluster"

___________________________________________________
 
If text only, then try [[Natural Language Processing (NLP)]] algorithms such as [[Topic Model/Mapping]]
If finding transaction data relationships, then try [[Association Rule Learning]]
  
 
If you know how many groups/classes there are...
 
 
 
* Yes
 
** ...using numeric values to find categories, then try the [[K-Means]] algorithm
** ...[[Gaussian Mixture]]
 
* No
 
 
** ...size of the clusters, then try the [[Mean-Shift Clustering]] algorithm
 
 
** ...clusters aren't necessarily circular, and points are allowed to be in overlapping clusters, then try [[Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)]]
 
 
** ...the distance metric shouldn't be key, then try [[Hierarchical Clustering;  Agglomerative (HAC) & Divisive (HDC)]]
 
 
** ...finding categorical values, then try [[K-modes Clustering]]
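
For the "Yes" branch with numeric values, the K-Means idea can be sketched in plain Python as below. This is a minimal illustration of Lloyd's algorithm, not the implementation behind the linked page; the function name <code>kmeans</code>, its parameters (including the optional <code>init</code> argument), and the sample points are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0, init=None):
    """Group numeric points into k clusters (Lloyd's algorithm sketch).

    points: list of equal-length tuples of numbers.
    init:   optional list of k starting centroids (hypothetical helper
            argument; by default k data points are drawn at random).
    Returns (labels, centroids).
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Made-up sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2, init=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of groups <code>k</code> must be supplied up front, which is why this approach sits under the "Yes" branch. With random starting centroids the result can depend on the seed, so in practice the algorithm is often run several times.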
  
 
___________________________________________________
 
  
 
<youtube>Yn3VV9emiCs</youtube>
 

Revision as of 23:19, 7 January 2019

Given a set of data points, a clustering algorithm assigns each point to a group. In theory, points in the same group should have similar properties or features, while points in different groups should have highly dissimilar ones. Clustering is a method of unsupervised learning and a common technique for statistical data analysis in many fields. Applying a clustering algorithm and seeing which groups the data points fall into can give valuable insight into the data.

The 5 Clustering Algorithms Data Scientists Need to Know
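
One way to check the "similar within a group, dissimilar across groups" idea is to compare average distances, as in the rough sketch below. The helper name <code>mean_pairwise_distances</code>, the sample points, and the labels are hypothetical, and the sketch assumes at least one same-group pair and one cross-group pair exist.

```python
import math
from itertools import combinations


def mean_pairwise_distances(points, labels):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    # Visit every unordered pair of labelled points exactly once.
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        bucket = within if lp == lq else between
        bucket.append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


# Made-up data: two groups of two points each.
sample_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
sample_labels = [0, 0, 1, 1]
within, between = mean_pairwise_distances(sample_points, sample_labels)
print(within < between)  # True
```

A grouping that fits the data well should give a within-group average much smaller than the between-group average.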

___________________________________________________

If text only, then try Natural Language Processing (NLP) algorithms such as Topic Model/Mapping

If finding transaction data relationships, then try Association Rule Learning

If you know how many groups/classes there are...

___________________________________________________