Difference between revisions of "...cluster"

Revision as of 19:05, 7 January 2019

Given a set of data points, a clustering algorithm assigns each point to a group. Ideally, points in the same group have similar properties and/or features, while points in different groups are highly dissimilar. Clustering is a method of unsupervised learning and a common technique for statistical data analysis in many fields. Applying a clustering algorithm and seeing which groups the data points fall into can give valuable insight into the data. Source: "The 5 Clustering Algorithms Data Scientists Need to Know"
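The idea that points within a group are similar and points in different groups are dissimilar can be checked numerically. The sketch below is only an illustration under stated assumptions: the toy points, the labels, and the helper name `mean_pairwise_distances` are all hypothetical, and Euclidean distance stands in for "similarity".

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: 2-D points with group labels already assigned.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels = [0, 0, 0, 1, 1, 1]


def mean_pairwise_distances(points, labels):
    """Return (mean within-group distance, mean between-group distance)."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Pairs that share a label count as "within", all others as "between".
        (within if lp == lq else between).append(dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


within, between = mean_pairwise_distances(points, labels)
print(f"within: {within:.2f}, between: {between:.2f}")
```

A good grouping should give a within-group distance that is much smaller than the between-group distance, as it does for this toy data.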

___________________________________________________

If you know how many groups/classes there are...

Yes
- then try the K-Means algorithm
No
- ...size of the clusters, then try the Mean-Shift Clustering algorithm
- ...size of the clusters may vary, then try the Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- ...clusters aren't necessarily circular, and points are allowed to be in overlapping clusters, then try Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)
- ...the distance metric shouldn't be key, then try Hierarchical Clustering; Agglomerative (HAC) & Divisive (HDC)
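For the "Yes" branch above, where the number of groups is known, K-Means can be sketched in a few lines. This is a minimal version of the standard assign-then-update loop, not a production implementation: the function name `k_means`, the toy points, and the choice to seed centroids with the first `k` points are assumptions made to keep the example deterministic.

```python
from math import dist


def k_means(points, k, iterations=100):
    """Plain assign/update K-Means; returns (centroids, labels)."""
    # Assumption: seed centroids with the first k points (simple, not robust).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data with two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = k_means(points, k=2)
print(labels)
```

In practice a library implementation with better initialization would usually be preferred; the other algorithms in the list relax the need to fix `k` in advance.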

___________________________________________________

@@ Line 9: / Line 9: @@
 If you know how many groups/classes there are...
-* ...yes, then try the [[K-Means]] algorithm
+* Yes
-* [[...no, I do not know the amount of groups/classes]]
+** then try the [[K-Means]] algorithm
+* No
+** ...size of the clusters, then try the [[Mean-Shift Clustering]] algorithm
+** ...size of the clusters may vary, then try the [[Density-Based Spatial Clustering of Applications with Noise (DBSCAN)]]
+** ...clusters aren't necessarily circular, and points are allowed to be in overlapping clusters, then try [[Expectation–Maximization (EM) Clustering using Gaussian Mixture Models (GMM)]]
+** ...the distance metric shouldn't be key, then try [[Hierarchical Clustering;  Agglomerative (HAC) & Divisive (HDC)]]
 ___________________________________________________
 <youtube>Yn3VV9emiCs</youtube>
