Hierarchical Clustering; Agglomerative (HAC) & Divisive (HDC)

* [[...no, I do not know the amount of groups/classes]]
 
Hierarchical clustering algorithms fall into two categories: (1) agglomerative, or bottom-up, and (2) divisive, or top-down.
 
<youtube>2z5wwyv0Zk4</youtube>
<youtube>d1teghNuOu8</youtube>
 
== Agglomerative Clustering - Bottom Up ==
Bottom-up algorithms treat each data point as a single cluster at the outset and then repeatedly merge (or agglomerate) the two closest clusters until all clusters have been merged into a single cluster that contains all data points. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering, or HAC. The resulting hierarchy of clusters is represented as a tree, or dendrogram: the root is the single cluster that gathers all the samples, and the leaves are clusters containing only one sample. [http://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68 The 5 Clustering Algorithms Data Scientists Need to Know | Towards Data Science]
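The merge loop described above can be sketched in a few lines of Python. This is a minimal, naive illustration, not the code from the cited article: it assumes Euclidean distance, recomputes cluster distances from scratch at every step, and supports single, complete, and average linkage. The function names `agglomerative` and `leaves` are hypothetical. The dendrogram is returned as nested tuples, so the root holds the whole data set and the leaves are single points.

```python
import math
from itertools import combinations


def agglomerative(points, linkage="single"):
    """Naive bottom-up (agglomerative) hierarchical clustering (illustrative sketch).

    Every point starts as its own cluster; the two closest clusters are merged
    repeatedly until one cluster remains. The dendrogram is returned as nested
    tuples: leaves are point indices and the root contains every point.
    """
    # Each active cluster is stored as (subtree, member indices).
    clusters = [(i, [i]) for i in range(len(points))]

    def cluster_distance(a, b):
        # Linkage criterion applied to all cross-cluster Euclidean distances
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        if linkage == "average":
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Scanning every pair costs O(n^2) per merge, O(n^3) over n - 1 merges
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        del clusters[j]  # j > i, so removing j first keeps index i valid
        del clusters[i]
        clusters.append(((tree_i, tree_j), members_i + members_j))
    return clusters[0][0]


def leaves(tree):
    """Point indices under a dendrogram node."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
root = agglomerative(points, linkage="single")
print(root)                  # (4, ((0, 1), (2, 3)))
print(sorted(leaves(root)))  # [0, 1, 2, 3, 4]
```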
  
 
Hierarchical clustering does not require us to specify the number of clusters in advance: because the algorithm builds a whole tree, we can cut it at whatever level yields the number of clusters that looks best. The choice of distance metric still matters, however: both the metric and the linkage criterion (single, complete, average, and so on) can change the shape of the tree, so they are worth comparing rather than assuming that all choices work equally well. A particularly good use case for hierarchical clustering is data with an underlying hierarchical structure that you want to recover; flat methods such as K-Means produce a single partition and cannot expose that hierarchy. These advantages come at the cost of efficiency: a straightforward implementation runs in O(n³) time for n data points (specialized algorithms such as SLINK reduce single linkage to O(n²)), whereas one iteration of K-Means or of GMM fitting scales linearly with n.
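To illustrate choosing the number of clusters after the tree is built, and why the linkage criterion matters, the sketch below records the merge order once and then cuts it at different values of k by replaying the first n - k merges. It is a minimal illustration that assumes Euclidean distance and only single or complete linkage; the helpers `merge_history` and `cut` are hypothetical names, not a library API.

```python
import math
from itertools import combinations


def merge_history(points, linkage="single"):
    """Run naive agglomerative clustering once and record every merge (sketch).

    Returns a list of (members_a, members_b, distance) tuples in merge order.
    Only "single" (min) and "complete" (max) linkage are handled here.
    """
    combine = {"single": min, "complete": max}[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return combine(math.dist(points[i], points[j]) for i in a for j in b)

    history = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        history.append((clusters[i], clusters[j], cluster_distance((i, j))))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep i valid
        del clusters[i]
        clusters.append(merged)
    return history


def cut(history, n_points, k):
    """Cut the recorded tree into k flat clusters by replaying n_points - k merges."""
    if not 1 <= k <= n_points:
        raise ValueError("k must be between 1 and the number of points")
    label = list(range(n_points))
    for members_a, members_b, _ in history[: n_points - k]:
        for m in members_a + members_b:
            label[m] = members_a[0]
    groups = {}
    for point, lab in enumerate(label):
        groups.setdefault(lab, []).append(point)
    return sorted(groups.values())


# A chain of evenly spaced points with one slightly larger gap at the end
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.2,)]

single = merge_history(points, linkage="single")
complete = merge_history(points, linkage="complete")

# The tree is built once; different k values are just different cuts.
print(cut(single, len(points), 2))    # [[0, 1, 2, 3, 4], [5]]
print(cut(single, len(points), 3))    # [[0, 1], [2, 3, 4], [5]]
print(cut(complete, len(points), 2))  # [[0, 1, 2, 3], [4, 5]]
```

On this small chain, single linkage only splits off the isolated last point at k = 2, while complete linkage favours two compact groups. In SciPy, `scipy.cluster.hierarchy.linkage` followed by `fcluster` supports the same build-once, cut-later workflow.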
 
 
https://cdn-images-1.medium.com/max/640/1*ET8kCcPpr893vNZFs8j4xg.gif
 
  
<youtube>JNlEIEwe-Cg</youtube>
<youtube>OcoE7JlbXvY</youtube>
<youtube>XRNmOxKo8oI</youtube>
<youtube>XJ3194AmH40</youtube>
<youtube>EUQY3hL38cw</youtube>
<youtube>7xHsRkOdVwo</youtube>
 
== Divisive Clustering - Top Down ==
 
<youtube>MIWVfCcHzM4</youtube>
<youtube>Fm01pqWLqzU</youtube>
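Divisive clustering works in the opposite direction to HAC: it starts with every point in one cluster and recursively splits clusters until each point stands alone or a target number of clusters is reached. The sketch below is a simplified, DIANA-style (DIvisive ANAlysis) illustration written for this page. It assumes Euclidean distance and a target cluster count k, always splits the cluster with the largest diameter, and uses hypothetical helper names (`avg_dist`, `diana_split`, `diameter`, `divisive`).

```python
import math


def avg_dist(points, i, group):
    """Mean Euclidean distance from point i to the other members of group."""
    others = [j for j in group if j != i]
    return sum(math.dist(points[i], points[j]) for j in others) / len(others)


def diana_split(points, members):
    """Split one cluster into two, DIANA-style (sketch).

    The member with the largest average distance to the others seeds a
    "splinter group"; members that are on average closer to the splinter
    group than to the rest of the cluster are then moved over one at a time.
    """
    rest = list(members)
    seed = max(rest, key=lambda i: avg_dist(points, i, rest))
    splinter = [seed]
    rest.remove(seed)
    while len(rest) > 1:
        # Positive gain: the point sits closer to the splinter group than to the rest
        gains = {i: avg_dist(points, i, rest) - avg_dist(points, i, splinter) for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return sorted(splinter), sorted(rest)


def diameter(points, members):
    """Largest pairwise distance inside a cluster (0 for a singleton)."""
    return max(math.dist(points[i], points[j]) for i in members for j in members)


def divisive(points, k):
    """Top-down clustering: start from one cluster and split the widest until k remain."""
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        widest = max(clusters, key=lambda c: diameter(points, c))
        if len(widest) < 2:
            break  # every cluster is already a single point
        clusters.remove(widest)
        clusters.extend(diana_split(points, widest))
    return sorted(clusters)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(divisive(points, 2))  # [[0, 1, 2], [3, 4, 5]]
```

A full divisive implementation would also record each split so the result can be drawn as a dendrogram; this sketch returns only the flat clusters.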

Revision as of 22:03, 30 May 2018
