Hierarchical clustering is an unsupervised machine learning algorithm used to group data objects into clusters. Unlike K-means clustering, it does not require the number of clusters to be specified in advance; instead, it builds a hierarchy of groups, and the result can be inspected by plotting it as a dendrogram. This makes the method useful when you are not confident about the number of clusters or simply want to explore the structure of the data set before committing to a partition.
Types of Hierarchical Clustering
Hierarchical clustering can be of two types:
- Agglomerative (bottom-up approach) Clustering
- Divisive (top-down approach) Clustering
Dendrogram
Dendrogram: A tree-like diagram showing the structure of all clusters generated by a hierarchical clustering algorithm. The dendrogram connects data points and clusters with U-shaped links, and the height of each U indicates how far apart the two points or clusters it joins are from one another.
Agglomerative clustering
Agglomerative clustering, also known as the bottom-up hierarchical clustering approach, starts with each data point as a separate cluster and iteratively merges the closest clusters until all points belong to a single cluster.
Algorithm of Agglomerative Clustering
a. First, start with N clusters, where N is the number of data points.
b. Next, merge the two closest clusters.
c. Then, recompute the distances between the newly merged cluster and all remaining clusters.
d. Finally, repeat steps b and c until only one cluster remains or a stopping criterion is met (a minimal sketch of this merge loop is shown below).
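The merge loop described in steps a–d can be sketched directly in plain NumPy. The snippet below is a minimal illustration only; the single-linkage distance and the helper name naive_agglomerative are assumptions for this sketch, not part of the original text. In practice, the library-based implementation in the next section is what you would normally use.

import numpy as np

def naive_agglomerative(X, n_clusters=1):
    # Step a: start with N singleton clusters, one index list per data point.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Steps b-c: find the two closest clusters
        # (single linkage: distance between their closest members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(np.linalg.norm(X[a] - X[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Merge the closest pair and repeat (step d).
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Example: stop once two clusters remain.
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
print(naive_agglomerative(X, n_clusters=2))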
Implementation in Python
# Importing Libraries
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
import numpy as np

# Sample data
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Create clustering model
clustering = AgglomerativeClustering(n_clusters=2)

# Fit the model and get cluster labels
labels = clustering.fit_predict(X)

# Compute the linkage matrix
linkage_matrix = linkage(X, method='ward')

# Plot the dendrogram
dendrogram(linkage_matrix)
plt.title('Dendrogram')
plt.xlabel('Sample Index')
plt.ylabel('Distance')
plt.show()
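Because the dendrogram encodes the full hierarchy, a flat clustering can also be recovered afterwards by cutting the tree at a chosen level. Below is a brief follow-up sketch using SciPy's fcluster on the linkage matrix computed above; cutting into two clusters is just an example choice, not something prescribed by the data.

from scipy.cluster.hierarchy import fcluster

# Cut the dendrogram so that exactly two flat clusters remain.
flat_labels = fcluster(linkage_matrix, t=2, criterion='maxclust')
print(flat_labels)  # one cluster id per sample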
Advantages:
- No need to specify the number of clusters in advance
- Can capture clusters of different shapes
- Produces a hierarchical representation (dendrogram)
- Works well with small to medium-sized datasets
Disadvantages:
- Computationally expensive for large datasets (O(n^3) time complexity)
- Cannot undo previous steps (merges)
- Sensitive to noise and outliers
Applications:
- Taxonomy creation in biology
- Document clustering in information retrieval
- Customer segmentation in marketing
- Anomaly detection in various fields
Divisive Clustering
Divisive clustering, also known as the top-down approach, is the less commonly used of the two hierarchical clustering methods. Initially, all data points belong to a single cluster; the algorithm then iteratively splits this cluster into smaller sub-clusters until each data point lies in its own cluster or a stopping criterion is met.
Algorithm of Divisive Clustering
a. Begin with all points in a single cluster.
b. First, select a cluster to split.
c. Then, use a flat clustering algorithm (like K-means) to split the chosen cluster into two subclusters.
d. Afterward, repeat steps b and c until each cluster contains only one data point or a stopping criterion is met.
Implementation in Python
import numpy as np
from sklearn.cluster import KMeans

def div_cluster(X, min_cluster_size=2):
    # Start with all points in a single cluster.
    clusters = [X]
    while True:
        # Select the largest remaining cluster to split.
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        cluster_to_split = clusters[idx]
        if len(cluster_to_split) <= min_cluster_size:
            break
        # Split the chosen cluster into two sub-clusters with K-means.
        kmeans = KMeans(n_clusters=2, n_init=10)
        labels = kmeans.fit_predict(cluster_to_split)
        # Replace the split cluster with its two sub-clusters
        # (removal by index avoids comparing NumPy arrays for equality).
        del clusters[idx]
        clusters.extend([
            cluster_to_split[labels == 0],
            cluster_to_split[labels == 1]
        ])
    return clusters

# Example usage
X = np.random.rand(100, 2)
result = div_cluster(X)
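The function returns a list of NumPy arrays, one array of points per cluster. The quick inspection below is only one possible way to consume that output, not part of the original code:

# Inspect the resulting partition: number of clusters and their sizes.
print(len(result), [len(c) for c in result])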
Advantages:
- Can be faster than agglomerative clustering for large datasets.
- Often produces more balanced hierarchies.
- Can be more accurate for top-level partitions.
Disadvantages:
- More complex to implement than agglomerative clustering.
- Less commonly used and thus less supported in standard libraries.
- Can struggle with non-globular cluster shapes.
Applications:
- Document classification
- Image segmentation
Conclusion
Thus, hierarchical clustering is a powerful method for discovering the natural structure of a data set, and it is particularly useful when you do not know beforehand how many clusters there are. It produces a dendrogram, which clearly illustrates how data points are linked into clusters and is easy to interpret in terms of data groupings.
For this reason, the approach is widely used on small and medium-sized datasets, although specialized and optimized methods exist for handling larger ones.