1. Introduction to Hierarchical Clustering
🔹 Definition
Hierarchical clustering is an unsupervised machine learning technique used to group similar data points into clusters by building a tree-like structure called a dendrogram.
🔹 Key Idea
Instead of fixing the number of clusters in advance, hierarchical clustering:
- Creates clusters step-by-step
- Shows how clusters merge or split
- Provides a visual representation of cluster relationships
🔹 Important Characteristics
- Does not require pre-defining K (the number of clusters)
- Based on distance (similarity) measures
- Produces interpretable results
- Suitable for small to medium datasets
2. Purpose of Hierarchical Clustering
Hierarchical clustering is used to:
✅ Discover Natural Groupings
Identify hidden patterns in data without labels.
✅ Understand Cluster Relationships
Show how data points are related at different levels.
✅ Support Exploratory Data Analysis
Helps in research and academic analysis.
✅ Determine the Optimal Number of Clusters
The dendrogram helps visually decide the best cluster count.
✅ Support Decision Making
Useful in:
- University performance segmentation
- Healthcare patient grouping
- Market segmentation
- Institutional comparison
3. Techniques Used in Hierarchical Clustering
Hierarchical clustering has two main techniques:
🔹 1. Agglomerative Hierarchical Clustering (Bottom-Up Approach)
- Start with each data point as its own cluster
- Merge the closest clusters step-by-step
- Continue until all points form one cluster
This is the most commonly used method.
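The bottom-up steps above can be sketched in a few lines of plain Python. This is a minimal single-linkage version on 1-D points for illustration only (the function name is made up here); in practice you would use SciPy's `scipy.cluster.hierarchy.linkage`.

```python
def agglomerative_single_linkage(points, n_clusters):
    """Bottom-up: start with every point as its own cluster, then
    repeatedly merge the two closest clusters (single linkage =
    smallest pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]   # each point starts as a singleton
    while len(clusters) > n_clusters:
        best = None                                # (distance, idx_a, idx_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]                 # merge the closest pair
        del clusters[b]
    return clusters

# two obvious 1-D groups: {0, 1, 2} and {10, 11, 12}
print(agglomerative_single_linkage([0, 1, 2, 10, 11, 12], 2))
# → [[0, 1, 2], [3, 4, 5]]  (indices of the two groups)
```

Stopping the loop at `n_clusters = 1` reproduces the full merge tree described above.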
🔹 2. Divisive Hierarchical Clustering (Top-Down Approach)
- Start with all points in one cluster
- Split clusters recursively
- Continue splitting until each point is separate
This method is used less frequently.
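The top-down idea can be sketched the same way. This shows one illustrative split step, assuming a simple rule (the two farthest-apart points seed the new clusters); real divisive algorithms such as DIANA use more refined splitting criteria.

```python
from itertools import combinations

def divisive_split(points):
    """One top-down split: the two farthest-apart points seed the
    two new clusters, and each point joins the nearer seed.
    Recurse on each half to keep splitting."""
    i, j = max(combinations(range(len(points)), 2),
               key=lambda p: abs(points[p[0]] - points[p[1]]))
    left, right = [], []
    for k, x in enumerate(points):
        (left if abs(x - points[i]) <= abs(x - points[j]) else right).append(k)
    return left, right

# the same 1-D data separates into its two groups in a single split
print(divisive_split([0, 1, 2, 10, 11, 12]))
# → ([0, 1, 2], [3, 4, 5])
```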
4. Distance Measures Used
Distance determines the similarity between data points.
Common distance metrics:
- Euclidean Distance
- Manhattan Distance
- Cosine Similarity
- Correlation Distance
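The first three metrics are one-liners in plain Python (cosine is shown as a distance, 1 − similarity; in practice `scipy.spatial.distance` provides all of these ready-made):

```python
import math

def euclidean(a, b):
    # straight-line distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # sum of absolute coordinate differences ("city block")
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: compares direction, ignores magnitude
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return 1 - dot / (norm(a) * norm(b))

print(euclidean((0, 0), (3, 4)))        # 5.0
print(manhattan((0, 0), (3, 4)))        # 7
print(cosine_distance((1, 0), (0, 1)))  # 1.0 (perpendicular vectors)
```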
5. Linkage Methods (Cluster Merging Criteria)
Linkage determines how the distance between two clusters is calculated.
🔹 Single Linkage
Minimum distance between clusters.
🔹 Complete Linkage
Maximum distance between clusters.
🔹 Average Linkage
Average distance between clusters.
🔹 Ward’s Method ⭐
Minimizes within-cluster variance. Most preferred in research and academic analysis.
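The first three linkage rules differ only in how they aggregate the pairwise distances between two clusters, which is easy to show directly. (Ward's method instead compares within-cluster variance before and after a merge, so it needs the cluster means; SciPy's `linkage(..., method='ward')` implements it.)

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link(A, B):    # minimum pairwise distance
    return min(euclidean(a, b) for a in A for b in B)

def complete_link(A, B):  # maximum pairwise distance
    return max(euclidean(a, b) for a in A for b in B)

def average_link(A, B):   # mean of all pairwise distances
    return sum(euclidean(a, b) for a in A for b in B) / (len(A) * len(B))

A = [(0, 0), (0, 1)]
B = [(3, 4), (3, 5)]
print(single_link(A, B))    # closest pair (0,1)-(3,4): sqrt(18) ≈ 4.24
print(complete_link(A, B))  # farthest pair (0,0)-(3,5): sqrt(34) ≈ 5.83
print(average_link(A, B))   # mean of all four pairwise distances
```

Single linkage tends to chain clusters together; complete linkage produces compact clusters — the same data can cluster differently depending on this choice.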
Hierarchical Clustering Diagram – University Dataset Example
(Using features like SAT, Top10, Expenses, GradRate)
How to Identify Clusters in a Dendrogram
1. Look at the vertical lines (height shows distance).
2. Find the biggest vertical gap (the largest jump in height).
3. Draw a horizontal line across that gap.
4. Count how many vertical branches the line cuts.
That count is the number of clusters.
In the dendrogram above, the line cuts 7 branches, so the number of clusters = 7.
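The "largest gap" rule can also be applied numerically. The sketch below assumes you already have the merge heights from an agglomerative run, sorted in merge order (e.g. the third column of SciPy's linkage matrix); the function name is illustrative.

```python
def clusters_from_largest_gap(heights):
    """heights: the n-1 merge distances in increasing order.
    Cutting inside the largest gap between consecutive heights
    leaves n minus (merges below the cut) clusters."""
    n = len(heights) + 1                              # number of data points
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    below = gaps.index(max(gaps)) + 1                 # merges below the cut line
    return n - below

# six points, five merges: four cheap (height 1) and one expensive (height 8)
print(clusters_from_largest_gap([1, 1, 1, 1, 8]))    # cut in the 1→8 gap → 2 clusters
```

With SciPy, `scipy.cluster.hierarchy.fcluster` performs the cut directly once you choose a height or a cluster count.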