Hierarchical Clustering Using Python Programming

Name: Tejas Sahoo
Roll No: K057
Branch: BTech Cyber Security

Aim:

To implement the hierarchical clustering algorithm in Python on the Iris dataset.

Introduction:

Clustering is an unsupervised learning technique that groups similar objects together. Hierarchical clustering builds a tree of nested clusters, called a dendrogram, which shows the order and the distance at which clusters merge or split.

Types of Hierarchical Clustering:

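  1. Agglomerative Hierarchical Clustering: Bottom-up approach. Each data point starts as its own cluster, and the two closest clusters are merged repeatedly until a single cluster remains.
  2. Divisive Hierarchical Clustering: Top-down approach. All data points start in a single large cluster, which is split repeatedly until each point stands alone.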
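
As a rough illustration of the bottom-up approach, the sketch below runs SciPy's linkage function on five made-up one-dimensional points. The data and the choice of single linkage are assumptions made for illustration and are not part of the experiment. Each row of the result records one merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five toy 1-D points (hypothetical data, chosen so the merge order is easy to follow)
points = np.array([[1.0], [1.2], [5.0], [5.1], [9.0]])

# Single linkage: the distance between two clusters is the distance
# between their closest pair of members
merges = linkage(points, method='single')

# Each row holds: [cluster_a, cluster_b, merge_distance, size_of_new_cluster]
# Original points are numbered 0..4; clusters created by merges are numbered 5, 6, ...
for a, b, dist, size in merges:
    print(int(a), int(b), round(float(dist), 2), int(size))
```

With n points the result always has n - 1 rows, because each merge reduces the number of clusters by one.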

Advantages:

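  1. Produces a dendrogram that shows how clusters are nested at every level.
  2. Does not require the number of clusters to be fixed in advance; the dendrogram can be cut at any height to obtain the natural groupings in the data.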
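
One possible way to turn a dendrogram into flat groups is SciPy's fcluster function. The sketch below is only an illustration: the six 2-D points are invented, and asking for two clusters is an assumption that suits this toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D data: two tight groups that lie far apart
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Build the merge tree with Ward's method
tree = linkage(data, method='ward')

# Cut the tree so that at most two flat clusters remain
labels = fcluster(tree, t=2, criterion='maxclust')
print(labels)
```

The first three points receive one label and the last three receive the other. Changing t cuts the same tree at a different level without recomputing the linkage.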

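Code:

# Libraries for data handling, plotting and hierarchical clustering
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as sch
from sklearn import datasets

# Load the Iris dataset into a DataFrame with named feature columns
iris = datasets.load_iris()
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)

# Append the class label (0, 1 or 2) as a fifth column
iris_data['flower_type'] = iris.target

# Preview the first 10 rows
iris_data.head(10)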

Output:

# Feature matrix: the four measurement columns
iris_X = iris_data.iloc[:, [0, 1, 2, 3]].values

# Target vector: the flower_type column
iris_Y = iris_data.iloc[:, 4].values

iris_X

iris_Y

Output:


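# Plot sepal length against sepal width, one colour per flower type
plt.figure(figsize=(15, 7))
plt.scatter(iris_X[iris_Y == 0, 0], iris_X[iris_Y == 0, 1], s=100, c='blue', label='Type 1')
plt.scatter(iris_X[iris_Y == 1, 0], iris_X[iris_Y == 1, 1], s=100, c='yellow', label='Type 2')
plt.scatter(iris_X[iris_Y == 2, 0], iris_X[iris_Y == 2, 1], s=100, c='red', label='Type 3')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.legend()  # without this call the labels set above are never drawn
plt.show()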

Output:


# Build the linkage matrix with Ward's method and draw the dendrogram
# (scipy.cluster.hierarchy is already imported as sch)
plt.figure(figsize=(25, 10))
sch.dendrogram(sch.linkage(iris_X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Data Points')
plt.ylabel('Euclidean Distance')
plt.show()

Output:
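
The experiment imports AgglomerativeClustering from scikit-learn. A minimal sketch of how it could be used to assign each flower to a cluster is given below. The choice of n_clusters=3 is an assumption based on the three flower types in the dataset, and the comparison against the true labels with the adjusted Rand index is an addition for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Reload the data so that this sketch runs on its own
iris = datasets.load_iris()
iris_X = iris.data
iris_Y = iris.target

# Bottom-up clustering with Ward linkage, stopped when 3 clusters remain
# (3 is an assumption matching the number of flower types)
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
cluster_labels = model.fit_predict(iris_X)

# Number of points in each cluster
print("Cluster sizes:", np.bincount(cluster_labels))

# Agreement with the true flower types; 1.0 would be a perfect match
ari = adjusted_rand_score(iris_Y, cluster_labels)
print("Adjusted Rand index:", round(ari, 3))
```

The cluster numbers are arbitrary, so cluster 0 need not correspond to flower type 0. The adjusted Rand index is used here because it ignores how the clusters are numbered.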

Conclusion:

Hierarchical clustering was applied to the Iris dataset in Python. The data was loaded and plotted, and a dendrogram was built with Ward linkage using SciPy.