Coders Packet

Kmeans Clustering Using Python


In this tutorial we will learn unsupervised learning algorithm: KMeans clustering using Python. This algorithm categorises the items into k groups of similarity.


Kmeans clustering is an unsupervised learning algorithm. This algorithm categorizes the items into k groups of similarity. We calculate the similarity using Euclidean distance as measurement.

The algorithm working is explained below:

  1. First, we have to initialize k points, they are initialized randomly and are called means.
  2. The closest means are categorized and the mean's coordinates are updated.
  3. The process is repeated for the given number of iterations and after the last iteration, we have our clusters.  

There are a lot of options available to initialize this means one method is to initialize the means at random items in the data set or the means are initialized at random values between the boundaries of the data set.

Import Libraries and Read Data

Import all the required libraries to the python notebook

Here, we are taking a random data and performing the clustering algorithm.

The above code is for visualising the data points given in the data set. The blue colour dots represents the data.

We choose the number of clusters as 2. Applying k means to the dataset also plotting the centroids of the clusters.
After the execution of the codes, we observe plotted centroids of the cluster in the graph. The Red Dots represents the Centroids. In this way, you can perform the Kmeans algorithm on any given dataset using simple python libraries like pandas and matplotlib.


Download Complete Code


No comments yet