K-means Clustering and its Use Cases.

Shashwat Gaur
2 min read · Jul 19, 2021

All geared up, guys?! Today we explore another topic in the domain of data analysis. K-means clustering is an algorithm that divides a dataset into K predefined, distinct, non-overlapping clusters, where every data point belongs to exactly one cluster. The idea is to keep points in the same cluster as similar to each other as possible while keeping a clear separation between points from different clusters.

K-means is an unsupervised algorithm, meaning it works on unlabelled data. K is the number of clusters, which we choose up front, and techniques such as the elbow method can help us find a good value for K. To understand how the clusters are created:

Let’s consider the use case of analysing player stats. Each player is one data point, and we plot all the data on a graph with an x-axis and a y-axis. For instance, if it is about cricket, we could put the runs scored on one axis and the wickets taken on the other.

Next, we create clusters on the basis of similarities and differences. Let’s suppose we create 3 clusters, for example high runs-low wickets, low runs-high wickets and low runs-low wickets. The first step is to pick the centroids, 3 in this case, one per cluster. These starting points can be anywhere; they are not necessarily centrally located.

The next step is to find the distance between each data point and every centroid. A data point belongs to the centroid it is closest to. Each centroid is then repositioned to the mean of the points assigned to it. This is an iterative process: the assignment and repositioning steps repeat until the final clusters are obtained.

Once no further changes in the centroid positions are required, the algorithm has converged, and we get our three clusters, each with its own centroid.
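The steps above can be sketched in plain Python. This is only a minimal illustration, not a production implementation: the function name `k_means`, the player figures (average runs per match, wickets in a season) and the starting centroids are all made up for this example. Starting centroids are often picked at random; here they are passed in by hand so the result is repeatable.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iters=100):
    """Cluster `points` around the given starting centroids.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that points[i] ended up with.
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster, so the centroids no longer move.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical players: (average runs per match, wickets in a season).
players = [
    (55, 2), (60, 1), (52, 3),   # high runs, low wickets
    (8, 30), (5, 34), (10, 32),  # low runs, high wickets
    (7, 2), (9, 1), (6, 3),      # low runs, low wickets
]
# Hand-picked starting centroids; they are deliberately not central.
start = [(40, 10), (20, 30), (0, 0)]

centroids, labels = k_means(players, start)
print(labels)     # cluster index for each player
print(centroids)  # final centroid positions
```

With this toy data the three groups are far apart, so the loop settles after a couple of passes. On real data the result can depend on where the centroids start, which is why it is common to try several starting positions.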

Remember that for measuring distance we can use the Euclidean, Manhattan, squared Euclidean or cosine distance measures. That is a simple way to understand the K-means clustering algorithm.
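To make the four measures concrete, here is a small sketch in plain Python. The function names are made up for this example, and the two points are arbitrary. One caveat worth keeping in mind: the textbook version of K-means goes with squared Euclidean distance, because repositioning a centroid to the mean is what minimises that measure. Swapping in another distance is possible, but it usually calls for a matching change in how the centroid is updated.

```python
import math


def euclidean(p, q):
    """Straight-line distance between p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Euclidean distance without the square root."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def manhattan(p, q):
    """Sum of the absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """1 minus the cosine of the angle between p and q.

    Assumes neither point is the zero vector.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1 - dot / (norm_p * norm_q)


p, q = (3, 4), (6, 8)
print(euclidean(p, q))          # 5.0
print(squared_euclidean(p, q))  # 25
print(manhattan(p, q))          # 7
print(cosine_distance(p, q))    # 0.0, since q points in the same direction as p
```

Notice that the cosine distance here is zero even though the points are 5 units apart: it only looks at direction, not size, so the choice of measure changes which points count as similar.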

Some other use cases include rideshare data analysis, where it provides insight into traffic patterns and helps plan cities for the future; cyber-profiling of criminals by finding correlations; and call-record analysis, which tells us about customers’ needs along with their demographics. There are many more use cases one could find.

Thank you! We will be back with more on K-means in another blog in a while.

Shashwat Gaur

A tech enthusiast and an aspirer, aiming to solve real-life problems and looking forward with a collaborative mindset. Planning to put the skills to best use.