What is a good performance metric for clustering algorithms?

What are unsupervised clustering algorithms?

Clustering algorithms are an unsupervised learning technique used to find distinct groups in a dataset. Typical examples are finding customers with similar behaviour patterns, products with similar characteristics, and other tasks where the goal is to find groups with distinct characteristics in a dataset.

Can I measure the accuracy of a clustering model?

For supervised learning problems, such as a linear regression model that predicts house prices, there is a target that you are trying to predict. From this target you can infer some form of accuracy by using error metrics such as RMSE, MAPE, or MAE. However, when you apply a clustering algorithm to a dataset with no such target, an ‘accuracy’ score is not possible. We therefore need other types of measurement that give us an indication of performance. The most common is the distinctness of the clusters created; after all, if all clusters look the same, then you haven't achieved your goal of creating clusters with unique characteristics. There are three common metrics for measuring the distinctness of clusters, each described below.

Which performance metrics are usable for clustering models?

Silhouette Coefficient

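This score ranges from -1 to 1, where a higher score means better defined and more distinct clusters. Scores near 0 indicate overlapping clusters, and negative scores suggest that samples have been assigned to the wrong cluster. It can be calculated using scikit-learn in the following way: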

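from sklearn import metrics
from sklearn.cluster import KMeans

# X is assumed to be your feature matrix of shape (n_samples, n_features)
my_model = KMeans().fit(X)
labels = my_model.labels_  # cluster assigned to each row of X
metrics.silhouette_score(X, labels)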
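As a minimal sketch of how the score can be used in practice, the example below compares several candidate cluster counts on synthetic data. The blob centres, the range of candidate counts, and the names `scores` and `best_k` are assumptions made for illustration, not part of the original snippet.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X: three well-separated blobs (illustrative assumption)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# Fit one model per candidate cluster count and record its silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score has the most distinct clusters
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On data like this, where the groups are clearly separated, the highest score should belong to the cluster count that matches the number of blobs.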

Calinski-Harabasz Index

Like the Silhouette Coefficient, a higher score means better defined clusters. This score has no upper bound, so there is no ‘acceptable’ or ‘good’ value; instead, you must track it throughout the development of your model to see whether it improves. It can be calculated using scikit-learn in the following way:

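from sklearn import metrics
from sklearn.cluster import KMeans

# X is assumed to be your feature matrix of shape (n_samples, n_features)
my_model = KMeans().fit(X)
labels = my_model.labels_  # cluster assigned to each row of X
metrics.calinski_harabasz_score(X, labels)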
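To see why the score has no upper bound, it can help to compute it by hand. The sketch below assumes the usual definition of the index, the ratio of between-cluster dispersion to within-cluster dispersion with each term scaled by its degrees of freedom, and checks the result against scikit-learn. The synthetic data and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

# Synthetic stand-in for X (illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

n_samples = X.shape[0]
cluster_ids = np.unique(labels)
n_clusters = len(cluster_ids)
overall_mean = X.mean(axis=0)

between = 0.0  # dispersion of cluster centroids around the overall mean
within = 0.0   # dispersion of points around their own cluster centroid
for c in cluster_ids:
    members = X[labels == c]
    centroid = members.mean(axis=0)
    between += len(members) * np.sum((centroid - overall_mean) ** 2)
    within += np.sum((members - centroid) ** 2)

# Ratio of the two dispersions, each divided by its degrees of freedom
manual_ch = (between / (n_clusters - 1)) / (within / (n_samples - n_clusters))
library_ch = calinski_harabasz_score(X, labels)
print(manual_ch, library_ch)
```

Because the within-cluster term sits in the denominator, tighter clusters push the ratio up without limit, which is why the value is only meaningful relative to other models on the same data.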

Davies-Bouldin Index

Unlike the previous two metrics, this score measures the average similarity between each cluster and the cluster most similar to it, so a lower score means better separation between your clusters. The lowest possible score is 0. It can be calculated using scikit-learn in the following way:

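from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# X is assumed to be your feature matrix of shape (n_samples, n_features)
my_model = KMeans().fit(X)
labels = my_model.labels_  # cluster assigned to each row of X
davies_bouldin_score(X, labels)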
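The "lower is better" direction can be illustrated with a small sketch that clusters the same three blob centres twice, once with tight blobs and once with heavily overlapping ones. The helper `db_for_spread`, the centres, and the spread values are hypothetical choices made for this example only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Hypothetical blob centres used for both datasets
centers = [[0, 0], [10, 10], [-10, 10]]

def db_for_spread(cluster_std):
    """Cluster synthetic blobs with the given spread and return the Davies-Bouldin score."""
    X, _ = make_blobs(
        n_samples=300, centers=centers, cluster_std=cluster_std, random_state=0
    )
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    return davies_bouldin_score(X, labels)

db_tight = db_for_spread(0.5)  # compact, well-separated clusters
db_loose = db_for_spread(5.0)  # wide, overlapping clusters
print(db_tight, db_loose)
```

The tight dataset should produce the smaller of the two scores, since its clusters are far less similar to one another.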

Which performance metric should I choose for my clustering algorithm?

The most commonly used metric for measuring the performance of a clustering algorithm is the Silhouette Coefficient. This is likely due to its bound from -1 to 1, which makes the score easy to interpret and to compare against models built on different datasets.
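A rough way to see the benefit of a bounded score is to cluster two datasets that share the same structure but differ in size. In the sketch below, the helper `score_dataset`, the blob centres, and the sample sizes are assumptions chosen for illustration; the expectation is that the Silhouette Coefficient stays roughly stable while the Calinski-Harabasz Index grows with the number of samples.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Hypothetical blob centres shared by both datasets
centers = [[0, 0], [10, 10], [-10, 10]]

def score_dataset(n_samples):
    """Return (silhouette, Calinski-Harabasz) for synthetic blobs of the given size."""
    X, _ = make_blobs(
        n_samples=n_samples, centers=centers, cluster_std=1.0, random_state=0
    )
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels), calinski_harabasz_score(X, labels)

sil_small, ch_small = score_dataset(200)
sil_large, ch_large = score_dataset(2000)
print(sil_small, sil_large)
print(ch_small, ch_large)
```

This is only one synthetic comparison, but it suggests why an unbounded score is best compared between models on the same dataset rather than across datasets.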

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.