What is the best classification metric for imbalanced data?

What is the impact of imbalanced data?

Imbalanced data refers to a situation, primarily in classification modelling, where one target class makes up a clear majority of the observations. For example, if you were building a fraud detection model, chances are your data would consist of >99% negative cases and <1% positive cases. This is an imbalanced dataset.

An imbalanced dataset causes problems when assessing the performance of a machine learning model trained to predict those minority positive cases. If a model were to predict every observation as negative, it would score >99% accuracy, which looks great on the surface but is clearly not representative of model performance given the makeup of the data: it would not catch a single positive case.
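To make this concrete, here is a minimal sketch in plain Python. The 990/10 class split and the "always predict negative" model are made up purely for illustration.

```python
# Illustrative data: 990 negative cases (0) and 10 positive cases (1)
y_true = [0] * 990 + [1] * 10

# A "model" that predicts negative for every single observation
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of actual positives that the model found
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
actual_positives = sum(t == 1 for t in y_true)
recall = true_positives / actual_positives

print(f"accuracy: {accuracy:.2f}")
print(f"recall:   {recall:.2f}")
```

Accuracy comes out at 0.99 while recall is 0.00, so the headline number hides the fact that every positive case was missed.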

Which metrics can you use for imbalanced data?

We've seen that basic accuracy isn't very effective for imbalanced datasets, so which metrics could one use instead? Several metrics hold up better on imbalanced datasets, including precision, recall, F1 score and balanced accuracy.
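As a rough sketch of how such metrics compare with accuracy, the snippet below scores one set of predictions with several functions from `sklearn.metrics`. The labels and predictions are invented for illustration, and the choice of metrics here is my assumption rather than a definitive list.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Illustrative data: 95 negative cases followed by 5 positive cases
y_true = [0] * 95 + [1] * 5

# Hypothetical predictions: 93 true negatives and 2 false positives,
# then 3 true positives and 2 false negatives
y_pred = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

scores = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Accuracy is 0.96 here, but precision, recall and F1 are all 0.60 and balanced accuracy is roughly 0.79, which gives a more honest picture of how the model handles the minority class.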

Which is the best metric for imbalanced data?

In general, it is good practice to track multiple metrics when developing a machine learning model, as each highlights different aspects of model performance. However, if I needed to choose one north star metric, I would choose F1 score, as this is a good all-round classification metric which balances precision and recall whilst also performing fairly well on imbalanced datasets.
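One way to track several metrics at once, sketched here under my own assumptions, is to pass a list of scorer names to `cross_validate` in sklearn. The synthetic dataset, the logistic regression model and the particular list of metrics are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced dataset: roughly 95% negative, 5% positive (illustrative)
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    weights=[0.95, 0.05],
    random_state=0,
)

model = LogisticRegression(max_iter=1000)

# Stratified folds keep the class ratio similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# F1 first as the headline metric, with the others tracked alongside it
scoring = ["f1", "precision", "recall", "balanced_accuracy", "roc_auc"]

results = cross_validate(model, X, y, cv=cv, scoring=scoring)

# Average each metric over the folds
mean_scores = {name: results[f"test_{name}"].mean() for name in scoring}

for name, score in mean_scores.items():
    print(f"{name}: {score:.3f}")
```

Stratified folds are used so that each split contains some positive cases, otherwise metrics such as recall can become undefined on a fold.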

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.