What is a good ROC AUC score? Explained simply

The ROC AUC score in theory ranges from 0 to 1, but in practice a useful model sits between 0.5, which means the model is as good as random, and 1, a perfect score (a score below 0.5 means the model ranks the classes worse than chance). Given this, how do you know what a good score is? And can these scores be compared across different use cases? By the end of this post, I hope to leave you more knowledgeable on the topic of ROC AUC.

What is the ROC AUC score?

ROC AUC (Area Under the Receiver Operating Characteristic Curve) score is a metric used to assess the performance of classification machine learning models.

The ROC curve is a graph that maps the relationship between the true positive rate (TPR) and the false positive rate (FPR) as the classification threshold is varied, showing the TPR we can expect for a given trade-off with FPR. The ROC AUC score is the area under this ROC curve, meaning that in broad terms the resulting score represents the model's ability to separate the classes correctly.
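
To see how the curve is traced, scikit-learn's roc_curve function sweeps the decision threshold over the model's scores and returns the FPR/TPR pairs. Here is a minimal sketch, where the labels and probabilities are made up for illustration:

from sklearn.metrics import roc_curve

# Made-up true labels and predicted probabilities for the positive class
y_true = [0, 1, 1, 0, 0, 1]
y_scores = [0.2, 0.4, 0.9, 0.6, 0.1, 0.8]

# Each threshold yields one (FPR, TPR) point on the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr, tpr)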

How do I calculate ROC AUC in Python?

The ROC AUC score is a simple metric to calculate in Python with the help of the scikit-learn package. Note that roc_auc_score expects predicted probabilities (or decision scores), not hard class labels. See below a simple example for binary classification:

from sklearn.metrics import roc_auc_score

# True labels and the model's predicted probabilities for the positive class
y_true = [0, 1, 1, 0, 0, 1]
y_scores = [0.1, 0.4, 0.85, 0.6, 0.2, 0.9]

print(roc_auc_score(y_true, y_scores))
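
The same function also handles multiclass problems if you pass per-class probabilities and set the multi_class parameter. A minimal sketch, with made-up probabilities:

from sklearn.metrics import roc_auc_score

# Made-up per-class probabilities for a 3-class problem (each row sums to 1)
y_true = [0, 1, 2, 2]
y_proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.1, 0.8],
]

# "ovr" averages the binary ROC AUC of each class against the rest
print(roc_auc_score(y_true, y_proba, multi_class="ovr"))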

What are the benefits and drawbacks of ROC AUC?

ROC AUC is a very common metric to use when developing classification models; however, there are some aspects to keep in mind when using it:

Advantages of using ROC AUC score

  1. A simple-to-track overall performance metric for classification models
  2. A single metric that covers both sensitivity and specificity

Disadvantages of using ROC AUC score

  1. Can be misleading on heavily imbalanced datasets, since the false positive rate stays low when negatives dominate (see the sketch after this list)
  2. Not very intuitive for end users to understand
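
To illustrate the imbalance caveat, here is a rough sketch comparing ROC AUC with average precision on a simulated dataset with roughly 1% positives. The simulated data and exact numbers are just for illustration; results will vary:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Simulate a heavily imbalanced binary dataset: ~1% positives
X, y = make_classification(n_samples=10_000, weights=[0.99], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# On imbalanced data the two metrics can diverge noticeably,
# with ROC AUC often looking more optimistic than average precision
print("ROC AUC:", roc_auc_score(y_test, proba))
print("Average precision:", average_precision_score(y_test, proba))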

What is a good ROC AUC score?

The ROC AUC score in practice ranges from 0.5 to 1, where 1 is a perfect score and 0.5 means the model is no better than random. As with all metrics, a good score depends on the use case and the dataset: medical use cases, for example, require a much higher score than e-commerce. However, a reasonable rule of thumb for a good ROC AUC score is

  • 0.5: As good as random choice
  • 0.5-0.7: Poor performance
  • 0.7-0.8: OK performance
  • 0.8-0.9: Very good performance
  • >0.9: Excellent performance
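
For illustration only, these bands could be encoded as a small helper. Note that interpret_roc_auc is a hypothetical function written for this post, not part of any library:

def interpret_roc_auc(score: float) -> str:
    # Hypothetical helper mapping a score to the rule-of-thumb bands above
    if score <= 0.5:
        return "As good as random choice"
    if score < 0.7:
        return "Poor performance"
    if score < 0.8:
        return "OK performance"
    if score <= 0.9:
        return "Very good performance"
    return "Excellent performance"

print(interpret_roc_auc(0.85))  # Very good performance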

Alternative classification metrics

F1 score
Accuracy
Balanced accuracy

Metric comparisons

F1 score vs AUC
AUC vs accuracy
F1 score vs accuracy
Micro vs Macro F1 score

References

scikit-learn documentation
scikit-learn explainer
Receiver operating characteristic explainer

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.
Oslo, Norway