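ROC AUC scores typically range from 0.5 to 1, where 1 is a perfect score and 0.5 means the model is no better than random guessing. (Strictly, the score can fall anywhere between 0 and 1, but a score below 0.5 means the model ranks examples worse than random, which usually indicates that its predictions are inverted.) Given this, how do you know what a good score is? And can these scores be compared across different use cases? By the end of this post, I hope you will be more knowledgeable about ROC AUC.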
What is the ROC AUC score?
ROC AUC (Area Under the Receiver Operating Characteristic Curve) score is a metric used to assess the performance of classification machine learning models.
The ROC curve is a graph that plots the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold is varied, showing the TPR we can expect for a given FPR we are willing to accept. The ROC AUC score is the area under this curve. It summarises how well the model ranks positive examples above negative ones across all thresholds: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.
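As a minimal sketch of how the curve is built (assuming scikit-learn and NumPy are installed; the labels and scores are made-up toy data), scikit-learn's roc_curve sweeps the decision threshold over the model's scores and returns one (FPR, TPR) point per threshold, and auc integrates those points with the trapezoidal rule:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical toy data: true labels and the model's predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# roc_curve returns one (FPR, TPR) point per retained threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"predict positive if score >= {thr:.2f}: FPR = {f:.2f}, TPR = {t:.2f}")

# The ROC AUC is the area under these points (trapezoidal rule)
print("ROC AUC:", auc(fpr, tpr))  # 0.875
```

Depending on your scikit-learn version, the first threshold printed is either inf or a value just above the highest score; it is an extra point that anchors the curve at (0, 0).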
How do I calculate ROC AUC in Python?
The ROC AUC score is simple to calculate in Python with scikit-learn's roc_auc_score function. Below is a simple example for binary classification:
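```python
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 0, 1]
# Pass predicted probabilities (or scores) for the positive class, not hard
# 0/1 labels: hard labels collapse the ROC curve to a single threshold
y_score = [0.1, 0.3, 0.8, 0.6, 0.2, 0.9]

print(roc_auc_score(y_true, y_score))  # ≈ 0.889
```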
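ROC AUC has an equivalent ranking interpretation: it is the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half. The sketch below computes it that way and compares the result with scikit-learn. The helper pairwise_auc is hypothetical and quadratic in the number of examples, so it is only meant for illustration; it also shows why a score below 0.5 suggests inverted predictions.

```python
from itertools import product

from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, y_score):
    """Hypothetical helper: ROC AUC as the fraction of correctly ranked pairs.

    Each (positive, negative) pair scores 1 if the positive is ranked higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


y_true = [0, 1, 1, 0, 0, 1]
y_score = [0.1, 0.3, 0.8, 0.6, 0.2, 0.9]

print(pairwise_auc(y_true, y_score))   # ≈ 0.889
print(roc_auc_score(y_true, y_score))  # ≈ 0.889

# Inverting the scores reverses the ranking, so the AUC becomes 1 - AUC
inverted = [1 - s for s in y_score]
print(roc_auc_score(y_true, inverted))  # ≈ 0.111
```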
What are the advantages and disadvantages of ROC AUC?
ROC AUC is a very common metric for evaluating classification models; however, there are some aspects to keep in mind when using it:
Advantages of using ROC AUC score
- A single, easy-to-track measure of overall performance for classification models
- Summarises sensitivity (TPR) and specificity (1 - FPR) across all thresholds in one number
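Disadvantages of using ROC AUC score
- Can be overly optimistic on heavily imbalanced datasets: when negatives vastly outnumber positives, a small false positive rate can still mean many false positives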
- Not very intuitive for end users to understand
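A minimal sketch of the imbalance problem, using deliberately constructed toy data (1,000 negatives, 10 positives) rather than a real model. Each positive outranks 95% of the negatives, so ROC AUC looks strong, yet most examples flagged at the positives' score are negatives. Average precision (scikit-learn's average_precision_score, a precision-recall based metric) is used here only as one common point of comparison.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Hypothetical, deliberately imbalanced toy data: 1,000 negatives, 10 positives
neg_scores = np.linspace(0.0, 1.0, 1000)  # negatives spread evenly over [0, 1]
pos_scores = np.full(10, 0.95)            # every positive scored at 0.95

y_true = np.concatenate([np.zeros(1000, dtype=int), np.ones(10, dtype=int)])
y_score = np.concatenate([neg_scores, pos_scores])

# Each positive outranks 950 of the 1,000 negatives, so ROC AUC is high
print("ROC AUC:", roc_auc_score(y_true, y_score))  # ~0.95

# ...but 50 negatives still score above the positives, so at the threshold that
# catches all 10 positives, only 10 of the 60 flagged examples are positive
print("Average precision:", average_precision_score(y_true, y_score))  # ~0.17
```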
What is a good ROC AUC score?
For any model that does better than random, the ROC AUC score ranges from 0.5 to 1, where 1 is a perfect score and 0.5 means the model is as good as random. As with all metrics, what counts as a good score depends on the use case and the dataset; medical use cases, for example, require a much higher score than e-commerce. That said, a reasonable rule of thumb is:
- 0.5: As good as random choice
- 0.5-0.7: Poor performance
- 0.7-0.8: OK performance
- 0.8-0.9: Very good performance
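If you report ROC AUC to non-technical stakeholders, a small helper can translate a score into the rule-of-thumb bands listed above. This is a hypothetical sketch: the function name, the handling of boundary values, treating everything from 0.8 upward as very good, and the extra "worse than random" band for scores below 0.5 are all assumptions to adjust for your own use case.

```python
def describe_roc_auc(score: float) -> str:
    """Hypothetical helper mapping a ROC AUC score to a rule-of-thumb band.

    Band edges and which side a boundary value falls on are assumptions.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("ROC AUC must be between 0 and 1")
    if score < 0.5:
        # Worse than random: the model's ranking is (partly) inverted
        return "Worse than random (check whether predictions are inverted)"
    if score == 0.5:
        return "As good as random choice"
    if score < 0.7:
        return "Poor performance"
    if score < 0.8:
        return "OK performance"
    # Assumption: everything from 0.8 upward is reported as very good
    return "Very good performance"


print(describe_roc_auc(0.75))  # OK performance
```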