AUC vs accuracy, which is best?

Stephen Allwright

AUC and accuracy are common metrics for classification models, but which is the best to use? In this post I will look at the similarities and differences, and help you decide which is best for your use case.

What are AUC and accuracy?

AUC and accuracy are two of the most commonly used classification metrics in machine learning, but they’re popular for different reasons. Accuracy is widely used as it’s understood by the majority of people, whilst AUC is used as it’s a very good all-round metric for classification.

What is AUC?

AUC, or to use its full name ROC AUC, stands for Area Under the Receiver Operating Characteristic Curve. The score it produces ranges from 0 to 1, where 1 is a perfect score and 0.5 means the model is no better than random guessing.

What this long name means is that the metric is calculated as the area underneath the Receiver Operating Characteristic (ROC) curve. The ROC is a graph which maps the relationship between the true positive rate (TPR) of the model and the false positive rate (FPR). It shows, across a range of classification thresholds, the TPR we can expect to receive for a given trade-off with FPR.

The area under this ROC curve, AUC, summarises the model's ability to separate the classes: a large area means the model can achieve a high true positive rate while keeping the false positive rate low. Equivalently, AUC is the probability that the model ranks a randomly chosen positive observation above a randomly chosen negative one.
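To make this concrete, here is a minimal sketch (with made-up labels and scores, purely for illustration) showing how sklearn's roc_curve function returns the FPR and TPR pairs that the curve is drawn from:

from sklearn.metrics import roc_curve

# Made-up true labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# roc_curve returns the FPR and TPR at each classification threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr)  # [0.  0.  0.5 0.5 1. ]
print(tpr)  # [0.  0.5 0.5 1.  1. ]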

What is accuracy?

Accuracy is one of the simplest metrics available to us for classification models. It is the number of correct predictions as a percentage of the number of observations in the dataset. The score ranges from 0% to 100%, where 100% is a perfect score and 0% is the worst.
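As a quick illustration with made-up predictions, accuracy can be calculated by hand as the share of predictions that match the true labels:

# Made-up labels, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Accuracy = correct predictions / total observations
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.8, i.e. 80%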

How to implement AUC and accuracy in Python using sklearn

Accuracy and AUC are both simple to implement in Python, but first let’s look at how using these metrics would fit into a typical development workflow:

  1. Create a prepared dataset
  2. Separate the dataset into training and testing
  3. Choose your model and run hyper-parameter tuning on the training dataset
  4. Run cross validation with your model on the training dataset using AUC or accuracy as metrics
  5. Train your final model on the full training dataset
  6. Test your final model on the test dataset using AUC or accuracy as metrics

We can see that we would use our metrics of choice in two places. The first being during the cross validation phase, and the second being at the end when we want to test our final model.
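As an example of how the cross validation step could look, here is a sketch using scikit-learn's cross_val_score, which accepts both metrics by name through its scoring parameter. The dataset and model are stand-ins for illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data and model; in practice X and y come from your prepared training dataset
X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(max_iter=1000)

# scoring="roc_auc" and scoring="accuracy" are both built-in options
auc_scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
acc_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(auc_scores.mean(), acc_scores.mean())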

I will show a much simpler example than the full workflow shown above, which just illustrates how to call the required functions:

from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = [1, 0, 1, 1]
y_true = [0, 1, 1, 0]

# Accuracy compares the predicted labels directly with the true labels
accuracy = accuracy_score(y_true, y_pred)

# roc_auc_score also accepts hard labels, but it is usually given
# predicted probabilities (e.g. from predict_proba) rather than labels
auc = roc_auc_score(y_true, y_pred)
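In a real project you would normally calculate AUC from predicted probabilities rather than hard labels, since AUC measures how well the model ranks observations. Here is a minimal sketch of that, with a stand-in dataset and model for illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data and model, for illustration only
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard labels for accuracy, probability of the positive class for AUC
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])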

Similarities and differences of AUC and accuracy

Given that both AUC and accuracy are used for classification models, there are some obvious similarities. However, there are some key differences to be aware of which may help you make your decision.

Similarities between AUC and accuracy

  1. Both are metrics for classification models
  2. Both are easily implemented using the scikit-learn package

Differences between AUC and accuracy

  1. Accuracy is widely understood by end users, whilst AUC often requires some explanation
  2. Accuracy does not work well on imbalanced datasets, whilst AUC handles them much better (illustrated in the example after this list)
  3. AUC measures the trade-off between the model’s sensitivity and specificity, whilst accuracy does not distinguish between these and is much more simplistic
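To illustrate the second difference with a made-up, heavily imbalanced dataset: a useless model that always predicts the majority class still achieves 95% accuracy, while AUC correctly reports it as no better than random:

from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5

# A useless "model" that always predicts the majority class
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))  # 0.95, deceptively high
print(roc_auc_score(y_true, y_pred))   # 0.5, no better than random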

AUC vs accuracy, which is best?

AUC and accuracy can be used in the same context but are very different metrics. Accuracy is simple to use and easily understood by many, but on its own it can give a misleading picture of a model’s true performance, particularly on imbalanced datasets.

So, AUC or accuracy, which is best?

I would recommend using AUC over accuracy as it’s a much better indicator of model performance. This is because AUC is calculated from the relationship between the true positive rate and the false positive rate across all classification thresholds, rather than from a single count of correct predictions. If you want to use accuracy as a metric, then I would encourage you to track other metrics as well, such as AUC or F1.


Other classification metrics

Balanced accuracy
Classification metrics for imbalanced data
Interpret AUC values

Classification metric comparisons

Accuracy vs balanced accuracy
F1 score vs AUC
F1 score vs accuracy
Micro vs Macro F1 score

Metric calculators

Accuracy calculator


Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.