What is a good R2 (R-Squared) score and how do I interpret it?

What is R2 (R-Squared)?

R-Squared (R2), also known as the coefficient of determination, is a metric for assessing the performance of regression machine learning models. Unlike other metrics such as MAE or RMSE, it does not measure the size of the prediction errors in the units of the target, but is instead a measure of fit. The R2 score indicates how much of the variation in the dependent variable is explained by the independent variables in the model.
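For reference, the usual definition can be written as the formula below. The symbols are notation chosen here for illustration, not taken from the post: y_i are the observed values, \hat{y}_i the model's predictions, and \bar{y} the mean of the observed values.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The numerator is the squared error of the model and the denominator is the squared error of always predicting the mean, so R2 reads as the share of the target's variation that the model accounts for.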

How do I calculate R2 in Python?

R2 is straightforward to implement in Python by using the scikit-learn package. Below you will find a simple example:

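from sklearn.metrics import r2_score

# Observed target values and the model's predictions for the same rows
y_true = [12, -5, 4, 1]
y_pred = [11.5, -1, 5.5, 0]

# r2_score expects the true values first, then the predictions
r2 = r2_score(y_true, y_pred)
print(f"R2: {r2:.2f}")  # R2: 0.87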
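To see what goes into this score, the same calculation can be sketched by hand in plain Python. This is a minimal illustration that assumes the standard definition of R2 (one minus the residual sum of squares divided by the total sum of squares). The variable names `ss_res`, `ss_tot` and `r2_manual` are chosen here for illustration and are not part of scikit-learn.

```python
# Values reused from the scikit-learn example above.
y_true = [12, -5, 4, 1]
y_pred = [11.5, -1, 5.5, 0]

# Mean of the observed values: the baseline that R2 compares the model against.
y_mean = sum(y_true) / len(y_true)

# Residual sum of squares: squared error of the model's predictions.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

# Total sum of squares: squared error of always predicting the mean.
ss_tot = sum((t - y_mean) ** 2 for t in y_true)

r2_manual = 1 - ss_res / ss_tot
print(f"SS_res={ss_res}, SS_tot={ss_tot}, R2={r2_manual:.2f}")
# SS_res=19.5, SS_tot=150.0, R2=0.87
```

The model's squared error (19.5) is small relative to the squared error of the mean baseline (150), which is why the score lands close to 1.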

What is a good R2 score and how should it be interpreted?

The R2 score ranges from 1, a perfect score, down to negative values for under-performing models, with no lower limit. The scores that you can achieve and their meaning are as follows:

  • A score of 1 is the perfect score and indicates that all of the variance in the target is explained by the independent variables
  • A score of 0 indicates that the independent variables don't explain any of the variance, which means the model does no better than always predicting the mean of the target
  • A negative score indicates that the model fits the data worse than always predicting the mean of the target
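The three cases above can be reproduced with a small sketch. The helper `r_squared` is a hypothetical function written for this illustration, assuming the standard definition of R2. It is not a scikit-learn function, and the prediction lists are made-up values.

```python
def r_squared(y_true, y_pred):
    """Illustrative R2: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [12, -5, 4, 1]

# Predictions identical to the observed values: the perfect score.
r2_perfect = r_squared(y_true, [12, -5, 4, 1])

# Always predicting the mean of y_true (which is 3): no variance explained.
r2_mean = r_squared(y_true, [3, 3, 3, 3])

# Predictions that are further from the truth than the mean is: negative score.
r2_bad = r_squared(y_true, [1, 4, -5, 12])

print(r2_perfect, r2_mean, round(r2_bad, 2))  # 1.0 0.0 -1.69
```

The second case shows why 0 is a useful reference point: it is the score of a model that ignores its inputs and always returns the average.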

An important reminder when looking at the R2 scores from different models is that the variance in the target differs from dataset to dataset, meaning that R2 scores cannot be used to directly compare the performance of models that were evaluated on different datasets.
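A small sketch can make this concrete. Both datasets below are made-up values, and `r_squared` and `mae` are hypothetical helpers written for this illustration rather than library functions. The predictions miss by exactly 0.5 on every row in both cases, yet the R2 scores differ because the targets have different spreads.

```python
def r_squared(y_true, y_pred):
    """Illustrative R2: 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, for comparison with R2."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Dataset with a narrow spread of target values.
narrow_true = [1, 2, 3, 4, 5]
narrow_pred = [1.5, 1.5, 3.5, 3.5, 5.5]

# Dataset with a wide spread of target values, same size of errors.
wide_true = [10, 20, 30, 40, 50]
wide_pred = [10.5, 19.5, 30.5, 39.5, 50.5]

mae_narrow, mae_wide = mae(narrow_true, narrow_pred), mae(wide_true, wide_pred)
r2_narrow, r2_wide = r_squared(narrow_true, narrow_pred), r_squared(wide_true, wide_pred)

print(mae_narrow, mae_wide)                     # 0.5 0.5
print(round(r2_narrow, 5), round(r2_wide, 5))   # 0.875 0.99875
```

The errors are the same size in both cases, so the higher R2 on the second dataset reflects the larger variance of its target rather than a better model.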

References

R2 scikit-learn documentation

Stephen Allwright

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.