Catboost vs XGBoost

This post outlines the differences between two popular gradient boosting machine learning models, XGBoost and Catboost.

Stephen Allwright

What is gradient boosting?

Both of these models are gradient boosting models, so let's have a quick catch-up on what this means.

Gradient boosting is a machine learning technique where many weak learners, typically decision trees, are iteratively trained and combined to create a highly performant model. The decision trees are trained sequentially, with each new tree fitted to the errors of the ensemble built so far, so that the loss function is gradually minimised.
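
To make the idea concrete, here is a minimal sketch of gradient boosting for regression with squared error, written in plain Python. It is an illustration only and not how XGBoost or Catboost are implemented: the weak learner is a one-split tree (a "stump") on a single feature, the function names are my own, and the toy data is made up.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the single threshold on xs that best fits the residuals."""
    best = None
    # Try every distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.3):
    base = mean(ys)  # start from a constant prediction
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))


# Made-up toy data: one numeric feature with distinct values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]

one_tree = fit_boosted_stumps(xs, ys, n_trees=1)
many_trees = fit_boosted_stumps(xs, ys, n_trees=50)
```

Comparing `mse(one_tree, xs, ys)` with `mse(many_trees, xs, ys)` shows the training error shrinking as more stumps are added, which is the sequential error correction described above.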

What is the difference between the gradient boosting models, Catboost and XGBoost?

What is XGBoost?

XGBoost is a gradient boosting machine learning algorithm that can be used for classification and regression problems.

Like all gradient boosting models, it is an ensemble model which trains a series of decision trees sequentially. By default it grows each tree level-wise (also described as horizontally), splitting all the nodes at one depth before moving on to the next. The individual trees are kept fairly shallow by default, and the model relies on combining a large number of them.

XGBoost is designed to be a general all-purpose gradient boosting model which performs well out-of-the-box for most datasets.

What is Catboost?

Catboost is also a gradient boosting machine learning algorithm that can be used for classification and regression problems.

"Catboost" stands for "Category Boosting", and is called such because it's designed to work well for categorical features.

It too is a highly performant model out-of-the-box, but there are a few key features that make Catboost unique. Chief among these is its ability to handle categorical features and text features without having to undertake pre-processing to convert them to numerical features first.
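
As I understand it from the Catboost documentation, categorical values are converted to numbers internally using target statistics that are computed in an ordered way, so that each row is only encoded using the targets of rows that came before it. The sketch below is a simplified illustration of that idea, not Catboost's actual code: it uses a single fixed row order (Catboost works with random permutations), and the function name, the `prior` and `prior_weight` parameters, and the data are my own assumptions.

```python
from collections import defaultdict


def ordered_target_statistic(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each category using only the targets of earlier rows."""
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running row count per category
    encoded = []
    for category, target in zip(categories, targets):
        # Smoothed mean of the targets seen so far for this category.
        encoded.append(
            (sums[category] + prior * prior_weight)
            / (counts[category] + prior_weight)
        )
        # Only now is the current row's target added to the running totals.
        sums[category] += target
        counts[category] += 1
    return encoded


# Made-up example data.
cities = ["Oslo", "Bergen", "Oslo", "Oslo", "Bergen"]
clicked = [1, 0, 1, 0, 1]
encoded = ordered_target_statistic(cities, clicked)
```

Because a row's own target never contributes to its encoding, this kind of scheme avoids leaking the label into the feature, which is a common problem with naive target encoding.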

What are the similarities between Catboost and XGBoost?

  1. Model framework. Both use the gradient boosting method to train many weak decision trees in an ensemble model
  2. Performance. Both models perform very well out of the box with standard parameters on most datasets
  3. Use case. They can be used for classification and regression
  4. Datasets. Both can handle large datasets with ease

What are the differences between Catboost and XGBoost?

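  1. Training time. Catboost is often reported to be faster to train and predict than XGBoost, although the size of the gap depends on the dataset, the hardware, and the settings used
  2. Categorical and text data. Catboost can handle categorical and text data without pre-processing, whilst XGBoost has traditionally required them to be encoded numerically beforehand (recent versions add categorical support, but it is less central than in Catboost)
  3. Null values. Both models accept null values in numerical features without pre-processing, but they treat them differently: XGBoost learns a default direction for missing values at each split, whilst Catboost by default treats them as smaller than every other value
  4. Regularisation. Catboost uses ordered boosting alongside L2 regularisation of the leaf values, whilst XGBoost relies on L1 and L2 regularisation
  5. Overfitting. Catboost's ordered boosting is designed to make it less prone to overfitting, particularly on smaller training datasets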
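
The L1 and L2 penalties mentioned in the list above act on the value each leaf contributes. Below is a minimal sketch of the regularised leaf value as described in the XGBoost documentation, where `gradient_sum` and `hessian_sum` are the sums of the first and second derivatives of the loss over the rows in a leaf. The function names are my own; `reg_lambda` and `reg_alpha` mirror the names of the L2 and L1 parameters in XGBoost's scikit-learn interface, and the numbers are made up.

```python
def soft_threshold(gradient_sum, alpha):
    """Shrink the gradient sum towards zero by alpha (the L1 penalty)."""
    if gradient_sum > alpha:
        return gradient_sum - alpha
    if gradient_sum < -alpha:
        return gradient_sum + alpha
    return 0.0


def leaf_weight(gradient_sum, hessian_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Leaf value: L1 shrinks the numerator, L2 inflates the denominator."""
    return -soft_threshold(gradient_sum, reg_alpha) / (hessian_sum + reg_lambda)


# Four rows with squared error: each hessian is 1, and the gradients sum to -8.
unregularised = leaf_weight(-8.0, 4.0, reg_lambda=0.0)
with_l2 = leaf_weight(-8.0, 4.0, reg_lambda=4.0)
with_l1 = leaf_weight(-8.0, 4.0, reg_lambda=0.0, reg_alpha=2.0)
# A weak signal is removed entirely once the L1 penalty exceeds it.
zeroed = leaf_weight(-1.0, 4.0, reg_lambda=0.0, reg_alpha=2.0)
```

Both penalties pull leaf values towards zero, which limits how strongly any single tree can react to a handful of rows.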

When should you use Catboost or XGBoost?

Both Catboost and XGBoost are well-performing boosting models, so which one you should use depends upon your dataset and technical constraints.

As a rough rule of thumb, I would suggest:

  • Use Catboost when you have a significant number of categorical or text features
  • Use XGBoost when you have a mix of feature types and few technical constraints on deploying your model

Catboost vs XGBoost, which is better?

Which model is better depends primarily upon your dataset: the more categorical variables you have, the more reason there is to choose Catboost.

However, if you are unable to decide between the two based on your dataset, I would generally recommend XGBoost. This is because it works well on a wide range of datasets, is highly tuneable, and is extremely well documented online if you ever need help.

I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.