How to install LightGBM in Python

This post walks you through how to install LightGBM, the popular machine learning library, in Python.

This post will go through the following:

1. What is LightGBM?
2. Install LightGBM in Python
3. Use LightGBM in Python

What is LightGBM?

LightGBM is an open-source gradient boosting library developed by Microsoft for classification and regression problems.

It's an ensemble method which trains a series of decision trees sequentially, but does so leaf-wise (also known as growing vertically), meaning each tree has many leaves whilst the total number of trees stays relatively low. This approach creates a highly performant boosting model which is also fast to train.
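To make this concrete, here is a minimal sketch of how the leaf-wise design shows up in the model's parameters, using LightGBM's scikit-learn style LGBMClassifier (the parameter values here are just the library defaults, shown for illustration):

import lightgbm as lgb

# num_leaves caps how many leaves each tree can grow (leaf-wise growth),
# while n_estimators keeps the total number of trees relatively low
model = lgb.LGBMClassifier(num_leaves=31, n_estimators=100)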

💡
LightGBM stands for "Light Gradient Boosting Machine"

How do I install LightGBM in Python?

There are several methods for installing LightGBM, the most common of which are:

  1. Pip
  2. Conda
  3. Poetry
  4. Homebrew (macOS)

Install LightGBM using pip

The suggested and preferred method for installing LightGBM is through the pip package manager. To do this, run the following command in your terminal:

pip install lightgbm
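To check that the installation worked, one quick option (assuming your terminal is using the same Python environment that pip installed into) is to import the package and print its version:

python -c "import lightgbm; print(lightgbm.__version__)"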

Install LightGBM using Conda

Whilst not the preferred method, it is also possible to install LightGBM using the Conda package manager, by running this in your terminal:

conda install -c conda-forge lightgbm
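If you would rather keep LightGBM out of your base environment, a possible approach (the environment name lightgbm-env is just an example) is to create and activate a dedicated environment first:

conda create -n lightgbm-env python
conda activate lightgbm-env
conda install -c conda-forge lightgbm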

Install LightGBM using Poetry

If you use Poetry for your Python environment management, then you can install LightGBM by adding it to your project's dependencies like so:

cd pre-existing-project
poetry init
poetry add lightgbm
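After running these commands, Poetry records the dependency in your pyproject.toml. Depending on your Poetry version and the latest LightGBM release, the entry will look roughly like this:

[tool.poetry.dependencies]
lightgbm = "^4.0.0"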

Install LightGBM using Homebrew on macOS

When on macOS it's also possible to use Homebrew to install LightGBM through the terminal. Note that the Homebrew formula installs the LightGBM library and command-line tool rather than the Python package, so to use it from a Python script you will still need one of the methods above:

brew install lightgbm

How do I use LightGBM in Python?

Once you have installed the LightGBM package using one of the above methods, it can be used within your Python script.

Here is a simple example of how that could look:

import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the feature dataset and the target values
x = pd.read_csv('data_train.csv')
y = pd.read_csv('data_target.csv')

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x.values, y.values.ravel(), test_size=0.2)

# Convert the data into LightGBM Dataset objects
train_data = lgb.Dataset(x_train, label=y_train)
test_data = lgb.Dataset(x_test, label=y_test)

# Binary classification using AUC as the evaluation metric
params = {'metric': 'auc', 'objective': 'binary'}

# Train the model, evaluating against the test set
model = lgb.train(params,
                  train_data,
                  num_boost_round=100,
                  valid_sets=[test_data])

In this example we undertook the following steps:

  1. Load the feature dataset, x, and the targets, y
  2. Split the data into training and testing sets
  3. Convert the data into LightGBM Dataset objects
  4. Define the parameters for the model, which in our case is binary classification using AUC as the metric
  5. Train the model (a short prediction example follows below)
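Once trained, the model returned by lgb.train can be used to make predictions on new feature data. As a short follow-up sketch (reusing x_test and y_test from the example above, and using scikit-learn's roc_auc_score as one way to evaluate), it could look like this:

from sklearn.metrics import roc_auc_score

# For the binary objective, predict returns the predicted probabilities
y_pred = model.predict(x_test)

print('Test AUC:', roc_auc_score(y_test, y_pred))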


References

LightGBM documentation
Poetry documentation



I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.