L1 loss is a loss function commonly used in machine learning. In this post I will explain what it is, how to implement it in Python, and some common questions that users have.
L1 loss function, what is it?
L1 loss, also known as Absolute Error Loss, is the absolute difference between a prediction and the actual value, calculated for each example in a dataset. Aggregating these per-example loss values gives the cost function; for L1 loss, the usual cost function is MAE (Mean Absolute Error).
L1 loss function formula
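The mathematical formula for calculating L1 loss for a single example is:
$L_1 = |y - \hat{y}|$
where $y$ is the actual value and $\hat{y}$ is the predicted value.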
L1 loss function example
Let’s say we are predicting house prices with a regression model. We could calculate the L1 loss per training example and the result would look like this:
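As a minimal sketch, assuming three made-up house prices (the values and the names `actual_prices`, `predicted_prices` and `l1_losses` are illustrative, not real data), the per-example L1 loss could be computed like this:

```python
# Hypothetical house prices, made up purely for illustration.
actual_prices = [250_000, 310_000, 180_000]
predicted_prices = [240_000, 325_000, 180_000]

# L1 loss for each example: absolute difference between actual and predicted.
l1_losses = [abs(a - p) for a, p in zip(actual_prices, predicted_prices)]

# Print one row per training example.
for a, p, loss in zip(actual_prices, predicted_prices, l1_losses):
    print(f"actual={a:>7}  predicted={p:>7}  L1 loss={loss:>6}")
```

Each row has its own loss value. Whether the model predicted too high or too low makes no difference, because only the size of the error counts.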
What is the L1 cost function?
The most common cost function to use in conjunction with the L1 loss function is MAE (Mean Absolute Error) which is the mean of all the L1 values.
What’s the difference between L1 loss and MAE cost function?
There can often be confusion around the difference between loss functions and cost functions, so let me explain further.
The L1 loss is an error calculation for each example where we want to understand how well we predicted for that observation, but what if we wanted to understand the error for the whole dataset? To do this we combine all the L1 loss values into a cost function called Mean Absolute Error (MAE) which, as the name suggests, is the mean of all the L1 loss values.
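The formula for MAE is therefore:
$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
where $n$ is the number of examples, $y_i$ is the actual value for example $i$, and $\hat{y}_i$ is the predicted value for example $i$.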
Calculate L1 loss and MAE cost function in Python
L1 loss is the absolute difference between the actual and the predicted values, and MAE is the mean of all these values, and thus both are simple to implement in Python. I can show this with an example:
Calculate L1 loss and MAE cost using NumPy
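import numpy as np
# Actual and predicted values for four examples
actual = np.array([10, 11, 12, 13])
prediction = np.array([10, 12, 14, 11])
# L1 loss: element-wise absolute difference
l1_loss = np.abs(actual - prediction)
print(l1_loss)  # [0 1 2 2]
# MAE cost: mean of all the L1 loss values
mae_cost = l1_loss.mean()
print(mae_cost)  # 1.25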
Should I use the L1 loss function?
There are several loss functions that can be used in machine learning, so how do you know if L1 is the right loss function for your use case? Well, that depends on what you are seeking to achieve with your model and what is important to you, but there tends to be one decisive factor:
L1 loss is less sensitive to outliers than L2 loss, because it grows linearly with the size of the error rather than quadratically. So if you want to penalise large errors and outliers heavily, L1 is not a great choice and you should probably use L2 loss instead. However, if you don't want infrequent large errors to dominate the cost, then L1 is most likely a good choice.
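To make the outlier point concrete, here is a rough sketch comparing how much a single large error contributes to the total under L1 (absolute error) and L2 (squared error). The error values and variable names are made up for illustration.

```python
# Hypothetical prediction errors (actual - predicted); the last one is an outlier.
errors = [10, -10, 10, -10, 50]

# L1 loss per example: absolute error.
l1_losses = [abs(e) for e in errors]
# L2 loss per example: squared error.
l2_losses = [e ** 2 for e in errors]

# Share of the total loss contributed by the outlier under each loss.
l1_outlier_share = l1_losses[-1] / sum(l1_losses)
l2_outlier_share = l2_losses[-1] / sum(l2_losses)

print(f"outlier share of total L1 loss: {l1_outlier_share:.1%}")
print(f"outlier share of total L2 loss: {l2_outlier_share:.1%}")
```

With these numbers the outlier makes up a little over half of the total L1 loss but well over four fifths of the total L2 loss, which is why a model trained with L2 is pulled much harder towards fitting the outlier.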