L2 loss function, explained

L2 loss is a loss function commonly used in machine learning. In this post I will explain what it is and how to implement it in Python, and answer some common questions about it.

L2 loss function, what is it?

L2 loss, also known as Squared Error Loss, is the squared difference between a prediction and the actual value, calculated for each example in a dataset. The aggregation of all these loss values is called the cost function, where the cost function for L2 is commonly MSE (Mean of Squared Errors).

L2 loss function formula

The formula for the L2 loss of a single example is:

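$$\text{L2 loss} = (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for example $i$.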
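As a minimal sketch, the per-example L2 loss can be written as a small plain-Python function. The name `l2_loss` is my own illustrative choice, not a library API.

```python
def l2_loss(actual: float, predicted: float) -> float:
    """Squared difference between the actual and predicted value for one example."""
    difference = actual - predicted
    return difference ** 2


# Squaring removes the sign, so over- and under-predictions of the same size
# receive the same loss.
print(l2_loss(3, 5))  # 4
print(l2_loss(5, 3))  # 4
```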

L2 loss function example

Let’s say we are predicting house prices with a regression model. We could calculate the L2 loss per training example and the result would look like this:

Actual value   Predicted value   Absolute difference   L2 loss (squared dollars)
$200,000       $220,000          $20,000               400,000,000
$400,000       $385,000          $15,000               225,000,000
$300,000       $305,000          $5,000                25,000,000
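The values in the house price table can be reproduced with a few lines of plain Python. This is just a sketch that hard-codes the three (actual, predicted) pairs from the table.

```python
# House price example from the table: (actual, predicted) pairs in dollars
examples = [
    (200_000, 220_000),
    (400_000, 385_000),
    (300_000, 305_000),
]

# Per-example L2 loss: squared difference between actual and predicted
l2_losses = [(actual - predicted) ** 2 for actual, predicted in examples]

for (actual, predicted), loss in zip(examples, l2_losses):
    print(
        f"actual=${actual:,} predicted=${predicted:,} "
        f"difference=${abs(actual - predicted):,} L2 loss={loss:,}"
    )
```

Note that the loss is in squared dollars, which is why the numbers grow so large compared to the prices themselves.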

Is L2 loss the same as MSE (Mean of Squared Errors)?

L2 loss and MSE are related, but not the same. L2 loss is the loss for each example, whilst MSE is the cost function which is an aggregation of all the loss values in the dataset.

Let me explain further.

The L2 loss measures the error for a single example, telling us how well we predicted that one observation. But what if we want to understand the error across the whole dataset? To do this we combine all the L2 loss values into a cost function called Mean of Squared Errors (MSE), which, as the name suggests, is the mean of all the L2 loss values.

The formula for MSE is therefore:

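$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

where $n$ is the number of examples, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for example $i$.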
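To make the formula concrete, here is a from-scratch sketch of MSE in plain Python, applied to the house price example. The function name `mse` and the input checks are my own illustrative choices.

```python
def mse(actual: list[float], predicted: list[float]) -> float:
    """Mean of Squared Errors: the average of the per-example L2 losses."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one example is required")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# House price example: (400,000,000 + 225,000,000 + 25,000,000) / 3,
# which is about 216,666,666.67 squared dollars
print(mse([200_000, 400_000, 300_000], [220_000, 385_000, 305_000]))
```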

Calculate L2 loss and MSE cost function in Python

Because L2 loss is the squared difference between the actual and predicted values, and MSE is the mean of all these values, both are simple to implement in Python. Here is an example using NumPy:

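Calculate L2 loss and MSE cost using NumPy

```python
import numpy as np

actual = np.array([10, 11, 12, 13])
prediction = np.array([10, 12, 14, 11])

# L2 loss: squared difference for each example
l2_loss = (actual - prediction) ** 2
print(l2_loss)  # [0 1 4 4]

# MSE cost: mean of all the L2 loss values
mse_cost = l2_loss.mean()
print(mse_cost)  # 2.25
```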


Should I use L2 loss function?

There are several loss functions that can be used in machine learning, so how do you know if L2 is the right loss function for your use case? Well, that depends on what you are seeking to achieve with your model and what is important to you, but there tends to be one decisive factor:

L2 loss is very sensitive to outliers because it squares the difference: an error of 10 contributes a loss of 100, whereas an error of 1 contributes only 1. So if you want to heavily penalise large errors and outliers, L2 is a great choice. However, if you don't want infrequent large errors to dominate the loss, L2 is most likely not a good choice and you should probably use L1 loss (the absolute difference) instead.
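A small sketch with made-up numbers (hypothetical, not from the original post) shows this sensitivity. It compares MSE with MAE (Mean of Absolute Errors, the mean of the per-example L1 losses) before and after a single prediction goes badly wrong.

```python
def mse(actual, predicted):
    """Mean of Squared Errors: average of the per-example L2 losses."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean of Absolute Errors: average of the per-example L1 losses."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


actual = [10, 11, 12, 13, 14]
typical = [11, 12, 11, 14, 13]       # every prediction is off by 1
with_outlier = [11, 12, 11, 14, 24]  # the last prediction is off by 10 instead

print(mse(actual, typical), mae(actual, typical))            # 1.0 1.0
print(mse(actual, with_outlier), mae(actual, with_outlier))  # 20.8 2.8
```

One large error multiplies MSE by roughly 21 but MAE by only 2.8, which is the trade-off described above.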


Stephen Allwright


I'm a Data Scientist currently working for Oda, an online grocery retailer, in Oslo, Norway. These posts are my way of sharing some of the tips and tricks I've picked up along the way.