2.5 Loss Function

In machine learning (ML), a loss function measures model performance by quantifying how far a model’s predictions deviate from the correct, “ground truth” values. Optimizing a model means adjusting its parameters to minimize the output of a chosen loss function.
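To make "adjusting parameters to minimize a loss" concrete, here is a minimal sketch of gradient descent on a one-parameter model. The model `y = w * x`, the helper names (`mse`, `mse_grad`, `fit`), the learning rate, and the step count are all illustrative assumptions rather than anything prescribed by this section; squared error (covered in 2.5.1) is used as the loss.

```python
def mse(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of mse with respect to the parameter w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, lr=0.01, steps=500):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = 0.0  # arbitrary starting point (assumption)
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with w = 2

w = fit(xs, ys)
print(f"w = {w:.4f}, loss = {mse(w, xs, ys):.6f}")
```

On this toy data the parameter should settle near 2, the value used to generate `ys`, and the loss should fall towards zero. Real models have many parameters and usually rely on automatic differentiation instead of a hand-written gradient.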

Created Date: 2025-05-10

2.5.1 MSE Loss

In statistics, the mean squared error (MSE), or mean squared deviation (MSD), of an estimator (a procedure for estimating an unobserved quantity) measures the average of the squared errors, that is, the average squared difference between the estimated values and the true values.

The torch.nn.MSELoss documentation describes how the loss is computed:

Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input \(x\) and target \(y\).

\(l(x, y) = L = \{l_1, \cdots, l_N\}^T, \quad l_n = (x_n - y_n)^2\)

where \(N\) is the batch size. With the default reduction='mean', the value returned is the mean of the elements of \(L\).
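The gradient computed by dervi_mse in mse_loss.py follows from differentiating the averaged loss. A short derivation, assuming the default mean reduction (so the scalar loss is the average of the \(l_n\)) and a one-dimensional input of length \(N\):

```latex
\begin{aligned}
l(x, y) &= \frac{1}{N} \sum_{n=1}^{N} (x_n - y_n)^2, \\
\frac{\partial l}{\partial x_k} &= \frac{1}{N} \cdot 2\,(x_k - y_k) = \frac{2\,(x_k - y_k)}{N}.
\end{aligned}
```

Only the \(k\)-th term of the sum depends on \(x_k\), so all other terms vanish under differentiation. For example, with \(x_k = 1.2\), \(y_k = 1.0\), and \(N = 5\), the gradient component is \(2 \cdot 0.2 / 5 = 0.08\).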

File mse_loss.py

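import numpy


def dervi_mse(y_pred, y_true):
    # Gradient of the mean squared error with respect to each prediction.
    return 2 * (y_pred - y_true) / len(y_true)


y_true = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = numpy.array([1.2, 1.8, 3.5, 4.1, 5.3])

# Mean of the squared differences between predictions and targets.
average_loss = ((y_pred - y_true) ** 2).mean()
print(f"{average_loss:.4f}")

dl_dy = dervi_mse(y_pred, y_true)
print(dl_dy)

Output:

0.0860
[ 0.08 -0.08  0.2   0.04  0.12]
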
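The same computation with PyTorch, using autograd to obtain the gradient:

import torch

y_true = torch.tensor(y_true, requires_grad=False)
y_pred = torch.tensor(y_pred, requires_grad=True)

# Compute the loss with the default reduction ("mean").
average_loss = torch.nn.MSELoss()(y_pred, y_true)
print(f"{average_loss.item():.4f}")

# Backpropagate to populate y_pred.grad with d(loss)/d(y_pred).
average_loss.backward()
print(y_pred.grad)

Output:

0.0860
tensor([ 0.0800, -0.0800,  0.2000,  0.0400,  0.1200], dtype=torch.float64)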
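One way to gain confidence in a hand-written gradient is to compare it with a finite-difference estimate. The sketch below uses only the standard library, so it runs without NumPy or PyTorch. The helper names (`mse`, `mse_grad`, `numeric_grad`) and the step size `eps` are illustrative assumptions, not part of mse_loss.py.

```python
def mse(y_pred, y_true):
    """Mean of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)


def mse_grad(y_pred, y_true):
    """Analytic gradient of mse with respect to each prediction."""
    n = len(y_true)
    return [2.0 * (p - t) / n for p, t in zip(y_pred, y_true)]


def numeric_grad(y_pred, y_true, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grads = []
    for i in range(len(y_pred)):
        up = list(y_pred)
        down = list(y_pred)
        up[i] += eps
        down[i] -= eps
        grads.append((mse(up, y_true) - mse(down, y_true)) / (2.0 * eps))
    return grads


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.2, 1.8, 3.5, 4.1, 5.3]

analytic = mse_grad(y_pred, y_true)
numeric = numeric_grad(y_pred, y_true)

# Largest disagreement between the two gradients; should be tiny.
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print(max_gap < 1e-8)
```

Because the loss is quadratic in each prediction, the central difference matches the analytic gradient up to floating-point rounding, which is why a tight tolerance is reasonable here.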

2.5.2 Cross Entropy Loss