
Why MAE over MSE

The Decision

All three models in the EPF ensemble (HistGradientBoosting, LightGBM, XGBoost) use MAE (Mean Absolute Error) as their loss function instead of the more common MSE (Mean Squared Error).

```python
# HistGradientBoosting
{"loss": "absolute_error"}
# LightGBM
{"objective": "mae"}
# XGBoost
{"objective": "reg:absoluteerror"}
```

Why Not MSE?

MSE squares each error before averaging, which means a single large error has a disproportionate impact on the loss. Consider a simple example with 4 predictions:

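| Hour | Error (EUR/MWh) | Absolute error (EUR/MWh) | Squared error ((EUR/MWh)²) |
|---|---|---|---|
| 1 | 2.0 | 2.0 | 4.0 |
| 2 | -3.0 | 3.0 | 9.0 |
| 3 | 1.5 | 1.5 | 2.25 |
| 4 | 50.0 (spike) | 50.0 | 2,500.0 |
| Mean | | MAE = 14.1 | MSE = 628.8 |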

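The spike at hour 4 dominates the MSE loss (2,500 of the 2,515.25 total, over 99%), pulling the model toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE, the spike is still the largest error (50 of 56.5), but its gradient has the same magnitude as every other point's (the derivative of |e| is ±1), so it doesn't dominate the gradient signal. Under MSE, the gradient 2e makes the spike's contribution (100) larger than that of the other three hours combined (13).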
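The arithmetic in the table, and the gradient argument, can be checked with a few lines of plain Python. This is an illustrative sketch using only the four errors from the example; `spike_share` is a hypothetical helper name, not part of the EPF code.

```python
import math

# Errors (EUR/MWh) from the four-hour example; hour 4 is the spike.
errors = [2.0, -3.0, 1.5, 50.0]
n = len(errors)

mae = sum(abs(e) for e in errors) / n  # mean absolute error
mse = sum(e ** 2 for e in errors) / n  # mean squared error

# Per-point gradient of each loss with respect to the error e:
#   d|e|/de  = sign(e)  -> magnitude 1 for every non-zero error
#   d(e^2)/de = 2e      -> grows linearly with the error
grad_mae = [math.copysign(1.0, e) for e in errors]
grad_mse = [2 * e for e in errors]


def spike_share(grads):
    # Fraction of the total gradient magnitude contributed by hour 4 (the spike).
    total = sum(abs(g) for g in grads)
    return abs(grads[-1]) / total


print(f"MAE = {mae:.1f}, MSE = {mse:.1f}")
print(f"spike share of |gradient|: MAE {spike_share(grad_mae):.0%}, MSE {spike_share(grad_mse):.0%}")
```

It prints MAE = 14.1 and MSE = 628.8, matching the table, and shows the spike supplying 25% of the gradient magnitude under MAE (one of four equal-magnitude points) versus 88% under MSE.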

Electricity Price Characteristics

Spanish electricity prices exhibit three properties that make MSE problematic:

1. Right-Skewed Distribution

Prices cluster between 0–80 EUR/MWh most of the time but occasionally spike to 200+ EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.

2. Heteroscedastic Volatility

Price variance is not constant: it’s higher during peak hours (8:00–21:00) and lower during night hours. Because MSE weights each error by its square, MSE-trained models give the larger, more volatile peak-hour errors disproportionate weight, reducing off-peak accuracy.
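The weighting effect can be estimated with a back-of-the-envelope calculation. This sketch assumes, purely for illustration, zero-mean Gaussian errors with a hypothetical standard deviation of 20 EUR/MWh in peak hours and 5 EUR/MWh off-peak (not measured values), and computes what share of each expected loss comes from peak hours.

```python
import math

# Peak hours as defined in the text (8:00-21:00 inclusive); the rest are off-peak.
PEAK_HOURS = set(range(8, 22))
OFF_PEAK_HOURS = set(range(24)) - PEAK_HOURS

# Hypothetical error standard deviations (EUR/MWh), chosen only to
# illustrate higher peak-hour volatility.
SIGMA_PEAK = 20.0
SIGMA_OFF_PEAK = 5.0


def expected_abs_error(sigma):
    # E|e| for e ~ N(0, sigma^2)
    return sigma * math.sqrt(2 / math.pi)


def expected_sq_error(sigma):
    # E[e^2] for e ~ N(0, sigma^2)
    return sigma ** 2


def peak_share(loss_fn):
    # Fraction of the total expected daily loss that comes from peak hours.
    peak = len(PEAK_HOURS) * loss_fn(SIGMA_PEAK)
    off = len(OFF_PEAK_HOURS) * loss_fn(SIGMA_OFF_PEAK)
    return peak / (peak + off)


print(f"peak-hour share of MAE loss: {peak_share(expected_abs_error):.3f}")  # -> 0.848
print(f"peak-hour share of MSE loss: {peak_share(expected_sq_error):.3f}")   # -> 0.957
```

Peak hours are 14 of 24 hours (58%), yet under these assumed volatilities they contribute about 85% of the expected MAE and about 96% of the expected MSE, leaving the off-peak hours even less influence when training with squared error.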

3. Occasional Negative Prices

During high renewable generation periods, prices can go negative. Positive spikes extend much further from the typical range than negative prices do, and because MSE weights each error by its square, this asymmetry pulls MSE-trained models toward the positive tail.

MAE: Predicting the Median

A key theoretical property: MAE-optimized models predict the conditional median, not the conditional mean. For right-skewed distributions, the median is typically lower than the mean. Relative to the mean, this creates a systematic negative bias during high-price periods: the model slightly underpredicts when prices are elevated.
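To see the median property concretely, the sketch below brute-forces the best constant prediction under each loss for a small, made-up right-skewed price sample (illustrative values, not Spanish market data).

```python
import statistics

# Hypothetical right-skewed sample of hourly prices (EUR/MWh) with one spike.
prices = [40.0, 45.0, 50.0, 55.0, 60.0, 70.0, 220.0]


def mae_loss(c):
    # Mean absolute error of predicting the constant c for every hour.
    return sum(abs(p - c) for p in prices) / len(prices)


def mse_loss(c):
    # Mean squared error of predicting the constant c for every hour.
    return sum((p - c) ** 2 for p in prices) / len(prices)


# Brute-force search over constant predictions 0.00 .. 250.00 EUR/MWh.
grid = [i / 100 for i in range(25001)]
best_mae = min(grid, key=mae_loss)
best_mse = min(grid, key=mse_loss)

print(best_mae, statistics.median(prices))  # MAE minimizer vs. sample median
print(best_mse, statistics.mean(prices))    # MSE minimizer vs. sample mean
```

The MAE minimizer lands exactly on the median (55.0), while the single 220 EUR/MWh spike pulls the MSE minimizer up to the mean (about 77.14). Read the other way round, a median-targeting model sits below the mean whenever the distribution is right-skewed, which is the bias the next section corrects.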

Bias Correction

The median-targeting behavior is handled post-prediction by the bias correction system:

  1. Track hourly bias: Rolling 30-day mean error (prediction minus actual) per hour-of-day
  2. Apply correction: Subtract the systematic bias from raw predictions
  3. Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0
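The three steps can be sketched in plain Python. This is a minimal sketch, not the project's bias correction system: the class `HourlyBiasCorrector` and its methods are hypothetical, it assumes exactly one observation per hour of day per day (so a 30-entry window equals 30 days), and it reads "under 5% of history" as the all-time share of negative actual prices for that hour.

```python
from collections import defaultdict, deque


class HourlyBiasCorrector:
    """Hypothetical sketch of the three-step hourly bias correction.

    Assumes one (prediction, actual) pair per hour-of-day per day, so a
    window of 30 entries per hour corresponds to 30 days.
    """

    def __init__(self, window_days=30, neg_rate_threshold=0.05):
        # hour-of-day -> most recent errors (prediction minus actual)
        self.errors = defaultdict(lambda: deque(maxlen=window_days))
        self.n_obs = defaultdict(int)  # all observations seen per hour
        self.n_neg = defaultdict(int)  # observations with a negative actual price
        self.neg_rate_threshold = neg_rate_threshold

    def update(self, hour, predicted, actual):
        # Step 1: record the latest error for this hour of day.
        self.errors[hour].append(predicted - actual)
        self.n_obs[hour] += 1
        if actual < 0:
            self.n_neg[hour] += 1

    def bias(self, hour):
        # Rolling mean error for this hour (0.0 until data arrives).
        errs = self.errors[hour]
        return sum(errs) / len(errs) if errs else 0.0

    def correct(self, hour, raw_prediction):
        # Step 2: subtract the systematic bias.
        adjusted = raw_prediction - self.bias(hour)
        # Step 3: floor at 0 when negative prices are rare for this hour.
        n = self.n_obs[hour]
        neg_rate = self.n_neg[hour] / n if n else 0.0
        if neg_rate < self.neg_rate_threshold:
            adjusted = max(adjusted, 0.0)
        return adjusted


# Usage: the model underpredicted hour 19 by 4 EUR/MWh on each of the last 40 days.
corrector = HourlyBiasCorrector()
for _ in range(40):
    corrector.update(19, predicted=96.0, actual=100.0)
print(corrector.bias(19))            # -4.0 (only the latest 30 days are kept)
print(corrector.correct(19, 100.0))  # 104.0
```

Because the error is defined as prediction minus actual, a median-driven underprediction shows up as a negative bias, and subtracting it raises the forecast.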

This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.

Trade-off Summary

| Property | MAE | MSE |
|---|---|---|
| Outlier sensitivity | Robust | Sensitive |
| Optimization target | Median | Mean |
| Gradient stability | Constant magnitude | Proportional to error |
| Typical-hour accuracy | Better | Worse |
| Spike-hour accuracy | Slightly worse (mitigated by bias correction) | Better (at the cost of typical hours) |
| Interpretability | Direct EUR/MWh meaning | (EUR/MWh)² (less intuitive) |

Empirical Validation

Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved 5–10% lower overall MAE than MSE-trained models, with improvements concentrated in the off-peak hours (0:00–7:00 and 22:00–23:00), where prices are more stable and predictable.
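A comparison like this can be computed by grouping absolute errors by hour. The following is a minimal sketch that assumes a hypothetical record layout of `(hour, actual, pred_a, pred_b)` tuples, where model A is MAE-trained and model B is MSE-trained; the function name `mae_reduction` is illustrative, and only the off-peak hour definition comes from the text.

```python
from statistics import mean

# Off-peak hours as defined in the text: 0:00-7:00 and 22:00-23:00.
OFF_PEAK = set(range(0, 8)) | {22, 23}


def mae_reduction(records):
    """Relative MAE reduction of model A vs. model B, overall and by hour group.

    records: iterable of (hour, actual, pred_a, pred_b) tuples.
    Returns {group: 1 - MAE_A / MAE_B}; positive means model A is better.
    """
    groups = {"overall": [], "off_peak": [], "peak": []}
    for hour, actual, pred_a, pred_b in records:
        row = (abs(pred_a - actual), abs(pred_b - actual))
        groups["overall"].append(row)
        groups["off_peak" if hour in OFF_PEAK else "peak"].append(row)

    result = {}
    for name, rows in groups.items():
        if not rows:
            continue  # no observations in this group
        mae_a = mean(a for a, _ in rows)
        mae_b = mean(b for _, b in rows)
        if mae_b > 0:
            result[name] = 1 - mae_a / mae_b
    return result


# Usage with two made-up records: one off-peak hour, one peak hour.
print(mae_reduction([(3, 50.0, 52.0, 54.0), (19, 120.0, 110.0, 110.0)]))
```

Reporting the off-peak and peak groups separately, rather than only the overall figure, is what makes it possible to see where an overall improvement comes from.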