Why MAE over MSE
The Decision
All three models in the EPF ensemble (HistGradientBoosting, LightGBM, XGBoost) use MAE (Mean Absolute Error) as their loss function instead of the more common MSE (Mean Squared Error).
# HistGradientBoosting
{"loss": "absolute_error"}
# LightGBM
{"objective": "mae"}
# XGBoost
{"objective": "reg:absoluteerror"}
Why Not MSE?
MSE squares each error before averaging, which means a single large error has a disproportionate impact on the loss. Consider a simple example with 4 predictions:
| Hour | Error (EUR/MWh) | Absolute Error | Squared Error |
|---|---|---|---|
| 1 | 2.0 | 2.0 | 4.0 |
| 2 | -3.0 | 3.0 | 9.0 |
| 3 | 1.5 | 1.5 | 2.25 |
| 4 | 50.0 (spike) | 50.0 | 2,500.0 |
| MAE | | 14.1 | |
| MSE | | | 628.8 |
The spike at hour 4 dominates the MSE loss (2,500 of the 2,515.25 total squared error), pulling the model toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE, the spike is still the largest error, but the gradient of absolute error depends only on the sign of the error, so the spike hour pulls on the model no harder than any other hour.
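The arithmetic behind the table can be checked with a few lines of Python. This is only a sketch that reuses the four errors listed above; the variable names are illustrative and not taken from the EPF codebase.

```python
# Signed errors from the table above, in EUR/MWh (hour 4 is the spike).
errors = [2.0, -3.0, 1.5, 50.0]

abs_errors = [abs(e) for e in errors]
sq_errors = [e ** 2 for e in errors]

mae = sum(abs_errors) / len(errors)
mse = sum(sq_errors) / len(errors)

# Fraction of each total loss contributed by the spike hour alone.
spike_share_mae = abs_errors[-1] / sum(abs_errors)
spike_share_mse = sq_errors[-1] / sum(sq_errors)

print(f"MAE = {mae:.3f} EUR/MWh, MSE = {mse:.4f} EUR^2/MWh^2")
print(f"Spike share of total absolute error: {spike_share_mae:.1%}")
print(f"Spike share of total squared error:  {spike_share_mse:.1%}")
```

The spike accounts for roughly 88% of the total absolute error but more than 99% of the total squared error. The loss value alone therefore understates the difference: what drives training is the gradient, which under absolute error has the same magnitude for every hour.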
Electricity Price Characteristics
Spanish electricity prices exhibit three properties that make MSE problematic:
1. Right-Skewed Distribution
Prices cluster between 0–80 EUR/MWh most of the time but occasionally spike to 200+ EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.
2. Heteroscedastic Volatility
Price variance is not constant: it is higher during peak hours (8:00–21:00) and lower during night hours. Because squared error grows with error magnitude, MSE-trained models overweight the more volatile peak hours, reducing off-peak accuracy.
3. Occasional Negative Prices
During high renewable generation periods, prices can go negative. The asymmetry between negative and positive outliers means MSE-trained models are pulled toward the positive tail.
MAE: Predicting the Median
A key theoretical property: a model trained to minimize MAE predicts the conditional median, not the conditional mean. For right-skewed distributions, the median is lower than the mean. Measured against the mean, this creates a systematic negative bias during high-price periods: the model slightly underpredicts when prices are elevated.
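The median-versus-mean property can be illustrated with a constant predictor on a small right-skewed sample. The prices below are made up for illustration and are not real market data; the grid search is a deliberately naive stand-in for model fitting.

```python
import statistics

# Hypothetical right-skewed sample in EUR/MWh: six typical hours and one spike.
prices = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 200.0]

def mean_abs_error(c, ys):
    """MAE of predicting the constant c for every observation."""
    return sum(abs(y - c) for y in ys) / len(ys)

def mean_sq_error(c, ys):
    """MSE of predicting the constant c for every observation."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

# Try every constant prediction from 0.0 to 250.0 in 0.5 EUR/MWh steps.
candidates = [i * 0.5 for i in range(501)]
best_mae = min(candidates, key=lambda c: mean_abs_error(c, prices))
best_mse = min(candidates, key=lambda c: mean_sq_error(c, prices))

print("median:", statistics.median(prices), "| MAE-optimal constant:", best_mae)
print("mean:  ", statistics.mean(prices), "| MSE-optimal constant:", best_mse)
```

On this sample the MAE-optimal constant coincides with the median (45.0) and the MSE-optimal constant with the mean (65.0), so the MAE-optimal prediction sits 20 EUR/MWh below the mean. That gap is the kind of underprediction described above.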
Bias Correction
The median-targeting behavior is handled post-prediction by the bias correction system:
- Track hourly bias: Rolling 30-day mean error per hour-of-day
- Apply correction: Subtract the systematic bias from raw predictions
- Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0
This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.
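The three correction steps above can be sketched as follows. This is a minimal illustration, not the project's actual bias correction code: the function names, the `(hour, predicted, actual)` tuple layout, and the sign convention (error = prediction minus actual) are all assumptions, and the caller is assumed to have already restricted `history` to the rolling window.

```python
from collections import defaultdict

def hourly_bias(history):
    """Mean error (prediction - actual) per hour of day.

    `history` holds (hour, predicted, actual) tuples, already limited by the
    caller to the rolling window (30 days in the text above).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, predicted, actual in history:
        sums[hour] += predicted - actual
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

def negative_share(history):
    """Fraction of observed prices below zero, per hour of day."""
    negatives = defaultdict(int)
    counts = defaultdict(int)
    for hour, _, actual in history:
        negatives[hour] += actual < 0
        counts[hour] += 1
    return {hour: negatives[hour] / counts[hour] for hour in counts}

def correct(hour, raw_prediction, bias, neg_share, rare_threshold=0.05):
    """Subtract the hour's bias, then floor at 0 where negatives are rare."""
    corrected = raw_prediction - bias.get(hour, 0.0)
    if neg_share.get(hour, 0.0) < rare_threshold:
        corrected = max(corrected, 0.0)
    return corrected

# Hypothetical history: hour 18 is underpredicted, hour 3 is unbiased.
history = [(18, 60.0, 70.0), (18, 62.0, 68.0), (3, 5.0, 4.0), (3, 1.0, 2.0)]
bias = hourly_bias(history)
neg_share = negative_share(history)

print(correct(18, 65.0, bias, neg_share))  # raised by the 8 EUR/MWh underprediction
print(correct(3, -2.0, bias, neg_share))   # floored, no negative prices seen at hour 3
```

With this sign convention, an underpredicting hour has a negative bias, so subtracting it raises the raw prediction. Whether the 5% threshold is evaluated over the same window as the bias or over a longer history is not stated above; the sketch uses a single history for both.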
Trade-off Summary
| Property | MAE | MSE |
|---|---|---|
| Outlier sensitivity | Robust | Sensitive |
| Optimization target | Median | Mean |
| Gradient stability | Constant magnitude | Proportional to error |
| Typical-hour accuracy | Better | Worse |
| Spike-hour accuracy | Slightly worse (offset by bias correction) | Better (but at cost of typical hours) |
| Interpretability | Direct EUR/MWh meaning | EUR²/MWh² (less intuitive) |
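The "Gradient stability" row can be made concrete by evaluating both loss gradients with respect to the error. The helper names below are illustrative; the example errors are the four used earlier in this document. How each boosting library handles the zero second derivative of absolute error is outside the scope of this sketch.

```python
def grad_absolute(error):
    """Derivative of |e| with respect to e: the sign of the error (0.0 at e == 0)."""
    return float((error > 0) - (error < 0))

def grad_squared(error):
    """Derivative of e**2 with respect to e: proportional to the error."""
    return 2.0 * error

errors = [2.0, -3.0, 1.5, 50.0]  # EUR/MWh, hour 4 is the spike
for e in errors:
    print(f"error={e:6.1f}  dMAE={grad_absolute(e):5.1f}  dMSE={grad_squared(e):6.1f}")
```

Under absolute error every hour contributes a gradient of magnitude 1, whereas under squared error the spike hour contributes a gradient of 100.0 against 3.0 to 6.0 in magnitude for the typical hours.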
Empirical Validation
Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved 5–10% lower overall MAE than MSE-trained models, with improvements concentrated in the off-peak hours (0:00–7:00 and 22:00–23:00), where prices are more stable and predictable.