Why MAE over MSE
The Decision
The production XGBoost model uses quantile (pinball) loss at q=0.55, a close relative of MAE: at q=0.5 the quantile loss is proportional to MAE, and at q=0.55 it targets a point slightly above the conditional median. The historical v4.3 ensemble used MAE across all three base learners (HistGradientBoosting, LightGBM, XGBoost). Production has been single-XGBoost since v11.0 (April 2026), but the rationale below (a robust, approximately median-targeting loss that resists right-skewed price spikes) applies to both the current single-model architecture and the legacy ensemble.

    # Production (v11.0+): XGBoost with quantile loss q=0.55 (MAE-equivalent at q=0.5)
    {"objective": "reg:quantileerror", "quantile_alpha": 0.55}

    # Historical v4.3 ensemble loss configuration:
    # HistGradientBoosting
    {"loss": "absolute_error"}
    # LightGBM
    {"objective": "mae"}
    # XGBoost
    {"objective": "reg:absoluteerror"}

Why Not MSE?
MSE squares each error before averaging, which means a single large error has a disproportionate impact on the loss. Consider a simple example with 4 predictions:
| Hour | Error (EUR/MWh) | Absolute Error | Squared Error |
|---|---|---|---|
| 1 | 2.0 | 2.0 | 4.0 |
| 2 | -3.0 | 3.0 | 9.0 |
| 3 | 1.5 | 1.5 | 2.25 |
| 4 | 50.0 (spike) | 50.0 | 2,500.0 |
| MAE | | 14.1 | |
| MSE | | | 628.8 |
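The spike at hour 4 dominates the MSE loss (2,500 of the 2,515.25 total squared error, about 99%), pulling the model toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE, the spike is still the largest error, but it does not dominate the gradient signal: each hour contributes a gradient of constant magnitude, whereas under MSE the gradient grows in proportion to the error.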
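The arithmetic behind the four-hour table can be reproduced with a short sketch. It assumes each error is defined as prediction minus actual, so the per-sample gradient with respect to the prediction is the sign of the error for MAE and twice the error for MSE (both up to the 1/n averaging factor). The variable names are illustrative only.

```python
# Errors (EUR/MWh) for the four hours in the table above.
errors = [2.0, -3.0, 1.5, 50.0]

abs_errors = [abs(e) for e in errors]
sq_errors = [e * e for e in errors]

mae = sum(abs_errors) / len(errors)  # 14.125
mse = sum(sq_errors) / len(errors)   # 628.8125

# Share of each total loss attributable to the hour-4 spike.
spike_share_mae = abs_errors[-1] / sum(abs_errors)  # about 0.885
spike_share_mse = sq_errors[-1] / sum(sq_errors)    # about 0.994

# Per-sample gradient with respect to the prediction (assumed convention:
# error = prediction - actual), ignoring the 1/n factor.
grad_mae = [(e > 0) - (e < 0) for e in errors]  # [1, -1, 1, 1]
grad_mse = [2 * e for e in errors]              # [4.0, -6.0, 3.0, 100.0]

print(f"MAE={mae:.1f}, MSE={mse:.1f}")
```

The spike still accounts for most of the absolute error, but its MAE gradient has the same magnitude as every other hour, while its MSE gradient is roughly 17 to 33 times larger than those of the typical hours.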
Electricity Price Characteristics
Spanish electricity prices exhibit three properties that make MSE problematic:
1. Right-Skewed Distribution
Prices cluster between 0 and 80 EUR/MWh most of the time but occasionally spike above 200 EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.
2. Heteroscedastic Volatility
Price variance is not constant: it is higher during peak hours (8:00–21:00) and lower during night hours. Because MSE weights errors quadratically, MSE-trained models overweight the larger peak-hour errors, reducing off-peak accuracy.
3. Occasional Negative Prices
During high renewable generation periods, prices can go negative. The asymmetry between negative and positive outliers means MSE-trained models are pulled toward the positive tail.
MAE: Predicting the Median
A key theoretical property: MAE-optimized models predict the conditional median, not the conditional mean. For right-skewed distributions, the median is typically lower than the mean. This creates a systematic negative bias during high-price periods: the model slightly underpredicts when prices are elevated.
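The median-targeting property can be checked numerically. The sketch below is a toy illustration, not the production training code: it draws synthetic right-skewed "prices" from a log-normal distribution (an assumption chosen only for its skew), then searches for the constant prediction that minimizes the quantile (pinball) loss. The helper names `pinball_loss` and `best_constant` are hypothetical.

```python
import random
import statistics

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss of a constant prediction y_pred."""
    total = 0.0
    for y in y_true:
        diff = y - y_pred
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def best_constant(y_true, q, candidates):
    """Candidate value that minimizes the pinball loss at quantile q."""
    return min(candidates, key=lambda c: pinball_loss(y_true, c, q))

# Synthetic right-skewed sample; illustrative only, not real market data.
rng = random.Random(0)
prices = [rng.lognormvariate(3.7, 0.6) for _ in range(501)]

candidates = sorted(prices)
q50 = best_constant(prices, 0.50, candidates)  # minimizer at q=0.5
q55 = best_constant(prices, 0.55, candidates)  # minimizer at q=0.55

mean_price = statistics.fmean(prices)
median_price = statistics.median(prices)

print(f"mean={mean_price:.2f} median={median_price:.2f} "
      f"argmin q=0.50: {q50:.2f} argmin q=0.55: {q55:.2f}")
```

At q=0.5 the pinball loss equals half the mean absolute error, so its minimizer is the sample median, which sits below the mean for this skewed sample. At q=0.55 the minimizer moves to the 55th percentile, slightly above the median.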
Bias Correction
The median-targeting behavior is handled post-prediction by the bias correction system:
- Track hourly bias: Rolling 30-day mean error per hour-of-day
- Apply correction: Subtract the systematic bias from raw predictions
- Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0
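The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production bias correction system: the record layout `(day_index, hour, predicted, actual)`, the function names, and the sign convention (error = predicted minus actual) are all hypothetical choices made for the example.

```python
from collections import defaultdict

def hourly_bias(history, window_days=30):
    """Mean error (predicted - actual) per hour-of-day over the last window_days days."""
    if not history:
        return {}
    last_day = max(day for day, _, _, _ in history)
    sums, counts = defaultdict(float), defaultdict(int)
    for day, hour, predicted, actual in history:
        if day > last_day - window_days:  # keep only the rolling window
            sums[hour] += predicted - actual
            counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sums}

def negative_price_rate(history):
    """Fraction of historical actual prices below zero, per hour-of-day."""
    neg, total = defaultdict(int), defaultdict(int)
    for _, hour, _, actual in history:
        total[hour] += 1
        neg[hour] += actual < 0
    return {hour: neg[hour] / total[hour] for hour in total}

def correct(raw_predictions, bias, neg_rate, rare_threshold=0.05):
    """Subtract the hourly bias, then floor at 0 where negative prices are rare."""
    corrected = {}
    for hour, raw in raw_predictions.items():
        value = raw - bias.get(hour, 0.0)
        if neg_rate.get(hour, 0.0) < rare_threshold:
            value = max(value, 0.0)
        corrected[hour] = value
    return corrected

# Toy history: hour 3 is under-predicted by 2 EUR/MWh every day;
# hour 14 sees a negative actual price on every other day.
history = []
for day in range(40):
    history.append((day, 3, 38.0, 40.0))
    history.append((day, 14, 5.0, -5.0 if day % 2 else 15.0))

bias = hourly_bias(history)
neg_rate = negative_price_rate(history)
corrected = correct({3: 30.0, 14: -4.0}, bias, neg_rate)
print(corrected)
```

In this toy run, the hour-3 prediction is raised by 2 EUR/MWh to offset the under-prediction, while the negative hour-14 prediction is left unfloored because negative prices are common for that hour.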
This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.
Trade-off Summary
| Property | MAE | MSE |
|---|---|---|
| Outlier sensitivity | Robust | Sensitive |
| Optimization target | Median | Mean |
| Gradient stability | Constant magnitude | Proportional to error |
| Typical-hour accuracy | Better | Worse |
| Spike-hour accuracy | Slightly worse (mitigated by bias correction) | Better (but at cost of typical hours) |
| Interpretability | Direct EUR/MWh meaning | EUR²/MWh² (less intuitive) |
Empirical Validation
Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved lower overall MAE by 5–10% compared to MSE-trained models, with improvements concentrated in the off-peak hours (0:00–7:00 and 22:00–23:00) where prices are more stable and predictable.
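A comparison of this kind can be broken down by time of day with a small helper. The sketch below is an assumption about how such a breakdown might be computed, not the actual backtest harness: the record layout `(hour, predicted, actual)` and the function names are hypothetical, and the off-peak hours are taken from the ranges quoted above.

```python
OFF_PEAK_HOURS = set(range(0, 8)) | {22, 23}  # 0:00-7:00 and 22:00-23:00

def mae_by_segment(records):
    """MAE for off-peak hours, peak hours, and all hours combined."""
    buckets = {"off_peak": [], "peak": [], "overall": []}
    for hour, predicted, actual in records:
        err = abs(predicted - actual)
        segment = "off_peak" if hour in OFF_PEAK_HOURS else "peak"
        buckets[segment].append(err)
        buckets["overall"].append(err)
    # Skip empty segments to avoid dividing by zero.
    return {name: sum(errs) / len(errs) for name, errs in buckets.items() if errs}

def relative_improvement(candidate_mae, baseline_mae):
    """Fractional MAE reduction of the candidate relative to the baseline."""
    return (baseline_mae - candidate_mae) / baseline_mae

# Placeholder records to show the call shape; not backtest results.
example = [(2, 10.0, 12.0), (23, 5.0, 4.0), (12, 50.0, 60.0)]
print(mae_by_segment(example))
```

Running `mae_by_segment` once per model on the same backtest window and passing the paired segment MAEs to `relative_improvement` would show where any gain is concentrated.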