Why MAE over MSE

The Decision

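The production XGBoost model uses quantile (pinball) loss with q=0.55, a slightly asymmetric relative of MAE. At q=0.5 the pinball loss is exactly half the absolute error and has the same minimizer; q=0.55 instead targets the conditional 55th percentile, just above the median. The historical v4.3 ensemble used MAE across all three base learners (HistGradientBoosting, LightGBM, XGBoost). Production has been single-XGBoost since v11.0 (April 2026), but the rationale below (a robust loss that targets the median, or a quantile close to it, and resists right-skewed price spikes) applies to both the current single-model architecture and the legacy ensemble.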

# Production (v11.0+): XGBoost with quantile loss q=0.55 (MAE-equivalent at q=0.5)
{"objective": "reg:quantileerror", "quantile_alpha": 0.55}

# Historical v4.3 ensemble loss configuration:
# HistGradientBoosting
{"loss": "absolute_error"}

# LightGBM
{"objective": "mae"}

# XGBoost
{"objective": "reg:absoluteerror"}
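A minimal sketch of the pinball (quantile) loss, the loss that XGBoost's `reg:quantileerror` objective minimizes, written in plain Python to show how it relates to MAE. The helper names `pinball_loss` and `mae` and the sample prices are illustrative assumptions, not part of the production code.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = y - p  # positive when the model under-predicts
        # Under-predictions cost q per unit, over-predictions cost (1 - q) per unit.
        total += q * r if r >= 0 else (q - 1) * r
    return total / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted prices (EUR/MWh), including one spike.
y_true = [40.0, 55.0, 62.0, 210.0]
y_pred = [45.0, 50.0, 60.0, 90.0]

# At q = 0.5 the pinball loss is exactly MAE / 2, so both share the same minimizer.
print(pinball_loss(y_true, y_pred, 0.5), mae(y_true, y_pred) / 2)  # 16.5 16.5

# At q = 0.55 the large under-prediction on the spike is penalized slightly more.
print(pinball_loss(y_true, y_pred, 0.55))
```

At q=0.55, each EUR/MWh of under-prediction costs 0.55 and each EUR/MWh of over-prediction costs 0.45, which is why the fit settles on the 55th percentile rather than the median.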

Why Not MSE?

MSE squares each error before averaging, which means a single large error has a disproportionate impact on the loss. Consider a simple example with 4 predictions:

Hour	Error (EUR/MWh)	Absolute Error	Squared Error
1	2.0	2.0	4.0
2	-3.0	3.0	9.0
3	1.5	1.5	2.25
4	50.0 (spike)	50.0	2,500.0
MAE		14.1
MSE			628.8

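The spike at hour 4 dominates the MSE loss (2,500 of the 2,515.25 total, about 99%), pulling the model toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE the spike is still the largest term in the loss (50 of 56.5), but every sample's gradient has magnitude 1 regardless of error size, so the spike supplies only a quarter of the gradient signal. Under MSE the gradient is 2 × error, so the spike supplies 100 of 113 units (about 88%).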
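The arithmetic in the table can be reproduced in a few lines of Python. The sketch also computes each sample's gradient magnitude with respect to the prediction (1 for absolute error, 2·|e| for squared error), which is what drives each boosting step. It is an illustrative check, not code from the training pipeline.

```python
errors = [2.0, -3.0, 1.5, 50.0]  # EUR/MWh, hour 4 is the spike

mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)

# Share of each total loss contributed by the spike.
spike_share_mae = abs(errors[-1]) / sum(abs(e) for e in errors)
spike_share_mse = errors[-1] ** 2 / sum(e * e for e in errors)

# Per-sample gradient magnitude with respect to the prediction:
# |d|e|/dp| = 1 for any non-zero error, |d(e^2)/dp| = 2|e|.
grad_mae = [1.0 if e != 0 else 0.0 for e in errors]
grad_mse = [2.0 * abs(e) for e in errors]

spike_grad_share_mae = grad_mae[-1] / sum(grad_mae)
spike_grad_share_mse = grad_mse[-1] / sum(grad_mse)

print(f"MAE = {mae:.3f}, MSE = {mse:.4f}")
print(f"spike share of loss: MAE {spike_share_mae:.1%}, MSE {spike_share_mse:.1%}")
print(f"spike share of gradient: MAE {spike_grad_share_mae:.1%}, MSE {spike_grad_share_mse:.1%}")
```

Running it reports MAE = 14.125 and MSE = 628.8125; the spike supplies 99.4% of the MSE value and 88.5% of its gradient, but only 25.0% of the MAE gradient.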

Electricity Price Characteristics

Spanish electricity prices exhibit three properties that make MSE problematic:

1. Right-Skewed Distribution

Prices cluster between 0–80 EUR/MWh most of the time but occasionally spike to 200+ EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.

2. Heteroscedastic Volatility

Price variance is not constant: it is higher during peak hours (8:00–21:00) and lower at night. Because peak-hour errors are larger, squaring them gives peak hours a disproportionate share of an MSE loss, so MSE-trained models fit peak hours at the expense of off-peak accuracy.

3. Occasional Negative Prices

During periods of high renewable generation, prices can go negative. Negative excursions are typically far smaller in magnitude than the upward spikes, so the two tails do not balance and MSE-trained models are pulled toward the positive tail.
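To make the heteroscedastic-volatility point concrete, the sketch below uses hypothetical residuals in which peak-hour errors follow the same pattern as off-peak errors but are three times larger, with equal numbers of hours in each group for simplicity. It measures what share of the total loss comes from peak hours under absolute and squared error. The residual values and the 3x scale factor are assumptions for illustration.

```python
# Hypothetical residuals (EUR/MWh). Peak-hour errors repeat the off-peak pattern
# scaled by 3 to mimic higher peak volatility; group sizes are equal for simplicity.
off_peak = [1.0, -2.0, 1.5, -0.5, 2.5, -1.0]
peak = [3.0 * e for e in off_peak]


def share_from_peak(loss):
    """Fraction of the total loss contributed by peak-hour residuals."""
    peak_total = sum(loss(e) for e in peak)
    off_peak_total = sum(loss(e) for e in off_peak)
    return peak_total / (peak_total + off_peak_total)


print(share_from_peak(abs))              # 0.75 under absolute error
print(share_from_peak(lambda e: e * e))  # 0.9 under squared error
```

With a scale factor k, the peak share is k/(k+1) under absolute error but k²/(k²+1) under squared error, so squaring amplifies whichever hours are already noisier.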

MAE: Predicting the Median

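A key theoretical property: MAE-optimized models predict the conditional median, not the conditional mean. For right-skewed distributions the median usually lies below the mean, which creates a systematic negative bias (relative to the mean) during high-price periods: the model slightly underpredicts when prices are elevated. The production setting q=0.55 shifts the target slightly above the median, but it still optimizes a quantile rather than the mean.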
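The median-versus-mean property can be checked numerically: for a single constant prediction, a grid search finds that absolute error is minimized at the sample median and squared error near the sample mean. The price list is a hypothetical right-skewed sample, and the grid search stands in for the model fit purely for illustration.

```python
import statistics

# Hypothetical right-skewed hourly prices (EUR/MWh): mostly 40-70, one spike.
prices = [42.0, 48.0, 51.0, 55.0, 58.0, 61.0, 63.0, 67.0, 210.0]


def best_constant(loss, candidates):
    """Return the constant prediction that minimizes the summed loss over `prices`."""
    return min(candidates, key=lambda c: sum(loss(y - c) for y in prices))


grid = [x / 10 for x in range(2501)]  # 0.0 to 250.0 in steps of 0.1

mae_opt = best_constant(abs, grid)              # absolute error
mse_opt = best_constant(lambda r: r * r, grid)  # squared error

print(mae_opt, statistics.median(prices))  # 58.0 58.0
print(mse_opt, statistics.mean(prices))    # grid point nearest the mean
```

The absolute-error optimum (58.0) sits well below the squared-error optimum (72.8), which the single 210 EUR/MWh spike drags upward.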

Bias Correction

The median-targeting behavior is handled post-prediction by the bias correction system:

Track hourly bias: Rolling 30-day mean error (prediction minus actual) per hour-of-day
Apply correction: Subtract the systematic bias from raw predictions
Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0
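A minimal sketch of the three correction steps in plain Python. Several details are assumptions not stated above and are marked as such in the code: the record layout `(day_index, hour, predicted, actual)`, the bias sign convention (prediction minus actual, so subtracting it is the correction), measuring the negative-price rate over the full history passed in, and the helper names `hourly_bias`, `negative_price_rate` and `correct_predictions`, which are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed record layout (hypothetical): (day_index, hour, predicted, actual).


def hourly_bias(history, window_days=30):
    """Mean of (predicted - actual) per hour of day over the last `window_days` days.

    Assumption: bias is prediction minus actual, so a median-targeting model that
    under-predicts gets a negative bias, and subtracting it raises the forecast.
    """
    last_day = max(day for day, _, _, _ in history)
    errors = defaultdict(list)
    for day, hour, predicted, actual in history:
        if day > last_day - window_days:  # keep only the rolling window
            errors[hour].append(predicted - actual)
    return {hour: mean(errs) for hour, errs in errors.items()}


def negative_price_rate(history):
    """Fraction of observations per hour with a negative actual price.

    Assumption: measured over the full history passed in, not just the window.
    """
    negatives = defaultdict(int)
    totals = defaultdict(int)
    for _, hour, _, actual in history:
        totals[hour] += 1
        if actual < 0:
            negatives[hour] += 1
    return {hour: negatives[hour] / totals[hour] for hour in totals}


def correct_predictions(raw, bias, neg_rate, rare_threshold=0.05):
    """Subtract the hourly bias, then floor at 0 for hours where negatives are rare."""
    corrected = {}
    for hour, prediction in raw.items():
        adjusted = prediction - bias.get(hour, 0.0)
        if neg_rate.get(hour, 0.0) < rare_threshold:
            adjusted = max(adjusted, 0.0)
        corrected[hour] = adjusted
    return corrected


# Toy history: 40 days for three hours (values are illustrative only).
history = (
    [(d, 3, 5.0, 6.0) for d in range(40)]  # consistently under-predicted by 1
    + [(d, 14, 1.0, -3.0 if d % 5 == 0 else 2.0) for d in range(40)]  # negative on 20% of days
    + [(d, 19, 90.0, 94.0) for d in range(40)]  # consistently under-predicted by 4
)

bias = hourly_bias(history)
neg_rate = negative_price_rate(history)
corrected = correct_predictions({3: -2.0, 14: -1.5, 19: 88.0}, bias, neg_rate)
print(corrected)  # {3: 0.0, 14: -1.5, 19: 92.0}
```

Hour 3 is lifted by its -1.0 bias and then floored at 0 because it has never cleared negative; hour 14, negative on 20% of days, keeps its negative forecast; hour 19 is raised by 4 EUR/MWh.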

This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.

Trade-off Summary

Property	MAE	MSE
Outlier sensitivity	Robust	Sensitive
Optimization target	Median	Mean
Gradient stability	Constant magnitude	Proportional to error
Typical-hour accuracy	Better	Worse
Spike-hour accuracy	Slightly worse before bias correction	Better, but at the cost of typical hours
Interpretability	Direct EUR/MWh meaning	(EUR/MWh)², less intuitive

Empirical Validation

Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved 5–10% lower overall MAE than MSE-trained models, with the improvements concentrated in off-peak hours (0:00–7:00 and 22:00–23:00), where prices are more stable and predictable.
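A comparison of this kind can be broken down by the same peak/off-peak split used above. The sketch below computes MAE per segment and the fractional reduction of one model against another; the helper names `segment_mae` and `relative_improvement` and the four rows of toy numbers are hypothetical and are not the backtest data behind the 5–10% figure.

```python
PEAK_HOURS = range(8, 22)  # 8:00-21:00 inclusive, matching the peak window above


def segment_mae(rows):
    """MAE per segment for rows of (hour, predicted, actual)."""
    totals = {"peak": 0.0, "off_peak": 0.0}
    counts = {"peak": 0, "off_peak": 0}
    for hour, predicted, actual in rows:
        segment = "peak" if hour in PEAK_HOURS else "off_peak"
        totals[segment] += abs(predicted - actual)
        counts[segment] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals if counts[seg]}


def relative_improvement(baseline, candidate):
    """Fractional MAE reduction of `candidate` relative to `baseline`, per segment."""
    return {seg: 1 - candidate[seg] / baseline[seg] for seg in baseline}


# Toy rows (hour, predicted, actual); NOT the backtest data.
mse_rows = [(3, 50.0, 45.0), (23, 40.0, 44.0), (12, 100.0, 90.0), (19, 150.0, 160.0)]
mae_rows = [(3, 47.0, 45.0), (23, 42.0, 44.0), (12, 95.0, 90.0), (19, 152.0, 160.0)]

baseline = segment_mae(mse_rows)   # peak 10.0, off_peak 4.5
candidate = segment_mae(mae_rows)  # peak 6.5, off_peak 2.0
print(relative_improvement(baseline, candidate))
```

Reporting the reduction per segment, rather than only overall, is what makes a claim like "improvements concentrated in off-peak hours" checkable.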