Why MAE over MSE

The Decision

All three models in the EPF ensemble (HistGradientBoosting, LightGBM, XGBoost) use MAE (Mean Absolute Error) as their loss function instead of the more common MSE (Mean Squared Error).

# HistGradientBoosting
{"loss": "absolute_error"}

# LightGBM
{"objective": "mae"}

# XGBoost
{"objective": "reg:absoluteerror"}

Why Not MSE?

MSE squares each error before averaging, so a single large error has a disproportionate impact on the loss. Consider a simple example with four hourly predictions:

Hour	Error (EUR/MWh)	Absolute Error	Squared Error
1	2.0	2.0	4.0
2	-3.0	3.0	9.0
3	1.5	1.5	2.25
4	50.0 (spike)	50.0	2,500.0
MAE		14.1
MSE			628.8

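The spike at hour 4 dominates the MSE loss: it contributes 2,500 of the 2,515.25 total squared error (over 99%), so the model is pulled toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE, the spike is still the largest error, but it doesn't dominate the gradient signal: the absolute-error gradient has the same magnitude for every hour regardless of error size, whereas the squared-error gradient grows in proportion to the error.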
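The arithmetic in the table, and the gradient argument, can be checked with a few lines of Python. This is a minimal sketch that reuses the four errors from the example. The variable names are illustrative, and the "gradient share" is simply each hour's gradient magnitude divided by the total across hours.

```python
# Errors from the four-hour example above (EUR/MWh).
errors = [2.0, -3.0, 1.5, 50.0]

mae = sum(abs(e) for e in errors) / len(errors)   # 14.125
mse = sum(e ** 2 for e in errors) / len(errors)   # 628.8125

# Gradient magnitude of each loss term with respect to the prediction:
#   absolute error: |d|e|/de| = 1 for any nonzero error
#   squared error:  |d(e^2)/de| = 2 * |e|
mae_grad_magnitudes = [1.0 if e != 0 else 0.0 for e in errors]
mse_grad_magnitudes = [2.0 * abs(e) for e in errors]

# Fraction of the total gradient magnitude that comes from the hour-4 spike.
spike_grad_share_mae = mae_grad_magnitudes[-1] / sum(mae_grad_magnitudes)
spike_grad_share_mse = mse_grad_magnitudes[-1] / sum(mse_grad_magnitudes)

print(f"MAE = {mae:.3f}, MSE = {mse:.4f}")
print(f"Spike share of gradient: MAE {spike_grad_share_mae:.1%}, MSE {spike_grad_share_mse:.1%}")
```

Under MAE the spike carries one quarter of the gradient magnitude, the same as any other hour. Under MSE it carries roughly 88%.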

Electricity Price Characteristics

Spanish electricity prices exhibit three properties that make MSE problematic:

1. Right-Skewed Distribution

Prices cluster between 0 and 80 EUR/MWh most of the time but occasionally spike above 200 EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.

2. Heteroscedastic Volatility

Price variance is not constant: it is higher during peak hours (8:00–21:00) and lower during night hours. Because squared errors grow with variance, MSE-trained models overweight peak-hour errors, which reduces off-peak accuracy.

3. Occasional Negative Prices

During high renewable generation periods, prices can go negative. The asymmetry between negative and positive outliers means MSE-trained models are pulled toward the positive tail.

MAE: Predicting the Median

A key theoretical property: MAE-optimized models predict the conditional median, not the conditional mean. For right-skewed distributions, the median is lower than the mean. This creates a systematic negative bias relative to the mean during high-price periods: the model slightly underpredicts when prices are elevated.
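The median-versus-mean property can be seen on a toy example. The sketch below uses a small, hypothetical right-skewed price sample (not real market data) and brute-force searches for the constant prediction that minimizes each loss. The grid search is for illustration only and is not how the ensemble models are fitted.

```python
import statistics

# Hypothetical right-skewed sample (EUR/MWh): six typical prices and one spike.
prices = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 220.0]

def mean_abs_error(constant, values):
    return sum(abs(v - constant) for v in values) / len(values)

def mean_sq_error(constant, values):
    return sum((v - constant) ** 2 for v in values) / len(values)

# Candidate constant predictions on a 0.5 EUR/MWh grid from 0 to 250.
candidates = [i * 0.5 for i in range(501)]

best_mae_const = min(candidates, key=lambda c: mean_abs_error(c, prices))
best_mse_const = min(candidates, key=lambda c: mean_sq_error(c, prices))

print("MAE-optimal constant:", best_mae_const)   # 45.0, the sample median
print("MSE-optimal constant:", best_mse_const)   # 68.0, the grid point nearest the mean
print("median:", statistics.median(prices), "mean:", round(statistics.mean(prices), 2))
```

The MAE-optimal constant sits at the median and ignores how large the spike is, while the MSE-optimal constant is dragged upward toward the mean.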

Bias Correction

The median-targeting behavior is handled post-prediction by the bias correction system:

1. Track hourly bias: Rolling 30-day mean error per hour-of-day
2. Apply correction: Subtract the systematic bias from raw predictions
3. Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0

This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.
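The correction steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function names, the record layout `(day, hour, predicted, actual)`, and the sign convention (error = predicted minus actual, so a negative bias means underprediction) are all assumptions made for the example. Hours with no history are treated here as having no bias and no negative prices, which is also an assumption.

```python
from collections import defaultdict

WINDOW_DAYS = 30                  # rolling window from the description above
NEGATIVE_SHARE_THRESHOLD = 0.05   # "rare" means under 5% of history

def hourly_bias(history, window_days=WINDOW_DAYS):
    """Mean error (predicted - actual) per hour-of-day over the most recent days."""
    if not history:
        return {}
    last_day = max(day for day, _, _, _ in history)
    errors = defaultdict(list)
    for day, hour, predicted, actual in history:
        if day > last_day - window_days:
            errors[hour].append(predicted - actual)
    return {hour: sum(errs) / len(errs) for hour, errs in errors.items()}

def negative_share(history):
    """Fraction of historical actual prices below zero, per hour-of-day."""
    counts = defaultdict(lambda: [0, 0])  # hour -> [negative count, total count]
    for _, hour, _, actual in history:
        counts[hour][0] += actual < 0
        counts[hour][1] += 1
    return {hour: neg / total for hour, (neg, total) in counts.items()}

def correct(raw_predictions, bias, neg_share):
    """Subtract the hourly bias, then floor at 0 where negative prices are rare."""
    corrected = {}
    for hour, raw in raw_predictions.items():
        value = raw - bias.get(hour, 0.0)
        if neg_share.get(hour, 0.0) < NEGATIVE_SHARE_THRESHOLD:
            value = max(value, 0.0)
        corrected[hour] = value
    return corrected

# Toy history: hour 18 is underpredicted by 10, hour 3 is overpredicted by 1.
history = [(day, 18, 90.0, 100.0) for day in range(40)]
history += [(day, 3, 5.0, 4.0) for day in range(40)]

bias = hourly_bias(history)                     # {18: -10.0, 3: 1.0}
corrected = correct({18: 95.0, 3: 0.5}, bias, negative_share(history))
print(corrected)                                # {18: 105.0, 3: 0.0}
```

In the toy run, the hour-18 prediction is raised by 10 to offset the underprediction, and the hour-3 prediction would fall to -0.5 after correction, so it is floored at 0 because that hour has no negative prices in its history.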

Trade-off Summary

Property	MAE	MSE
Outlier sensitivity	Robust	Sensitive
Optimization target	Median	Mean
Gradient stability	Constant magnitude	Proportional to error
Typical-hour accuracy	Better	Worse
Spike-hour accuracy	Slightly worse (corrected by bias)	Better (but at cost of typical hours)
Interpretability	Direct EUR/MWh meaning	EUR²/MWh² (less intuitive)

Empirical Validation

Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved 5–10% lower overall MAE than MSE-trained models, with the improvement concentrated in the off-peak hours (0:00–7:00 and 22:00–23:00), where prices are more stable and predictable.
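A per-bucket breakdown like the one described can be computed with a small helper. The sketch below is illustrative only: the function name and record layout are assumptions, and the four records are made-up values used to exercise the code, not backtest results.

```python
# Off-peak hours as defined above: 0:00-7:00 and 22:00-23:00.
OFF_PEAK_HOURS = set(range(0, 8)) | {22, 23}

def mae_by_bucket(records):
    """MAE split into off-peak and peak hours.

    records: iterable of (hour, predicted, actual) tuples.
    """
    sums = {"off_peak": [0.0, 0], "peak": [0.0, 0]}  # bucket -> [abs error sum, count]
    for hour, predicted, actual in records:
        bucket = "off_peak" if hour in OFF_PEAK_HOURS else "peak"
        sums[bucket][0] += abs(predicted - actual)
        sums[bucket][1] += 1
    return {bucket: total / n for bucket, (total, n) in sums.items() if n}

# Made-up records purely to show the call shape.
toy_records = [(2, 31.0, 30.0), (23, 38.0, 40.0), (12, 70.0, 75.0), (19, 95.0, 90.0)]
bucket_mae = mae_by_bucket(toy_records)
print(bucket_mae)  # {'off_peak': 1.5, 'peak': 5.0}
```

Running the same helper on the predictions of an MAE-trained and an MSE-trained model over the same backtest window would show where any overall difference comes from.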