Why MAE over MSE

The Decision

The production XGBoost model uses a quantile (pinball) loss at q=0.55, a close relative of MAE: at q=0.5 the quantile loss is proportional to MAE, and q=0.55 targets a quantile slightly above the median. The historical v4.3 ensemble used MAE across all three base learners (HistGradientBoosting, LightGBM, XGBoost). Production has been single-XGBoost since v11.0 (April 2026), but the rationale below (a robust, near-median loss that resists right-skewed price spikes) applies to both the current single-model architecture and the legacy ensemble.

# Production (v11.0+): XGBoost with quantile loss q=0.55 (MAE-equivalent at q=0.5)
{"objective": "reg:quantileerror", "quantile_alpha": 0.55}
# Historical v4.3 ensemble loss configuration:
# HistGradientBoosting
{"loss": "absolute_error"}
# LightGBM
{"objective": "mae"}
# XGBoost
{"objective": "reg:absoluteerror"}
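The "MAE-equivalent at q=0.5" comment can be checked with a few lines of plain Python. The sketch below is illustrative only: `pinball_loss` and `mae` are hypothetical helpers written for this example (not the XGBoost internals), and the prices are made up.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile level q."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-predictions (diff > 0) cost q per unit of error;
        # over-predictions (diff < 0) cost (1 - q) per unit of error.
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up prices in EUR/MWh, for illustration only.
y_true = [40.0, 55.0, 30.0, 120.0]
y_pred = [42.0, 50.0, 33.0, 90.0]

half_mae = 0.5 * mae(y_true, y_pred)
loss_q50 = pinball_loss(y_true, y_pred, 0.5)   # equals half_mae
loss_q55 = pinball_loss(y_true, y_pred, 0.55)  # weights under-prediction slightly more
```

At q=0.5 the pinball loss is exactly half the MAE, so both have the same minimizer. At q=0.55 under-predictions cost 0.55 per unit and over-predictions 0.45, which nudges the optimum slightly above the median.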

Why Not MSE?

MSE squares each error before averaging, which means a single large error has a disproportionate impact on the loss. Consider a simple example with 4 predictions:

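| Hour | Error (EUR/MWh) | Absolute Error | Squared Error |
|------|-----------------|----------------|---------------|
| 1    | 2.0             | 2.0            | 4.0           |
| 2    | -3.0            | 3.0            | 9.0           |
| 3    | 1.5             | 1.5            | 2.25          |
| 4    | 50.0 (spike)    | 50.0           | 2,500.0       |
| MAE  |                 | 14.1           |               |
| MSE  |                 |                | 628.8         |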

The spike at hour 4 dominates the MSE loss (2,500 of the 2,515.25 total squared error, over 99%), pulling the model toward minimizing spike errors at the expense of typical-hour accuracy. Under MAE, the spike is still the largest error, but every sample contributes a gradient of the same magnitude, so the spike does not dominate the gradient signal.
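The figures in the table, and the claim about the gradient signal, can be reproduced with a short calculation. This is a sketch of the arithmetic only; the variable names are chosen for this example.

```python
errors = [2.0, -3.0, 1.5, 50.0]  # prediction errors from the table, EUR/MWh

mae = sum(abs(e) for e in errors) / len(errors)   # 14.125
mse = sum(e * e for e in errors) / len(errors)    # 628.8125

# Per-sample gradient magnitude with respect to the prediction,
# ignoring the common 1/n factor: |d|e|/de| = 1 and |d(e^2)/de| = 2|e|.
mae_grad = [1.0 for _ in errors]
mse_grad = [2.0 * abs(e) for e in errors]

# Share of the total attributable to the hour-4 spike.
spike_share_mse_loss = errors[-1] ** 2 / sum(e * e for e in errors)
spike_share_mse_grad = mse_grad[-1] / sum(mse_grad)
spike_share_mae_grad = mae_grad[-1] / sum(mae_grad)

print(f"MAE={mae:.1f}  MSE={mse:.1f}")
print(f"spike share of MSE loss:     {spike_share_mse_loss:.1%}")
print(f"spike share of MSE gradient: {spike_share_mse_grad:.1%}")
print(f"spike share of MAE gradient: {spike_share_mae_grad:.1%}")
```

With four samples, the spike carries one quarter of the MAE gradient (the same as any other hour), but roughly 88% of the MSE gradient.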

Electricity Price Characteristics

Spanish electricity prices exhibit three properties that make MSE problematic:

1. Right-Skewed Distribution

Prices cluster between 0–80 EUR/MWh most of the time but occasionally spike to 200+ EUR/MWh during demand peaks, maintenance outages, or renewable droughts. These positive outliers make the distribution right-skewed.

2. Heteroscedastic Volatility

Price variance is not constant — it’s higher during peak hours (8:00–21:00) and lower during night hours. MSE-trained models overweight peak-hour errors, reducing off-peak accuracy.

3. Occasional Negative Prices

During high renewable generation periods, prices can go negative. The asymmetry between negative and positive outliers means MSE-trained models are pulled toward the positive tail.

MAE: Predicting the Median

A key theoretical property: MAE-optimized models predict the conditional median, not the conditional mean. For right-skewed distributions, the median is lower than the mean. This creates a systematic negative bias during high-price periods — the model slightly underpredicts when prices are elevated.
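The median-versus-mean property can be illustrated with a constant predictor on a small right-skewed sample. The prices below are hypothetical, and the grid search stands in for a real optimizer; it is a sketch, not how any of the libraries fit a model.

```python
import statistics

# Hypothetical right-skewed prices in EUR/MWh (one spike at 250).
prices = [20.0, 35.0, 40.0, 45.0, 60.0, 75.0, 250.0]


def mean_abs_error(c):
    """MAE of predicting the constant c for every price."""
    return sum(abs(p - c) for p in prices) / len(prices)


def mean_sq_error(c):
    """MSE of predicting the constant c for every price."""
    return sum((p - c) ** 2 for p in prices) / len(prices)


# Brute-force search over constants from 0.0 to 300.0 in steps of 0.5.
candidates = [c / 2 for c in range(0, 601)]
best_mae = min(candidates, key=mean_abs_error)
best_mse = min(candidates, key=mean_sq_error)

print(best_mae, statistics.median(prices))  # MAE minimizer matches the median
print(best_mse, statistics.mean(prices))    # MSE minimizer matches the mean
```

The single spike pulls the MSE-optimal constant well above the MAE-optimal one, which is the gap that the bias correction step is meant to close.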

Bias Correction

The median-targeting behavior is handled post-prediction by the bias correction system:

  1. Track hourly bias: Rolling 30-day mean error (prediction minus actual) per hour-of-day
  2. Apply correction: Subtract the systematic bias from raw predictions
  3. Clip negative prices: For hours where negative prices are rare (under 5% of history), floor predictions at 0
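The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the production implementation: the function names, the `(timestamp, predicted, actual)` history format, and the `negative_share` lookup (fraction of historical prices below zero for each hour) are all hypothetical, and error is taken as prediction minus actual.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_bias(history, now, window_days=30):
    """Mean error (prediction minus actual) per hour-of-day over the trailing window.

    history: iterable of (timestamp, predicted, actual) tuples (assumed format).
    """
    cutoff = now - timedelta(days=window_days)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, predicted, actual in history:
        if cutoff <= ts < now:
            sums[ts.hour] += predicted - actual
            counts[ts.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in counts}


def correct(raw_prediction, hour, bias, negative_share, rare_threshold=0.05):
    """Subtract the hourly bias, then floor at 0 where negative prices are rare."""
    corrected = raw_prediction - bias.get(hour, 0.0)
    if negative_share.get(hour, 0.0) < rare_threshold:
        corrected = max(corrected, 0.0)
    return corrected


# Illustrative data only.
now = datetime(2026, 5, 1)
history = [
    (datetime(2026, 4, 29, 19), 90.0, 100.0),  # under-predicted by 10
    (datetime(2026, 4, 30, 19), 95.0, 101.0),  # under-predicted by 6
    (datetime(2026, 3, 1, 19), 50.0, 90.0),    # outside the 30-day window, ignored
    (datetime(2026, 4, 30, 3), 4.0, 1.0),      # over-predicted by 3
]
bias = hourly_bias(history, now)
negative_share = {19: 0.0, 3: 0.02}

evening = correct(92.0, 19, bias, negative_share)  # raised by the 8.0 under-prediction
night = correct(2.0, 3, bias, negative_share)      # 2.0 - 3.0 = -1.0, floored at 0.0
```

Because the evening bias is negative (the model under-predicts), subtracting it raises the forecast. For an hour where negative prices are common, passing a `negative_share` at or above the threshold leaves the corrected value unclipped.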

This two-stage approach (MAE for robust fitting + bias correction for mean alignment) outperforms direct MSE training on both typical and extreme price scenarios.

Trade-off Summary

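| Property              | MAE                                 | MSE                                   |
|-----------------------|-------------------------------------|---------------------------------------|
| Outlier sensitivity   | Robust                              | Sensitive                             |
| Optimization target   | Median                              | Mean                                  |
| Gradient stability    | Constant magnitude                  | Proportional to error                 |
| Typical-hour accuracy | Better                              | Worse                                 |
| Spike-hour accuracy   | Slightly worse (corrected by bias)  | Better (but at cost of typical hours) |
| Interpretability      | Direct EUR/MWh meaning              | EUR²/MWh² (less intuitive)            |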

Empirical Validation

Backtest comparisons confirmed the MAE advantage: MAE-trained models achieved 5–10% lower overall MAE than MSE-trained models, with the improvement concentrated in off-peak hours (0:00–7:00 and 22:00–23:00), where prices are more stable and predictable.