Ensemble Strategy
Current Architecture: LSTM-XGBoost Hybrid (v10.1)
The day-ahead model uses a task-aligned LSTM encoder that processes 7-day price sequences to produce 64-dimensional temporal embeddings. These embeddings are appended to XGBoost’s 90 tabular features, giving the model temporal context that tree-based splits on lag columns cannot recover.
Architecture:
Price sequence (7 days × 96 intervals)
↓
LSTM encoder (task-aligned, 64 hidden units)
↓
64-dim temporal embeddings
↓
XGBoost (90 tabular + 64 LSTM = 154 features total)
↓
Quarter-hour price forecast

Key design decisions validated across 18 experiments (v10.1):
- Task-aligned encoder (not generic): The LSTM is trained jointly with XGBoost so its embeddings are specifically useful for price regression — generic encoders performed worse
- 1-week residual baseline: Predict the deviation from the weekly median rather than the raw EUR/MWh price; the baseline is added back after prediction. This isolates regime shifts and performed best across the experiments for this regime-switching market (4-week baselines introduce “regime memory” bias)
- No price weighting: Confirmed incompatible with LSTM three times across experiments — price weighting destabilises the embedding signal
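The residual-baseline idea can be sketched in a few lines. This is a minimal illustration, not the production code: it assumes the baseline is a single median over the trailing 7 days of quarter-hour prices, and the function names are hypothetical.

```python
from statistics import median

INTERVALS_PER_DAY = 96  # quarter-hour resolution


def weekly_median_baseline(prices, days=7):
    """Median of the trailing `days` days of quarter-hour prices (assumed definition)."""
    window = prices[-days * INTERVALS_PER_DAY:]
    return median(window)


def to_residual_targets(future_prices, baseline):
    """Training target: deviation from the baseline instead of the raw price."""
    return [p - baseline for p in future_prices]


def from_residual_predictions(residual_preds, baseline):
    """Inference: add the baseline back to recover a price forecast."""
    return [r + baseline for r in residual_preds]


# Toy history: six days at 60 EUR/MWh followed by one day at 80 EUR/MWh.
history = [60.0] * (6 * INTERVALS_PER_DAY) + [80.0] * INTERVALS_PER_DAY
baseline = weekly_median_baseline(history)

targets = to_residual_targets([90.0, 100.0], baseline)          # what the model learns
forecast = from_residual_predictions([25.0, 35.0], baseline)    # what the user sees
```

Because the median ignores the single high day, the baseline stays at the prevailing level and the model only has to learn the deviation from it.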
v10.1 validation results (150-day window, Oct 2025 – Mar 2026, includes Iran crisis):
| Metric | v10.1 LSTM task-aligned | v4.3 (pre-LSTM baseline) |
|---|---|---|
| DA MAE | 12.69 EUR/MWh | 14.47 EUR/MWh |
| Strategic MAE | 17.84 EUR/MWh | 19.79 EUR/MWh |
| Bias | -0.65 EUR/MWh | ~-12 EUR/MWh |
| MaxPred | 209 EUR/MWh | ~127 EUR/MWh |
| Spike Recall | 24.1% | ~16% |
| Crisis MAE (Iran) | 27.16 EUR/MWh | — |
v10.1 beats v4.3 on every metric reported for both versions. The headline improvements: DA MAE falls from 14.47 to 12.69 EUR/MWh (−12.3%), and bias shrinks from ~−12 to −0.65 EUR/MWh (about 19× smaller in magnitude), so the model is nearly unbiased. The v10.1 window includes the March 2026 Iran crisis (prices 170–247 EUR/MWh), which adds ~1–2 EUR/MWh to the reported MAE. The v4.3 window excluded this period, so the like-for-like improvement is likely larger.
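For readers who want to reproduce numbers of this kind, the sketch below shows one plausible way to compute them. The exact metric definitions are not stated in this document, so treat these as assumptions: bias is taken as the mean of predicted minus actual (negative means underprediction), and spike recall uses a hypothetical threshold that is not the one used for the table.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def bias(actual, predicted):
    """Signed mean error; negative values indicate underprediction (assumed convention)."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def spike_recall(actual, predicted, threshold):
    """Share of actual spikes (>= threshold) that the forecast also puts at or above it."""
    spikes = [(a, p) for a, p in zip(actual, predicted) if a >= threshold]
    if not spikes:
        return float("nan")
    return sum(1 for _, p in spikes if p >= threshold) / len(spikes)


def relative_change(new, old):
    """Relative change of `new` versus `old`; negative means a reduction."""
    return (new - old) / old


# Toy data in EUR/MWh; the 140 threshold is purely illustrative.
actual = [50.0, 60.0, 150.0, 200.0]
predicted = [55.0, 50.0, 160.0, 120.0]

toy_mae = mae(actual, predicted)
toy_bias = bias(actual, predicted)
toy_recall = spike_recall(actual, predicted, threshold=140.0)

# The DA MAE change quoted above: 14.47 -> 12.69 EUR/MWh.
da_mae_change_pct = round(relative_change(12.69, 14.47) * 100, 1)
```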
See v10.1 changelog for the full 18-experiment validation.
Legacy Architecture: Equal-Weight Ensemble (v4.3)
The original ensemble combined predictions from three gradient boosting implementations (HistGradientBoosting, LightGBM, and XGBoost) into a single forecast. It remained in production from Feb–Mar 2026 and is documented here for reference.
Why Ensemble?
Each gradient boosting implementation uses different algorithmic choices:
| Model | Growth Strategy | Missing Values | Regularization |
|---|---|---|---|
| HistGradientBoosting | Leaf-wise (best-first) | Native NaN support | L2 penalty |
| LightGBM | Leaf-wise | Native NaN support | L1/L2, feature/bagging fractions |
| XGBoost | Level-wise | Learned NaN direction | L1/L2, column sampling |
These differences mean the models make different errors on different samples. When one model struggles with a particular pattern, the others often compensate. Averaging smooths out individual weaknesses.
Averaging Method
The default ensemble uses equal-weight averaging:
ensemble_prediction = (histgb + lightgbm + xgboost) / 3

This simple approach is surprisingly effective. In backtesting, the ensemble beats the best individual model on the strategic horizons and trails it by about 0.5 EUR/MWh on day-ahead:
| Product | Ensemble MAE (EUR/MWh) | Best Single Model MAE (EUR/MWh) | Best Single Model |
|---|---|---|---|
| D+1 Day-Ahead | 14.47 | 13.95 | XGBoost |
| D+2–D+7 Strategic | 19.79 | 21.42 | HistGBT |
v4.3 backtest results (Oct 2025 – Feb 2026).
The ensemble consistently outperforms individual models on strategic horizons, where the opposing biases of different models partially cancel out during averaging.
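The cancellation effect is easy to see on a toy example. The sketch below is illustrative only: the prediction values are invented, and `equal_weight_ensemble` is a hypothetical helper name.

```python
def equal_weight_ensemble(*model_predictions):
    """Average the forecasts of several models, interval by interval."""
    n_models = len(model_predictions)
    return [sum(values) / n_models for values in zip(*model_predictions)]


def mae(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 100.0]

# Invented forecasts: one model runs low, one runs high, one is mixed.
histgb = [94.0, 94.0]
lightgbm = [106.0, 106.0]
xgboost = [103.0, 97.0]

ensemble = equal_weight_ensemble(histgb, lightgbm, xgboost)

individual_maes = [mae(actual, m) for m in (histgb, lightgbm, xgboost)]
ensemble_mae = mae(actual, ensemble)
```

Here the low and high models offset each other, so the averaged forecast has a smaller error than any single model. When the biases do not oppose each other, as on the day-ahead product in the table above, averaging cannot beat the best individual model.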
Loss Function: Quantile (q=0.55)
All three models are trained with quantile loss targeting the 55th percentile:
- HistGBT: `loss="quantile", quantile=0.55`
- LightGBM: `objective="quantile", alpha=0.55`
- XGBoost: `objective="reg:quantileerror", quantile_alpha=0.55`
Why Quantile Over MAE/MSE?
Electricity prices are right-skewed: bounded near zero but with occasional spikes above 200 EUR/MWh. Standard loss functions have a structural problem:
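- MSE targets the conditional mean. On right-skewed data the mean sits above the median, but the squared penalty lets a handful of spikes dominate the fit, and in practice the MSE-trained models still underpredicted systematically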
- MAE targets the conditional median (50th percentile) — closer, but still tends to undershoot
Quantile loss at q=0.55 targets the 55th percentile, slightly above the median. This directly corrects the underprediction bias without distorting the forecast shape. The 5-percentage-point shift above the median was chosen empirically to minimize bias on the Spanish OMIE price distribution.
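To make the effect of the quantile level concrete, the sketch below implements the quantile (pinball) loss and finds the constant forecast that minimises it on a synthetic right-skewed sample. The lognormal data is a stand-in chosen for illustration, not OMIE prices, and the helper names are hypothetical.

```python
import random


def pinball_loss(actual, predicted, q):
    """Quantile loss: under-forecasts cost q per unit, over-forecasts cost (1 - q)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        diff = a - p
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


def best_constant(actual, q, candidates):
    """Constant forecast among `candidates` with the lowest pinball loss."""
    n = len(actual)
    return min(candidates, key=lambda c: pinball_loss(actual, [c] * n, q))


random.seed(0)
# Synthetic right-skewed "prices" (assumption: lognormal, for illustration only).
prices = [random.lognormvariate(4.0, 0.6) for _ in range(501)]
ordered = sorted(prices)

c50 = best_constant(prices, 0.50, ordered)  # the sample median
c55 = best_constant(prices, 0.55, ordered)  # a point slightly above the median
```

Raising q from 0.50 to 0.55 makes under-forecasting more expensive than over-forecasting (0.55 versus 0.45 per unit of error), which pushes the optimal forecast upward.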
Changed in v4.1. Previously used MAE (v3.1) and MSE (v1.0–v3.0). See the changelog for details.
Training Process
Each model trains independently through the same pipeline:
- Feature construction: Build direct features relative to the forecast origin
- Time series cross-validation: 5-fold `TimeSeriesSplit` preserving temporal order
- Per-fold training: Train on each fold, evaluate on the next
- Final model: Retrain on all available data
- Conformal calibration: Build confidence intervals from out-of-fold residuals
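The cross-validation and residual-collection steps can be sketched as follows. To stay dependency-free, the split is hand-rolled to mirror the default behaviour of scikit-learn's `TimeSeriesSplit` (expanding training window, no gap), and a trivial mean predictor stands in for the gradient boosting models. Residuals are taken here as actual minus predicted; all names are hypothetical.

```python
def time_series_folds(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) with an expanding training window."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


def fit_mean_model(y_train):
    """Placeholder model: always predicts the training mean."""
    level = sum(y_train) / len(y_train)
    return lambda n: [level] * n


y = [float(v) for v in range(12)]  # toy target series in time order

oof_residuals = []
for train_idx, test_idx in time_series_folds(len(y), n_splits=5):
    predict = fit_mean_model([y[i] for i in train_idx])
    preds = predict(len(test_idx))
    # Each fold is scored only on data that comes after its training window.
    oof_residuals.extend(y[i] - p for i, p in zip(test_idx, preds))

# The final model is refit on everything; the out-of-fold residuals
# are kept for interval calibration.
final_model = fit_mean_model(y)
```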
Models are saved as joblib artifacts with version timestamps.
Confidence Intervals
The ensemble’s confidence intervals use split conformal prediction with asymmetric bands:
- Collect out-of-fold residuals (actual - predicted) from all CV folds
- Bucket residuals by horizon group (day buckets)
- Compute quantiles of signed residuals:
  - 50% band: 25th and 75th percentiles
  - 90% band: 5th and 95th percentiles
- At inference: `lower = prediction + quantile_low`, `upper = prediction + quantile_high`
Using signed residuals (rather than absolute residuals) produces asymmetric intervals that reflect the skewed error distribution — wider on the upside where price spikes occur.
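A minimal sketch of the calibration and inference steps follows. It assumes residuals are defined as actual minus predicted, which is the convention under which adding a residual quantile to the prediction gives a bound on the actual price. The bucket label, the residual values, and the function names are all hypothetical.

```python
BANDS = {"50%": (0.25, 0.75), "90%": (0.05, 0.95)}


def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def calibrate(residuals_by_bucket):
    """Per horizon bucket, store (low, high) residual quantiles for each band."""
    return {
        bucket: {
            band: (quantile(res, q_lo), quantile(res, q_hi))
            for band, (q_lo, q_hi) in BANDS.items()
        }
        for bucket, res in residuals_by_bucket.items()
    }


def interval(prediction, bucket, band, table):
    """Shift the point forecast by the signed residual quantiles."""
    q_low, q_high = table[bucket][band]
    return prediction + q_low, prediction + q_high


# Invented out-of-fold residuals with a long upside tail (a missed spike).
table = calibrate({"D+1": [-10.0, -5.0, 0.0, 5.0, 30.0]})

lower50, upper50 = interval(100.0, "D+1", "50%", table)
lower90, upper90 = interval(100.0, "D+1", "90%", table)
```

With these toy residuals the 90% band extends about 9 EUR/MWh below the forecast and about 25 EUR/MWh above it, which is the asymmetry described above.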
Default Hyperparameters
All three models share a common parameter template:
| Parameter | Value | Purpose |
|---|---|---|
| `max_iter` / `n_estimators` | 500 | Number of boosting rounds |
| `max_depth` | 8 | Tree depth limit |
| `learning_rate` | 0.05 | Step size shrinkage |
| `min_samples_leaf` | 20 | Minimum leaf size |
| `l2_regularization` | 0.1 | L2 penalty weight |
| `early_stopping` | True | Stop if validation loss plateaus |
| `validation_fraction` | 0.1 | Holdout for early stopping |
| `n_iter_no_change` | 20 | Patience rounds |
These can be optimized per horizon group using Optuna hyperparameter tuning.
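As a sketch of how per-horizon tuning could sit on top of the shared template, the snippet below stores the defaults from the table as a dictionary and merges in overrides per horizon group. The parameter names follow scikit-learn's `HistGradientBoostingRegressor`; how they map onto LightGBM and XGBoost is not specified here. The group label and the override values are hypothetical, not tuned results.

```python
# Shared template, copied from the table above.
DEFAULT_PARAMS = {
    "max_iter": 500,
    "max_depth": 8,
    "learning_rate": 0.05,
    "min_samples_leaf": 20,
    "l2_regularization": 0.1,
    "early_stopping": True,
    "validation_fraction": 0.1,
    "n_iter_no_change": 20,
}


def params_for(horizon_group, overrides_by_group):
    """Template values, with any tuned overrides for this horizon group applied on top."""
    return {**DEFAULT_PARAMS, **overrides_by_group.get(horizon_group, {})}


# Hypothetical overrides, e.g. as a tuning run might produce them.
tuned = {"D+2-D+7": {"max_depth": 6, "learning_rate": 0.03}}

strategic_params = params_for("D+2-D+7", tuned)
day_ahead_params = params_for("D+1", tuned)  # no override: falls back to the template
```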