Ensemble Strategy

Current Architecture: LSTM-XGBoost Hybrid (v10.1)

The day-ahead model uses a task-aligned LSTM encoder that processes 7-day price sequences to produce 64-dimensional temporal embeddings. These embeddings are appended to XGBoost’s 90 tabular features, giving the model temporal context that tree-based splits on lag columns cannot recover.

Architecture:

Price sequence (7 days × 96 intervals)
  → LSTM encoder (task-aligned, 64 hidden units)
  → 64-dim temporal embeddings
  → XGBoost (90 tabular + 64 LSTM = 154 features total)
  → Quarter-hour price forecast
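In code, the fusion step reduces to a feature concatenation. The sketch below only mirrors the shapes described above with random stand-ins; it is not the production encoder:

```python
import numpy as np

# Illustrative shapes only: the LSTM consumes 7 days x 96 quarter-hours
# and emits a 64-dim embedding per sample.
rng = np.random.default_rng(0)
n_samples = 32
lstm_embeddings = rng.normal(size=(n_samples, 64))   # stand-in for encoder output
tabular_features = rng.normal(size=(n_samples, 90))  # stand-in for the 90 tabular features

# XGBoost sees the concatenation: 90 tabular + 64 temporal = 154 features
X = np.concatenate([tabular_features, lstm_embeddings], axis=1)
```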

Key design decisions validated across 18 experiments (v10.1):

  • Task-aligned encoder (not generic): The LSTM is trained jointly with XGBoost so its embeddings are specifically useful for price regression — generic encoders performed worse
  • 1-week residual baseline: Predict deviation from the weekly median rather than raw EUR; the model then adds the baseline back. This isolates regime shifts and is optimal for regime-switching markets (4-week baselines introduce “regime memory” bias)
  • No price weighting: Confirmed incompatible with LSTM three times across experiments — price weighting destabilises the embedding signal
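The residual-baseline decision can be sketched as follows. The helper name and the numbers are illustrative, not the production implementation:

```python
import numpy as np

def weekly_median_baseline(prices: np.ndarray) -> float:
    """Baseline = median over the trailing 7-day window (illustrative)."""
    return float(np.median(prices[-7 * 96:]))  # 7 days x 96 quarter-hours

# Training target is the deviation from the weekly median ...
history = np.full(7 * 96, 50.0)           # toy history: a flat 50 EUR/MWh week
baseline = weekly_median_baseline(history)
actual_price = 62.0
residual_target = actual_price - baseline  # the model learns this deviation

# ... and at inference the baseline is added back
model_residual_pred = 10.5                 # pretend model output
price_forecast = baseline + model_residual_pred
```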

v10.1 validation results (150-day window, Oct 2025 – Mar 2026, includes Iran crisis):

Metric              v10.1 (LSTM, task-aligned)   v4.3 (pre-LSTM baseline)
DA MAE              12.69 EUR/MWh                14.47 EUR/MWh
Strategic MAE       17.84 EUR/MWh                19.79 EUR/MWh
Bias                -0.65 EUR/MWh                ~-12 EUR/MWh
MaxPred             209 EUR/MWh                  ~127 EUR/MWh
Spike Recall        24.1%                        ~16%
Crisis MAE (Iran)   27.16 EUR/MWh                n/a (window excluded crisis)

v10.1 beats v4.3 on every metric. The headline improvements: DA MAE 12.69 vs 14.47 EUR/MWh (−12.3%), and bias reduced from ~−12 to −0.65 EUR/MWh (19× better) — the model is nearly unbiased. The v10.1 window includes the March 2026 Iran crisis (prices 170–247 EUR/MWh), which adds ~1–2 EUR/MWh to the reported MAE — the v4.3 window excluded this period, so the real improvement is larger.

See v10.1 changelog for the full 18-experiment validation.


Legacy Architecture: Equal-Weight Ensemble (v4.3)

The original ensemble combined predictions from three gradient boosting implementations — HistGradientBoosting, LightGBM, and XGBoost — into a single forecast. It remained in production from Feb–Mar 2026 and is documented here for reference.

Why Ensemble?

Each gradient boosting implementation uses different algorithmic choices:

Model                 Growth Strategy   Missing Values          Regularization
HistGradientBoosting  Depth-wise        Native NaN support      L2 penalty
LightGBM              Leaf-wise         Native NaN support      L1/L2, feature/bagging fractions
XGBoost               Level-wise        Learned NaN direction   L1/L2, column sampling

These differences mean the models make different errors on different samples. When one model struggles with a particular pattern, the others often compensate. Averaging smooths out individual weaknesses.
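A toy example of why averaging helps: each model errs on a different sample, and the errors cancel in the mean. The numbers are hand-picked and purely illustrative:

```python
import numpy as np

actual = np.array([100.0, 100.0, 100.0])

# Each toy model errs in a different direction on different samples
histgb   = np.array([110.0,  98.0, 101.0])
lightgbm = np.array([ 95.0, 103.0,  99.0])
xgboost  = np.array([ 95.0,  99.0, 100.0])

ensemble = (histgb + lightgbm + xgboost) / 3

def mae(pred):
    return float(np.abs(pred - actual).mean())

# Individual MAEs: ~4.33, 3.0, 2.0 -- but the errors cancel in the
# average, so the ensemble MAE is 0.0 in this contrived case.
```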

Averaging Method

The default ensemble uses equal-weight averaging:

ensemble_prediction = (histgb + lightgbm + xgboost) / 3

This simple approach is surprisingly effective. In backtesting, the ensemble beats every individual model on strategic horizons, and on D+1 it trails only the single best model (XGBoost):

Product             Ensemble MAE (EUR/MWh)   Best Single Model MAE (EUR/MWh)   Best Single Model
D+1 Day-Ahead       14.47                    13.95                             XGBoost
D+2–D+7 Strategic   19.79                    21.42                             HistGBT

v4.3 backtest results (Oct 2025 – Feb 2026).

The ensemble consistently outperforms individual models on strategic horizons, where the opposing biases of different models partially cancel out during averaging.

Loss Function: Quantile (q=0.55)

All three models are trained with quantile loss targeting the 55th percentile:

  • HistGBT: loss="quantile", quantile=0.55
  • LightGBM: objective="quantile", alpha=0.55
  • XGBoost: objective="reg:quantileerror", quantile_alpha=0.55

Why Quantile Over MAE/MSE?

Electricity prices are right-skewed: bounded near zero but with occasional spikes above 200 EUR/MWh. Standard loss functions have a structural problem:

  • MSE targets the conditional mean — sits below the median on skewed data, causing systematic underprediction
  • MAE targets the conditional median (50th percentile) — closer, but still tends to undershoot

Quantile loss at q=0.55 targets the 55th percentile, slightly above the median. This directly corrects the underprediction bias without distorting the forecast shape. The 5% shift above the median was chosen empirically to minimize bias on the Spanish OMIE price distribution.
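The asymmetry is visible directly in the quantile (pinball) loss: at q=0.55, a 10 EUR/MWh undershoot costs more than an equal overshoot:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q=0.55):
    """Quantile (pinball) loss: penalises underprediction more than
    overprediction whenever q > 0.5."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

y_true = np.array([100.0])
under = pinball_loss(y_true, np.array([90.0]))   # 0.55 * 10 = 5.5
over = pinball_loss(y_true, np.array([110.0]))   # 0.45 * 10 = 4.5
```

Minimising this loss therefore pushes predictions slightly above the median, which is exactly the bias correction described above.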

Changed in v4.1. Previously used MAE (v3.1) and MSE (v1.0–v3.0). See the changelog for details.

Training Process

Each model trains independently through the same pipeline:

  1. Feature construction — Build direct features relative to the forecast origin
  2. Time series cross-validation — 5-fold TimeSeriesSplit preserving temporal order
  3. Per-fold training — Train on each fold, evaluate on the next
  4. Final model — Retrain on all available data
  5. Conformal calibration — Build confidence intervals from out-of-fold residuals

Models are saved as joblib artifacts with version timestamps.

Confidence Intervals

The ensemble’s confidence intervals use split conformal prediction with asymmetric bands:

  1. Collect out-of-fold residuals (actual - predicted) from all CV folds
  2. Bucket residuals by horizon group (day buckets)
  3. Compute quantiles of signed residuals:
    • 50% band: 25th and 75th percentiles
    • 90% band: 5th and 95th percentiles
  4. At inference: lower = prediction + quantile_low, upper = prediction + quantile_high

Using signed residuals (rather than absolute residuals) produces asymmetric intervals that reflect the skewed error distribution — wider on the upside where price spikes occur.
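A minimal sketch of the calibration step, taking residuals as actual minus predicted so that positive quantiles widen the upside band. The helper name and toy residuals are illustrative:

```python
import numpy as np

def conformal_band(residuals, coverage=0.90):
    """Asymmetric split-conformal band from signed out-of-fold residuals
    (residual = actual - predicted)."""
    tail = (1 - coverage) / 2                     # 0.05 for the 90% band
    return np.quantile(residuals, [tail, 1 - tail])

# Toy residuals: mostly small errors, one large positive miss on a spike
residuals = np.array([-6.0, -5.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0, 30.0])
q_low, q_high = conformal_band(residuals)

prediction = 80.0
lower = prediction + q_low    # modest downside band
upper = prediction + q_high   # wide upside band, reflecting spike risk
```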

Default Hyperparameters

All three models share a common parameter template:

Parameter                 Value   Purpose
max_iter / n_estimators   500     Number of boosting rounds
max_depth                 8       Tree depth limit
learning_rate             0.05    Step size shrinkage
min_samples_leaf          20      Minimum leaf size
l2_regularization         0.1     L2 penalty weight
early_stopping            True    Stop if validation loss plateaus
validation_fraction       0.1     Holdout for early stopping
n_iter_no_change          20      Patience rounds

These can be optimized per horizon group using Optuna hyperparameter tuning.