Ensemble Strategy (Historical)

The current EPF production architecture (all 4 countries, since M0.6 Phase F cutover on 2026-04-09) is a single XGBoost model per country × horizon group, with no ensemble and no LSTM.

The v10.x LSTM-XGBoost hybrid was retracted on 2026-04-09 after two code-level bugs were found. See the v11.0 post-LSTM correction and the LSTM page for the retraction narrative.
The v4.3 equal-weight ensemble (HistGradientBoosting + LightGBM + XGBoost) was retired at the same cutover. The pre-LSTM v8 scout configuration (single XGBoost + residual_1w + pw3 + d365) strictly dominates it on the same evaluation window.

See the XGBoost page for the current single-model v11.0+ configuration and its multi-country extensions.

This page is kept as a historical record of two architectures that were once production but are no longer.

Retracted: LSTM-XGBoost Hybrid (v10.1)

The day-ahead model in the v10.x series used a task-aligned LSTM encoder that was intended to process 7-day price sequences into 64-dimensional temporal embeddings appended to XGBoost’s tabular features. In practice, bugs in the training and inference paths meant the LSTM block contributed zero useful signal. The measured v10.1 metrics describe broken code, and the headline “DA MAE 12.69 / bias −0.65” numbers that were once quoted here are artifacts of the broken LSTM acting as accidental noise regularization rather than real architectural gains. The configuration is documented in detail on the LSTM page for the historical record; do not use it as guidance for current or future work.

Retired: Equal-Weight Ensemble (v4.3)

The original ensemble combined predictions from three gradient boosting implementations (HistGradientBoosting, LightGBM, and XGBoost) into a single forecast. It was in production from February to March 2026 and is documented here for reference.

Why Ensemble?

Each gradient boosting implementation uses different algorithmic choices:

Model	Growth Strategy	Missing Values	Regularization
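HistGradientBoosting	Leaf-wise (best-first, capped by max_leaf_nodes)	Native NaN support, learned split direction	L2 penalty
LightGBM	Leaf-wise	Native NaN support, learned split direction	L1/L2, feature/bagging fractions
XGBoost	Level-wise (default depthwise policy)	Native NaN support, learned default direction	L1/L2, column sampling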

These differences mean the models make partly different errors on the same samples. When one model struggles with a particular pattern, the others often compensate, and to the extent their errors are not perfectly correlated, averaging cancels part of them.

Averaging Method

The default ensemble uses equal-weight averaging:

ensemble_prediction = (histgb + lightgbm + xgboost) / 3
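As a minimal sketch (hypothetical function names, toy numbers rather than backtest data), the average can be taken hour by hour over equal-length forecast lists. The check at the end illustrates a general property of equal-weight averaging: by the triangle inequality, the ensemble's absolute error in each hour is at most the mean of the members' absolute errors, so its MAE can never exceed the members' average MAE, yet nothing forces it below the best member's MAE.

```python
def equal_weight_ensemble(*member_preds: list[float]) -> list[float]:
    """Average several models' hourly forecasts with equal weights."""
    if not member_preds:
        raise ValueError("need at least one member forecast")
    n = len(member_preds[0])
    if any(len(p) != n for p in member_preds):
        raise ValueError("all member forecasts must cover the same hours")
    k = len(member_preds)
    return [sum(hour) / k for hour in zip(*member_preds)]


def mae(pred: list[float], actual: list[float]) -> float:
    """Mean absolute error between two equal-length series."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


# Toy hourly prices (EUR/MWh) and member forecasts; illustrative only.
actual = [50.0, 62.0, 80.0, 45.0]
histgb = [48.0, 60.0, 70.0, 47.0]
lightgbm = [53.0, 66.0, 85.0, 44.0]
xgboost = [50.0, 62.0, 78.0, 45.0]

ensemble = equal_weight_ensemble(histgb, lightgbm, xgboost)
member_maes = [mae(m, actual) for m in (histgb, lightgbm, xgboost)]
ens_mae = mae(ensemble, actual)

# Bounded by the members' average MAE (triangle inequality, hour by hour) ...
assert ens_mae <= sum(member_maes) / len(member_maes) + 1e-9
# ... but still allowed to lose to the best single member.
print(f"ensemble MAE {ens_mae:.2f}, best member {min(member_maes):.2f}")
# -> ensemble MAE 0.92, best member 0.50
```

With these toy numbers the ensemble beats the average member but not the strongest one, the same pattern the D+1 row of the backtest table shows.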

This simple approach works well on longer horizons. In backtesting, the ensemble beat the best individual model on strategic horizons but trailed XGBoost alone on day-ahead:

Product	Ensemble MAE (EUR/MWh)	Best Single Model MAE (EUR/MWh)	Best Single Model
D+1 Day-Ahead	14.47	13.95	XGBoost
D+2–D+7 Strategic	19.79	21.42	HistGBT

v4.3 backtest results (Oct 2025 – Feb 2026).

On strategic horizons the ensemble outperformed the best individual model, plausibly because the opposing biases of the different models partially cancel out during averaging.

Loss Function: Quantile (q=0.55)

All three models are trained with quantile loss targeting the 55th percentile:

HistGBT: loss="quantile", quantile=0.55
LightGBM: objective="quantile", alpha=0.55
XGBoost: objective="reg:quantileerror", quantile_alpha=0.55

Why Quantile Over MAE/MSE?

Electricity prices are right-skewed: bounded near zero but with occasional spikes above 200 EUR/MWh. Standard loss functions have a structural problem:

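MSE targets the conditional mean. On right-skewed data the mean sits above the median, and because errors are squared, rare spikes dominate the loss and destabilize the fit.
MAE targets the conditional median (50th percentile). It is robust to spikes, but on right-skewed data the median lies below the mean, so a median forecast shows a persistent negative average error (underprediction bias).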

Quantile loss at q=0.55 targets the 55th percentile, slightly above the median. This directly corrects the underprediction bias without distorting the forecast shape. The 5-percentage-point shift above the median was chosen empirically to minimize bias on the Spanish OMIE price distribution.
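The effect of each loss can be checked on a toy right-skewed sample. The sketch below, with hypothetical helper names and illustrative prices rather than OMIE data, implements the pinball (quantile) loss and searches the sample values for the constant forecast that minimizes it; for any q that minimizer is an empirical q-quantile.

```python
import statistics


def pinball_loss(actual: list[float], pred: float, q: float) -> float:
    """Mean quantile (pinball) loss of a constant forecast `pred` at level q."""
    total = 0.0
    for y in actual:
        u = y - pred  # positive when the forecast is too low
        total += q * u if u >= 0 else (q - 1) * u
    return total / len(actual)


def best_constant(actual: list[float], q: float) -> float:
    """Sample value minimizing pinball loss (some sample value always attains the minimum)."""
    return min(sorted(actual), key=lambda c: pinball_loss(actual, c, q))


# Right-skewed toy sample: mostly 40-65 EUR/MWh plus one spike.
prices = [40, 45, 50, 52, 55, 58, 60, 62, 65, 250]

print(statistics.mean(prices))      # MSE target -> 73.7
print(statistics.median(prices))    # MAE target -> 56.5
print(best_constant(prices, 0.55))  # q=0.55 target -> 58
```

The single 250 EUR/MWh spike drags the mean far above the bulk of the sample, the median sits in the middle, and q=0.55 moves the target one step higher.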

Changed in v4.1. Previously used MAE (v3.1) and MSE (v1.0–v3.0). See the changelog for details.

Training Process

Each model trains independently through the same pipeline:

Feature construction: build direct features relative to the forecast origin
Time series cross-validation: 5-fold TimeSeriesSplit preserving temporal order
Per-fold training: train on all data before each validation fold (an expanding window), then evaluate on that fold
Final model: retrain on all available data
Conformal calibration: build confidence intervals from out-of-fold residuals
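The cross-validation steps can be sketched in plain Python. The split generator below reproduces the default index layout of scikit-learn's TimeSeriesSplit (no gap, no max_train_size), where each fold trains on everything before its validation block; the pipeline itself presumably uses that class directly. `fit_predict` and `mean_model` are hypothetical placeholders for the real per-fold training, and residuals are signed actual minus predicted.

```python
from typing import Callable, Iterator


def time_series_splits(n_samples: int, n_splits: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs laid out like TimeSeriesSplit's defaults.

    Each fold validates on the next contiguous block and trains on everything
    before it (expanding window), so no fold ever sees the future.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


def out_of_fold_residuals(
    y: list[float],
    fit_predict: Callable[[list[float], int], list[float]],
    n_splits: int = 5,
) -> list[float]:
    """Collect signed residuals (actual - predicted) from every validation fold."""
    residuals: list[float] = []
    for train_idx, test_idx in time_series_splits(len(y), n_splits):
        preds = fit_predict([y[i] for i in train_idx], len(test_idx))
        residuals.extend(y[i] - p for i, p in zip(test_idx, preds))
    return residuals


def mean_model(train_y: list[float], horizon: int) -> list[float]:
    """Hypothetical stand-in for a real model: forecast the training mean."""
    m = sum(train_y) / len(train_y)
    return [m] * horizon


folds = list(time_series_splits(12, n_splits=5))
print([(len(tr), te) for tr, te in folds])
# -> [(2, [2, 3]), (4, [4, 5]), (6, [6, 7]), (8, [8, 9]), (10, [10, 11])]
```

Because every fold's validation block comes after its training data, the residuals collected this way are genuinely out-of-sample, which is what the conformal step relies on.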

Models are saved as joblib artifacts with version timestamps.

Confidence Intervals

The ensemble’s confidence intervals use split conformal prediction with asymmetric bands:

Collect out-of-fold residuals (actual - predicted) from all CV folds
Bucket residuals by horizon group (day buckets)
Compute quantiles of signed residuals:
- 50% band: 25th and 75th percentiles
- 90% band: 5th and 95th percentiles
At inference: lower = prediction + quantile_low, upper = prediction + quantile_high

Using signed residuals (rather than absolute residuals) produces asymmetric intervals that reflect the skewed error distribution — wider on the upside where price spikes occur.
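A minimal sketch of the calibration and inference steps, assuming residuals have already been grouped by horizon bucket and are signed actual minus predicted, so that adding a residual quantile to a prediction yields a price bound. The bucket names, band labels and helper functions are hypothetical, the quantile uses linear interpolation between order statistics, and the small finite-sample correction that textbook split conformal applies to the quantile level is omitted here.

```python
import math

# Band name -> (lower percentile, upper percentile) of signed residuals.
BANDS = {"50%": (0.25, 0.75), "90%": (0.05, 0.95)}


def quantile(sorted_vals: list[float], p: float) -> float:
    """Linear-interpolation quantile of already-sorted data."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def calibrate(residuals_by_bucket: dict[str, list[float]]) -> dict[str, dict[str, tuple[float, float]]]:
    """Store (quantile_low, quantile_high) per horizon bucket and band."""
    table = {}
    for bucket, res in residuals_by_bucket.items():
        s = sorted(res)
        table[bucket] = {name: (quantile(s, lo), quantile(s, hi)) for name, (lo, hi) in BANDS.items()}
    return table


def interval(prediction: float, bucket: str, band: str, table) -> tuple[float, float]:
    """lower = prediction + quantile_low, upper = prediction + quantile_high."""
    q_low, q_high = table[bucket][band]
    return prediction + q_low, prediction + q_high


# Toy D+1 residuals with one upside spike (actual far above the forecast).
residuals = {"D+1": [-6.0, -4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 20.0]}
table = calibrate(residuals)
low, high = interval(60.0, "D+1", "90%", table)
print(round(low, 2), round(high, 2))  # -> 54.9 72.8
```

The single +20 residual stretches the upper 90% bound much further above the prediction than the lower bound falls below it, which is the asymmetry described above.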

Default Hyperparameters

All three models share a common parameter template:

Parameter	Value	Purpose
`max_iter` / `n_estimators`	500	Number of boosting rounds
`max_depth`	8	Tree depth limit
`learning_rate`	0.05	Step size shrinkage
`min_samples_leaf`	20	Minimum leaf size
`l2_regularization`	0.1	L2 penalty weight
`early_stopping`	True	Stop if validation loss plateaus
`validation_fraction`	0.1	Holdout for early stopping
`n_iter_no_change`	20	Patience rounds

These can be optimized per horizon group using Optuna hyperparameter tuning.
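The table uses scikit-learn HistGradientBoostingRegressor parameter names. A hedged sketch of how the shared template might be renamed for the scikit-learn-style wrappers of LightGBM (LGBMRegressor) and XGBoost (XGBRegressor) follows; the mapping and the `translate` helper are assumptions for illustration, not the project's code. Early stopping is dropped for the other two libraries because they configure it differently (a callback or `early_stopping_rounds` together with an explicit evaluation set), and XGBoost has no direct row-count equivalent of `min_samples_leaf`.

```python
# Shared template in HistGradientBoostingRegressor names (values from the table above).
TEMPLATE = {
    "max_iter": 500,
    "max_depth": 8,
    "learning_rate": 0.05,
    "min_samples_leaf": 20,
    "l2_regularization": 0.1,
    "early_stopping": True,
    "validation_fraction": 0.1,
    "n_iter_no_change": 20,
}

# Assumed renames; keys absent here (or mapped to None) have no one-to-one equivalent.
RENAMES = {
    "lightgbm": {
        "max_iter": "n_estimators",
        "max_depth": "max_depth",
        "learning_rate": "learning_rate",
        "min_samples_leaf": "min_child_samples",
        "l2_regularization": "reg_lambda",
    },
    "xgboost": {
        "max_iter": "n_estimators",
        "max_depth": "max_depth",
        "learning_rate": "learning_rate",
        "min_samples_leaf": None,  # min_child_weight sums hessian weight, not rows
        "l2_regularization": "reg_lambda",
    },
}


def translate(library: str, template: dict) -> dict:
    """Rename template keys for `library`, dropping settings without a direct equivalent."""
    if library == "histgb":  # hypothetical label for the scikit-learn model
        return dict(template)
    mapping = RENAMES[library]
    return {mapping[k]: v for k, v in template.items() if mapping.get(k) is not None}


print(translate("xgboost", TEMPLATE))
```

Keeping one template and renaming per library means an Optuna search over a horizon group only has to tune one set of values.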