Ensemble Strategy

Current Architecture: LSTM-XGBoost Hybrid (v10.1)

The day-ahead model uses a task-aligned LSTM encoder that processes 7-day price sequences to produce 64-dimensional temporal embeddings. These embeddings are appended to XGBoost’s 90 tabular features, giving the model temporal context that tree-based splits on lag columns cannot recover.

Architecture:

Price sequence (7 days × 96 intervals)
  → LSTM encoder (task-aligned, 64 hidden units)
  → 64-dim temporal embeddings
  → XGBoost (90 tabular + 64 LSTM = 154 features total)
  → Quarter-hour price forecast
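In code, the fusion step reduces to a feature concatenation. The sketch below only mirrors the shapes described above with random stand-ins; it is not the production encoder:

```python
import numpy as np

# Illustrative shapes only: the LSTM consumes 7 days x 96 quarter-hours
# and emits a 64-dim embedding per sample.
rng = np.random.default_rng(0)
n_samples = 32
lstm_embeddings = rng.normal(size=(n_samples, 64))   # stand-in for encoder output
tabular_features = rng.normal(size=(n_samples, 90))  # stand-in for the 90 tabular features

# XGBoost sees the concatenation: 90 tabular + 64 temporal = 154 features
X = np.concatenate([tabular_features, lstm_embeddings], axis=1)
```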

Key design decisions validated across 18 experiments (v10.1):

  • Task-aligned encoder (not generic): The LSTM is trained jointly with XGBoost so its embeddings are specifically useful for price regression — generic encoders performed worse
  • 1-week residual baseline: Predict deviation from the weekly median rather than raw EUR; the model then adds the baseline back. This isolates regime shifts and is optimal for regime-switching markets (4-week baselines introduce “regime memory” bias)
  • No price weighting: Confirmed incompatible with LSTM three times across experiments — price weighting destabilises the embedding signal
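The residual-baseline decision can be sketched as follows. The helper name and the numbers are illustrative, not the production implementation:

```python
import numpy as np

def weekly_median_baseline(prices: np.ndarray) -> float:
    """Baseline = median over the trailing 7-day window (illustrative)."""
    return float(np.median(prices[-7 * 96:]))  # 7 days x 96 quarter-hours

# Training target is the deviation from the weekly median ...
history = np.full(7 * 96, 50.0)           # toy history: a flat 50 EUR/MWh week
baseline = weekly_median_baseline(history)
actual_price = 62.0
residual_target = actual_price - baseline  # the model learns this deviation

# ... and at inference the baseline is added back
model_residual_pred = 10.5                 # pretend model output
price_forecast = baseline + model_residual_pred
```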

v10.1 validation results (150-day window, Oct 2025 – Mar 2026, includes Iran crisis):

Metric              v10.1 (LSTM, task-aligned)   v4.3 (pre-LSTM baseline)
DA MAE              12.69 EUR/MWh                14.47 EUR/MWh
Strategic MAE       17.84 EUR/MWh                19.79 EUR/MWh
Bias                -0.65 EUR/MWh                ~-12 EUR/MWh
MaxPred             209 EUR/MWh                  ~127 EUR/MWh
Spike Recall        24.1%                        ~16%
Crisis MAE (Iran)   27.16 EUR/MWh                n/a (window excluded crisis)

v10.1 beats v4.3 on every metric. The headline improvements: DA MAE 12.69 vs 14.47 EUR/MWh (−12.3%), and bias reduced from ~−12 to −0.65 EUR/MWh (19× better) — the model is nearly unbiased. The v10.1 window includes the March 2026 Iran crisis (prices 170–247 EUR/MWh), which adds ~1–2 EUR/MWh to the reported MAE — the v4.3 window excluded this period, so the real improvement is larger.

See v10.1 changelog for the full 18-experiment validation.


Legacy Architecture: Equal-Weight Ensemble (v4.3)

The original ensemble combined predictions from three gradient boosting implementations — HistGradientBoosting, LightGBM, and XGBoost — into a single forecast. It remained in production from Feb–Mar 2026 and is documented here for reference.

Why Ensemble?

Each gradient boosting implementation uses different algorithmic choices:

Model                 Growth Strategy   Missing Values          Regularization
HistGradientBoosting  Depth-wise        Native NaN support      L2 penalty
LightGBM              Leaf-wise         Native NaN support      L1/L2, feature/bagging fractions
XGBoost               Level-wise        Learned NaN direction   L1/L2, column sampling

These differences mean the models make different errors on different samples. When one model struggles with a particular pattern, the others often compensate. Averaging smooths out individual weaknesses.
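A toy example of why averaging helps: each model errs on a different sample, and the errors cancel in the mean. The numbers are hand-picked and purely illustrative:

```python
import numpy as np

actual = np.array([100.0, 100.0, 100.0])

# Each toy model errs in a different direction on different samples
histgb   = np.array([110.0,  98.0, 101.0])
lightgbm = np.array([ 95.0, 103.0,  99.0])
xgboost  = np.array([ 95.0,  99.0, 100.0])

ensemble = (histgb + lightgbm + xgboost) / 3

def mae(pred):
    return float(np.abs(pred - actual).mean())

# Individual MAEs: ~4.33, 3.0, 2.0 -- but the errors cancel in the
# average, so the ensemble MAE is 0.0 in this contrived case.
```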

Averaging Method

The default ensemble uses equal-weight averaging:

ensemble_prediction = (histgb + lightgbm + xgboost) / 3

This simple approach is surprisingly effective. In backtesting, the ensemble beats every individual model on strategic horizons, and on D+1 it trails only the single best model (XGBoost):

Product             Ensemble MAE (EUR/MWh)   Best Single Model MAE (EUR/MWh)   Best Single Model
D+1 Day-Ahead       14.47                    13.95                             XGBoost
D+2–D+7 Strategic   19.79                    21.42                             HistGBT

v4.3 backtest results (Oct 2025 – Feb 2026).

The ensemble consistently outperforms individual models on strategic horizons, where the opposing biases of different models partially cancel out during averaging.

Loss Function: Quantile (q=0.55)

All three models are trained with quantile loss targeting the 55th percentile:

  • HistGBT: loss="quantile", quantile=0.55
  • LightGBM: objective="quantile", alpha=0.55
  • XGBoost: objective="reg:quantileerror", quantile_alpha=0.55

Why Quantile Over MAE/MSE?

Electricity prices are right-skewed: bounded near zero but with occasional spikes above 200 EUR/MWh. Standard loss functions have a structural problem:

  • MSE targets the conditional mean — sits below the median on skewed data, causing systematic underprediction
  • MAE targets the conditional median (50th percentile) — closer, but still tends to undershoot

Quantile loss at q=0.55 targets the 55th percentile, slightly above the median. This directly corrects the underprediction bias without distorting the forecast shape. The 5% shift above the median was chosen empirically to minimize bias on the Spanish OMIE price distribution.
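The asymmetry is visible directly in the quantile (pinball) loss: at q=0.55, a 10 EUR/MWh undershoot costs more than an equal overshoot:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q=0.55):
    """Quantile (pinball) loss: penalises underprediction more than
    overprediction whenever q > 0.5."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))

y_true = np.array([100.0])
under = pinball_loss(y_true, np.array([90.0]))   # 0.55 * 10 = 5.5
over = pinball_loss(y_true, np.array([110.0]))   # 0.45 * 10 = 4.5
```

Minimising this loss therefore pushes predictions slightly above the median, which is exactly the bias correction described above.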

Changed in v4.1. Previously used MAE (v3.1) and MSE (v1.0–v3.0). See the changelog for details.

Training Process

Each model trains independently through the same pipeline:

  1. Feature construction — Build direct features relative to the forecast origin
  2. Time series cross-validation — 5-fold TimeSeriesSplit preserving temporal order
  3. Per-fold training — Train on each fold, evaluate on the next
  4. Final model — Retrain on all available data
  5. Conformal calibration — Build confidence intervals from out-of-fold residuals

Models are saved as joblib artifacts with version timestamps.

Confidence Intervals

The ensemble’s confidence intervals use split conformal prediction with asymmetric bands:

  1. Collect out-of-fold residuals (actual - predicted) from all CV folds
  2. Bucket residuals by horizon group (day buckets)
  3. Compute quantiles of signed residuals:
    • 50% band: 25th and 75th percentiles
    • 90% band: 5th and 95th percentiles
  4. At inference: lower = prediction + quantile_low, upper = prediction + quantile_high

Using signed residuals (rather than absolute residuals) produces asymmetric intervals that reflect the skewed error distribution — wider on the upside where price spikes occur.
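A minimal sketch of the calibration step, taking residuals as actual minus predicted so that positive quantiles widen the upside band. The helper name and toy residuals are illustrative:

```python
import numpy as np

def conformal_band(residuals, coverage=0.90):
    """Asymmetric split-conformal band from signed out-of-fold residuals
    (residual = actual - predicted)."""
    tail = (1 - coverage) / 2                     # 0.05 for the 90% band
    return np.quantile(residuals, [tail, 1 - tail])

# Toy residuals: mostly small errors, one large positive miss on a spike
residuals = np.array([-6.0, -5.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0, 30.0])
q_low, q_high = conformal_band(residuals)

prediction = 80.0
lower = prediction + q_low    # modest downside band
upper = prediction + q_high   # wide upside band, reflecting spike risk
```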

Default Hyperparameters

All three models share a common parameter template:

Parameter                 Value   Purpose
max_iter / n_estimators   500     Number of boosting rounds
max_depth                 8       Tree depth limit
learning_rate             0.05    Step size shrinkage
min_samples_leaf          20      Minimum leaf size
l2_regularization         0.1     L2 penalty weight
early_stopping            True    Stop if validation loss plateaus
validation_fraction       0.1     Holdout for early stopping
n_iter_no_change          20      Patience rounds

These can be optimized per horizon group using Optuna hyperparameter tuning.