Ensemble Strategy (Historical)
Retracted: LSTM-XGBoost Hybrid (v10.1)
The day-ahead model in the v10.x series used a task-aligned LSTM encoder that was intended to process 7-day price sequences into 64-dimensional temporal embeddings appended to XGBoost’s tabular features. In practice, bugs in the training and inference paths meant the LSTM block contributed zero useful signal. The measured v10.1 metrics describe broken code, and the headline “DA MAE 12.69 / bias −0.65” numbers that were once quoted here are artifacts of the broken LSTM acting as accidental noise regularization rather than real architectural gains. The configuration is documented in detail on the LSTM page for the historical record; do not use it as guidance for current or future work.
Retired: Equal-Weight Ensemble (v4.3)
The original ensemble combined predictions from three gradient boosting implementations (HistGradientBoosting, LightGBM, and XGBoost) into a single forecast. It ran in production from February to March 2026 and is documented here for reference.
Why Ensemble?
Each gradient boosting implementation uses different algorithmic choices:
| Model | Growth Strategy | Missing Values | Regularization |
|---|---|---|---|
| HistGradientBoosting | Leaf-wise (best-first) | Native NaN support | L2 penalty |
| LightGBM | Leaf-wise | Native NaN support | L1/L2, feature/bagging fractions |
| XGBoost | Level-wise | Learned NaN direction | L1/L2, column sampling |
These differences mean the models make different errors on different samples. When one model struggles with a particular pattern, the others often compensate. Averaging smooths out individual weaknesses.
Averaging Method
The default ensemble uses equal-weight averaging:
    ensemble_prediction = (histgb + lightgbm + xgboost) / 3

This simple approach is surprisingly effective. In backtesting, the ensemble beat the best individual model on the strategic horizons and trailed the best individual model by 0.52 MAE on day-ahead:
| Product | Ensemble MAE | Best Single Model MAE | Best Single Model |
|---|---|---|---|
| D+1 Day-Ahead | 14.47 | 13.95 | XGBoost |
| D+2–D+7 Strategic | 19.79 | 21.42 | HistGBT |
v4.3 backtest results (Oct 2025 – Feb 2026).
The ensemble consistently outperforms individual models on strategic horizons, where the opposing biases of different models partially cancel out during averaging.
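The cancellation effect can be illustrated with a minimal sketch. The forecast values below are invented for illustration and are not backtest output; `equal_weight_ensemble` and `mae` are hypothetical helper names. The numbers are chosen so that the models err in opposite directions, which is the case where averaging helps most. As the day-ahead row of the table shows, averaging is not guaranteed to beat the best single model.

```python
import numpy as np

def equal_weight_ensemble(*model_predictions):
    """Average per-model forecasts element-wise with equal weights."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in model_predictions])
    return stacked.mean(axis=0)

# Hypothetical prices and forecasts (EUR/MWh) for three delivery hours.
actual = np.array([50.0, 80.0, 65.0])
histgb = np.array([46.0, 84.0, 61.0])
lightgbm = np.array([55.0, 75.0, 70.0])
xgboost = np.array([52.0, 78.0, 67.0])

ensemble_prediction = equal_weight_ensemble(histgb, lightgbm, xgboost)

def mae(pred):
    """Mean absolute error against the hypothetical actuals."""
    return float(np.mean(np.abs(pred - actual)))

for name, pred in [("histgb", histgb), ("lightgbm", lightgbm),
                   ("xgboost", xgboost), ("ensemble", ensemble_prediction)]:
    print(f"{name:9s} MAE = {mae(pred):.2f}")
```

In this toy case each model's errors have mixed signs across hours, so the average lands closer to the actual price than any single forecast.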
Loss Function: Quantile (q=0.55)
All three models are trained with quantile loss targeting the 55th percentile:
- HistGBT: `loss="quantile", quantile=0.55`
- LightGBM: `objective="quantile", alpha=0.55`
- XGBoost: `objective="reg:quantileerror", quantile_alpha=0.55`
Why Quantile Over MAE/MSE?
Electricity prices are right-skewed: bounded near zero but with occasional spikes above 200 EUR/MWh. Standard loss functions have a structural problem:
- MSE targets the conditional mean, which on right-skewed data sits above the median and is dominated by rare spikes, so the fit is sensitive to outliers
- MAE targets the conditional median (50th percentile), which is closer but still tends to undershoot
Quantile loss at q=0.55 targets the 55th percentile, slightly above the median. This directly corrects the underprediction bias without distorting the forecast shape. The shift of 5 percentage points above the median was chosen empirically to minimize bias on the Spanish OMIE price distribution.
Changed in v4.1. Previously used MAE (v3.1) and MSE (v1.0–v3.0). See the changelog for details.
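As a rough illustration of what each loss targets, the sketch below finds the constant forecast that minimizes the quantile (pinball) loss on synthetic right-skewed data. The log-normal sample is an assumption standing in for real prices, and `pinball_loss` is a hypothetical helper, not code from the project.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss; q=0.5 is MAE scaled by one half."""
    diff = np.asarray(y_true, dtype=float) - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# Synthetic right-skewed "prices": log-normal draws, not real OMIE data.
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=4.0, sigma=0.5, size=20_000)

# Grid-search the constant forecast that minimizes each loss.
candidates = np.linspace(prices.min(), np.percentile(prices, 99), 2_000)
best_q50 = candidates[np.argmin([pinball_loss(prices, c, 0.50) for c in candidates])]
best_q55 = candidates[np.argmin([pinball_loss(prices, c, 0.55) for c in candidates])]

print("mean             :", round(float(prices.mean()), 2))
print("median           :", round(float(np.median(prices)), 2))
print("argmin pinball q=0.50:", round(float(best_q50), 2))
print("argmin pinball q=0.55:", round(float(best_q55), 2))
```

The q=0.50 minimizer lands on the sample median and the q=0.55 minimizer lands slightly above it, which is the upward nudge the section describes.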
Training Process
Each model trains independently through the same pipeline:
- Feature construction: build direct features relative to the forecast origin
- Time series cross-validation: 5-fold `TimeSeriesSplit`, preserving temporal order
- Per-fold training: train on each fold's training window, evaluate on the block that follows it
- Final model: retrain on all available data
- Conformal calibration: build confidence intervals from out-of-fold residuals
Models are saved as joblib artifacts with version timestamps.
Confidence Intervals
The ensemble’s confidence intervals use split conformal prediction with asymmetric bands:
- Collect out-of-fold residuals (actual - predicted) from all CV folds
- Bucket residuals by horizon group (day buckets)
- Compute quantiles of signed residuals:
  - 50% band: 25th and 75th percentiles
  - 90% band: 5th and 95th percentiles
- At inference: `lower = prediction + quantile_low`, `upper = prediction + quantile_high`
Using signed residuals (rather than absolute residuals) produces asymmetric intervals that reflect the skewed error distribution — wider on the upside where price spikes occur.
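A minimal sketch of the band construction follows. It assumes residuals are taken as actual minus predicted, the sign convention under which adding the residual quantiles to the prediction gives a valid band. The function names, the synthetic gamma-distributed residuals, and the use of plain empirical quantiles (without a finite-sample correction) are assumptions for illustration.

```python
import numpy as np

def fit_residual_quantiles(residuals, horizon_days, coverages=(0.5, 0.9)):
    """Per-horizon-bucket quantiles of signed residuals (actual - predicted)."""
    residuals = np.asarray(residuals, dtype=float)
    horizon_days = np.asarray(horizon_days)
    table = {}
    for day in np.unique(horizon_days):
        bucket = residuals[horizon_days == day]
        table[int(day)] = {
            cov: (float(np.quantile(bucket, (1 - cov) / 2)),
                  float(np.quantile(bucket, 1 - (1 - cov) / 2)))
            for cov in coverages
        }
    return table

def predict_interval(prediction, day, table, coverage=0.9):
    """Shift the point forecast by the bucket's lower and upper residual quantiles."""
    q_low, q_high = table[day][coverage]
    return prediction + q_low, prediction + q_high

# Synthetic right-skewed residuals for two horizon buckets (D+1 and D+2).
rng = np.random.default_rng(0)
horizon_days = np.repeat([1, 2], 2_000)
residuals = rng.gamma(shape=2.0, scale=5.0, size=horizon_days.size) - 8.0

table = fit_residual_quantiles(residuals, horizon_days)
lower, upper = predict_interval(60.0, day=1, table=table, coverage=0.9)
print(f"90% band for a 60.0 forecast at D+1: [{lower:.1f}, {upper:.1f}]")
```

Because the synthetic residuals have a long right tail, the upper half of the band comes out wider than the lower half, matching the asymmetry described above.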
Default Hyperparameters
All three models share a common parameter template:
| Parameter | Value | Purpose |
|---|---|---|
| `max_iter` / `n_estimators` | 500 | Number of boosting rounds |
| `max_depth` | 8 | Tree depth limit |
| `learning_rate` | 0.05 | Step size shrinkage |
| `min_samples_leaf` | 20 | Minimum leaf size |
| `l2_regularization` | 0.1 | L2 penalty weight |
| `early_stopping` | True | Stop if validation loss plateaus |
| `validation_fraction` | 0.1 | Holdout for early stopping |
| `n_iter_no_change` | 20 | Patience rounds |
These can be optimized per horizon group using Optuna hyperparameter tuning.