XGBoost
Overview
XGBoost (Extreme Gradient Boosting) is the gradient boosting component in the v10.1 production model. In the current architecture, it receives 90 tabular features plus 64-dimensional LSTM embeddings (154 features total) and produces the final price forecast. In the legacy v4.3 ensemble, XGBoost was one of three equally weighted gradient boosting models alongside HistGBM and LightGBM.
Hyperparameters
| Parameter | Value | Purpose |
|---|---|---|
| objective | reg:quantileerror | Quantile loss function |
| quantile_alpha | 0.55 | Quantile target (55th percentile) |
| n_estimators | 500 | Maximum boosting iterations |
| max_depth | 8 | Maximum tree depth |
| learning_rate | 0.05 | Shrinkage per iteration |
| reg_lambda | 0.1 | L2 regularization on leaf values |
| random_state | 42 | Reproducibility |
| tree_method | hist | Histogram-based binning |
| device | cuda / cpu | GPU acceleration |
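The table above can be collected into a single keyword dict. Passing it to `xgboost.XGBRegressor(**XGB_PARAMS)` is an assumption about how the production code constructs the model, not a confirmed detail:

```python
# Hyperparameters from the table above. How the production code wires
# these into the model constructor is an assumption; the values and
# parameter names themselves are standard XGBoost keyword arguments.
XGB_PARAMS = {
    "objective": "reg:quantileerror",  # pinball (quantile) loss
    "quantile_alpha": 0.55,            # target the 55th percentile
    "n_estimators": 500,               # maximum boosting iterations
    "max_depth": 8,                    # maximum tree depth
    "learning_rate": 0.05,             # shrinkage per iteration
    "reg_lambda": 0.1,                 # L2 penalty on leaf values
    "random_state": 42,                # reproducibility
    "tree_method": "hist",             # histogram-based split finding
    "device": "cuda",                  # or "cpu" when no GPU is available
}
```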
Level-Wise Growth with Histogram Binning
XGBoost uses level-wise growth (same as HistGBM) but with its own implementation of histogram-based splitting:
```python
tree_method = "hist"
```

The hist method bins continuous features into discrete buckets before finding optimal splits, reducing the computational cost from O(n × features) to O(bins × features) per split evaluation.
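A toy sketch of the binning idea (not XGBoost's actual quantile-sketch implementation, which weights bin boundaries by the hessian): map each continuous value to one of a fixed number of equal-width buckets, so the split search scans bins rather than raw sample values.

```python
def to_bins(values, n_bins=256):
    """Map continuous values to equal-width bucket indices in [0, n_bins).

    Illustrative only: real histogram methods use quantile-based (not
    equal-width) bin boundaries.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against constant features
    # Cap at n_bins - 1 so the maximum value lands in the last bucket.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```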
GPU Acceleration
XGBoost supports CUDA-based GPU training:
```python
{"device": "cuda" if gpu_available else "cpu"}
```

Note the API difference: XGBoost uses device: "cuda" while LightGBM uses device: "gpu". Both achieve similar speedups (3–5×) for the training workload.
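A minimal sketch of how the `gpu_available` flag might be derived. Probing the GPU through `torch.cuda.is_available()` is an assumption; the production code may detect CUDA differently:

```python
def xgb_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is visible, else "cpu".

    Assumption: GPU detection is delegated to PyTorch if it is
    installed; otherwise we fall back to CPU.
    """
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```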
Regularization
XGBoost’s regularization differs from the other models:
| Regularization | HistGBM | LightGBM | XGBoost |
|---|---|---|---|
| L2 on leaves | Yes (l2_regularization) | Yes (reg_lambda) | Yes (reg_lambda) |
| L1 on leaves | No | No | Available (reg_alpha) |
| Max depth | Yes | Yes | Yes |
| Min samples/leaf | Yes | Yes (min_child_samples) | Available (min_child_weight) |
| Column subsampling | No | Available | Available |
| Row subsampling | No | Available | Available |
The combination of L1 and L2 regularization gives XGBoost a different effective hypothesis space, producing models that may differ from HistGBM and LightGBM even on the same data.
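The sparsity effect of L1 is visible in XGBoost's closed-form leaf weight: with gradient sum G, hessian sum H, L1 penalty alpha, and L2 penalty lambda, the optimal leaf value is a soft-thresholded ratio, so leaves whose gradient sum is smaller than alpha collapse to exactly zero. A minimal sketch of that formula:

```python
def leaf_value(grad_sum: float, hess_sum: float,
               reg_alpha: float = 0.0, reg_lambda: float = 0.1) -> float:
    """Optimal leaf weight under XGBoost's L1/L2 penalties:
    w* = -soft_threshold(G, alpha) / (H + lambda).
    """
    if grad_sum > reg_alpha:
        g = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        g = grad_sum + reg_alpha
    else:
        return 0.0          # L1 zeroes out leaves with small gradient sums
    return -g / (hess_sum + reg_lambda)
```

With reg_alpha = 0 (the pure-L2 case used by HistGBM and LightGBM), every leaf gets a nonzero value; raising reg_alpha prunes low-evidence leaves to zero, which is the sparsity the paragraph above refers to.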
Ensemble Diversity Contribution
XGBoost’s value in the ensemble comes from:
- Different implementation: Even with similar hyperparameters, numerical differences in gradient computation, histogram construction, and split finding lead to slightly different trees
- Regularization flexibility: L1 + L2 regularization creates sparser leaf values than pure L2
- Proven robustness: Extensive deployment history across forecasting, ranking, and classification domains
Role in v10.1 (Current)
In the current architecture, XGBoost is the only gradient boosting component. It receives the full 154-feature input (90 tabular + 64 LSTM embeddings) and is trained with residual-from-baseline targeting — predicting the deviation from the weekly median price rather than raw EUR/MWh. See Ensemble Strategy for the full architecture.
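Residual-from-baseline targeting can be sketched as subtracting each week's median price from every hour in that week; the model then predicts the residual and the baseline is added back at inference time. Function names, the 168-hour week length, and the non-overlapping weekly blocks are illustrative assumptions, not the production implementation:

```python
import statistics

def residual_targets(prices, week_len=168):
    """Training targets: deviation of each hourly price (EUR/MWh)
    from its week's median. Assumes fixed, non-overlapping weekly
    blocks of `week_len` hours; the production baseline may differ.
    """
    residuals = []
    for start in range(0, len(prices), week_len):
        week = prices[start:start + week_len]
        baseline = statistics.median(week)
        residuals.extend(p - baseline for p in week)
    return residuals
```

At prediction time the forecast is reconstructed as baseline + predicted residual, so the model only has to learn deviations around the weekly level.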
Legacy: Three-Model Equal-Weight Ensemble (v4.3)
Prior to v10.0, the EPF ensemble combined all three models with equal weights:
```python
ensemble_prediction = (histgbm + lightgbm + xgboost) / 3
```

Equal weighting was chosen for simplicity and robustness. Learned weights (e.g., optimized per horizon) risk overfitting on small calibration sets and introduce additional hyperparameters.
The ensemble consistently outperformed individual models because:
- Individual model errors are partially uncorrelated
- Averaging reduces variance without increasing bias
- No single model dominated across all hours, horizons, and market conditions
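The variance-reduction claim can be demonstrated with a toy simulation (illustrative only, not the ensemble's actual error structure): give three "models" errors that share a common component plus an independent component, then compare the variance of one model's errors against the variance of the averaged errors.

```python
import random
import statistics

random.seed(0)
n = 10_000
# Shared error component (variance 1) plus a per-model independent
# component (variance 1): errors are partially, not fully, correlated.
common = [random.gauss(0, 1) for _ in range(n)]
errs = [[c + random.gauss(0, 1) for c in common] for _ in range(3)]
avg_err = [(a + b + c) / 3 for a, b, c in zip(*errs)]

var_single = statistics.pvariance(errs[0])  # ≈ 1 + 1 = 2
var_avg = statistics.pvariance(avg_err)     # ≈ 1 + 1/3 ≈ 1.33
```

Averaging shrinks the independent part of the error variance by the ensemble size while the shared part is untouched, which is exactly why partially uncorrelated errors make the equal-weight ensemble beat its members.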
This architecture was replaced by the LSTM-XGBoost hybrid in v10.0. See HistGBM and LightGBM for the other legacy ensemble members.
Feature Importance
XGBoost supports multiple importance methods:
- Gain: Total improvement in loss from each feature’s splits
- Weight: Number of times each feature appears in splits
- Cover: Number of samples affected by each feature’s splits
- SHAP values: Game-theoretic feature attribution (most accurate)
Training Process
Identical to the other ensemble members:
- Receive training data for a specific horizon group
- 5-fold TimeSeriesSplit cross-validation
- Train with quantile loss objective (q=0.55)
- Record per-fold metrics
- Collect out-of-fold residuals for conformal calibration
- Save model + metadata as joblib artifact
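The split structure behind steps 2 and 5 can be sketched with a hand-rolled expanding-window splitter (standing in for sklearn's `TimeSeriesSplit` so the sketch stays dependency-free; the model-fitting, metric, and joblib steps are shown as comments because they depend on production code not reproduced here):

```python
def time_series_splits(n_samples, n_splits=5):
    """Expanding-window splits: each fold trains on all earlier data
    and validates on the next contiguous block, mimicking
    sklearn.model_selection.TimeSeriesSplit.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_idx = list(range(0, fold * k))
        test_idx = list(range(fold * k, min(fold * (k + 1), n_samples)))
        yield train_idx, test_idx
        # Per fold (sketch): fit XGBRegressor with the quantile-loss
        # params on train_idx, record metrics on test_idx, and keep the
        # out-of-fold residuals for conformal calibration before saving
        # the final model + metadata with joblib.
```

Because validation blocks always follow their training window, out-of-fold residuals collected this way are genuinely out-of-sample, which is what the conformal calibration step requires.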