XGBoost

Overview

XGBoost (Extreme Gradient Boosting) is the gradient boosting component in the v10.1 production model. In the current architecture, it receives 90 tabular features plus 64-dimensional LSTM embeddings (154 features total) and produces the final price forecast. In the legacy v4.3 equal-weight ensemble, XGBoost was one of three equal-weight GB models alongside HistGBM and LightGBM.

Hyperparameters

| Parameter | Value | Purpose |
| --- | --- | --- |
| `objective` | `reg:quantileerror` | Quantile loss function |
| `quantile_alpha` | 0.55 | Quantile target (55th percentile) |
| `n_estimators` | 500 | Maximum boosting iterations |
| `max_depth` | 8 | Maximum tree depth |
| `learning_rate` | 0.05 | Shrinkage per iteration |
| `reg_lambda` | 0.1 | L2 regularization on leaf values |
| `random_state` | 42 | Reproducibility |
| `tree_method` | `hist` | Histogram-based binning |
| `device` | `cuda` / `cpu` | GPU acceleration |
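The table corresponds to a parameter set along the following lines (a sketch only; unlisted defaults and the surrounding training code are assumptions):

```python
# Hypothetical sketch of the XGBoost configuration from the table above.
XGB_PARAMS = {
    "objective": "reg:quantileerror",  # pinball/quantile loss
    "quantile_alpha": 0.55,            # target the 55th percentile
    "n_estimators": 500,               # maximum boosting iterations
    "max_depth": 8,                    # maximum tree depth
    "learning_rate": 0.05,             # shrinkage per iteration
    "reg_lambda": 0.1,                 # L2 penalty on leaf values
    "random_state": 42,                # reproducibility
    "tree_method": "hist",             # histogram-based split finding
    "device": "cpu",                   # set to "cuda" when a GPU is available
}

# With the xgboost package installed, the model would be constructed as:
# model = xgboost.XGBRegressor(**XGB_PARAMS)
```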

Level-Wise Growth with Histogram Binning

XGBoost uses level-wise growth (same as HistGBM) but with its own implementation of histogram-based splitting:

```python
tree_method = "hist"
```

The hist method bins continuous features into discrete buckets before finding optimal splits, reducing the computational cost from O(n × features) to O(bins × features) per split evaluation.
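To illustrate the idea, here is a simplified stdlib sketch of quantile binning — not XGBoost's actual implementation, which uses weighted quantile sketches and a configurable `max_bin`:

```python
def quantile_bin_edges(values, n_bins=4):
    """Pick n_bins - 1 interior cut points at evenly spaced quantiles (toy version)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def to_bin(x, edges):
    """Map a raw value to its bin index: the number of edges it meets or exceeds."""
    return sum(x >= e for e in edges)

feature = [3.1, 0.2, 5.9, 4.4, 1.7, 2.8, 9.0, 6.3]
edges = quantile_bin_edges(feature, n_bins=4)
binned = [to_bin(x, edges) for x in feature]
# Split search now evaluates at most 3 candidate thresholds (the bin edges)
# instead of up to len(feature) - 1 raw thresholds: O(bins) vs O(n) per feature.
```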

GPU Acceleration

XGBoost supports CUDA-based GPU training:

```python
{"device": "cuda" if gpu_available else "cpu"}
```

Note the API difference: XGBoost uses device: "cuda" while LightGBM uses device: "gpu". Both achieve similar speedups (3–5×) for the training workload.
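A sketch of that dispatch as a helper function — `gpu_available` and how it is detected are assumptions, since the source shows only the parameter expression:

```python
def xgb_device_params(gpu_available: bool) -> dict:
    """Return the device portion of the XGBoost config (hypothetical helper).

    Note the cross-library naming difference: XGBoost expects "cuda",
    while the equivalent LightGBM setting would be {"device": "gpu"}.
    """
    return {"device": "cuda" if gpu_available else "cpu"}
```

For example, `xgb_device_params(False)` falls back cleanly to `{"device": "cpu"}` on hosts without a GPU.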

Regularization

XGBoost’s regularization differs from the other models:

| Regularization | HistGBM | LightGBM | XGBoost |
| --- | --- | --- | --- |
| L2 on leaves | Yes (`l2_regularization`) | Yes (`reg_lambda`) | Yes (`reg_lambda`) |
| L1 on leaves | No | No | Available (`reg_alpha`) |
| Max depth | Yes | Yes | Yes |
| Min samples/leaf | Yes | Yes (`min_child_samples`) | Available (`min_child_weight`) |
| Column subsampling | No | Available | Available |
| Row subsampling | No | Available | Available |

The combination of L1 and L2 regularization gives XGBoost a different effective hypothesis space, producing models that may differ from HistGBM and LightGBM even on the same data.
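The sparsity effect is visible in the closed-form optimal leaf value from XGBoost's second-order objective (standard result; here `reg_lambda` is $\lambda$, `reg_alpha` is $\alpha$, and $G$, $H$ are the sums of gradients and Hessians over the samples in a leaf). The L1 term soft-thresholds leaves with small gradient sums to exactly zero:

```latex
% Optimal leaf value w^* minimizing  G w + \tfrac{1}{2}(H + \lambda) w^2 + \alpha |w|
w^* =
\begin{cases}
  -\dfrac{G - \alpha}{H + \lambda} & \text{if } G > \alpha,\\[4pt]
  -\dfrac{G + \alpha}{H + \lambda} & \text{if } G < -\alpha,\\[4pt]
  0 & \text{otherwise.}
\end{cases}
```

With $\alpha = 0$ this reduces to the familiar L2-only value $w^* = -G/(H+\lambda)$.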

Ensemble Diversity Contribution

XGBoost’s value in the ensemble comes from:

  1. Different implementation: Even with similar hyperparameters, numerical differences in gradient computation, histogram construction, and split finding lead to slightly different trees
  2. Regularization flexibility: L1 + L2 regularization creates sparser leaf values than pure L2
  3. Proven robustness: Extensive deployment history across forecasting, ranking, and classification domains

Role in v10.1 (Current)

In the current architecture, XGBoost is the only gradient boosting component. It receives the full 154-feature input (90 tabular + 64 LSTM embeddings) and is trained with residual-from-baseline targeting — predicting the deviation from the weekly median price rather than raw EUR/MWh. See Ensemble Strategy for the full architecture.
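A minimal sketch of residual-from-baseline targeting, assuming the baseline is the weekly median price (helper names are hypothetical):

```python
import statistics

def residual_targets(prices, weekly_median):
    """Train-time: target the deviation from the weekly median, not the raw price."""
    return [p - weekly_median for p in prices]

def reconstruct_prices(residual_preds, weekly_median):
    """Inference-time: add the baseline back to recover EUR/MWh forecasts."""
    return [weekly_median + r for r in residual_preds]

week_prices = [81.0, 95.5, 72.3, 88.0, 90.1]   # toy EUR/MWh values
baseline = statistics.median(week_prices)
targets = residual_targets(week_prices, baseline)
```

Predicting residuals keeps the learning target centered near zero even when the absolute price level shifts between weeks.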

Legacy: Three-Model Equal-Weight Ensemble (v4.3)

Prior to v10.0, the EPF ensemble combined all three models with equal weights:

```python
ensemble_prediction = (histgbm + lightgbm + xgboost) / 3
```
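Sketched as a function (simplified to scalars; in production each term would be a full prediction vector):

```python
def equal_weight_ensemble(histgbm_pred, lightgbm_pred, xgboost_pred):
    """Legacy v4.3 combination: simple arithmetic mean of the three GB models."""
    return (histgbm_pred + lightgbm_pred + xgboost_pred) / 3

# e.g. three per-hour forecasts in EUR/MWh
blended = equal_weight_ensemble(84.0, 90.0, 87.0)
```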

Equal weighting was chosen for simplicity and robustness. Learned weights (e.g., optimized per horizon) risk overfitting on small calibration sets and introduce additional hyperparameters.

The ensemble consistently outperformed individual models because:

  • Individual model errors are partially uncorrelated
  • Averaging reduces variance without increasing bias
  • No single model dominated across all hours, horizons, and market conditions

This architecture was replaced by the LSTM-XGBoost hybrid in v10.0. See HistGBM and LightGBM for the other legacy ensemble members.

Feature Importance

XGBoost supports multiple importance methods:

  • Gain: Total improvement in loss from each feature’s splits
  • Weight: Number of times each feature appears in splits
  • Cover: Number of samples affected by each feature’s splits
  • SHAP values: Game-theoretic feature attribution (most accurate)

Training Process

Identical to the other ensemble members:

  1. Receive training data for a specific horizon group
  2. 5-fold TimeSeriesSplit cross-validation
  3. Train with quantile loss objective (q=0.55)
  4. Record per-fold metrics
  5. Collect out-of-fold residuals for conformal calibration
  6. Save model + metadata as joblib artifact
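Steps 2 and 3 can be sketched with stdlib code — a simplified stand-in for sklearn's `TimeSeriesSplit` (which keeps remainder samples in training) and for XGBoost's built-in quantile objective:

```python
def time_series_folds(n_samples, n_splits=5):
    """Expanding-window folds: each fold trains on all samples before its test block.

    Simplified sketch; sklearn's TimeSeriesSplit handles remainders differently.
    """
    fold_size = n_samples // (n_splits + 1)
    folds = []
    for k in range(1, n_splits + 1):
        train_idx = list(range(0, k * fold_size))
        test_idx = list(range(k * fold_size, (k + 1) * fold_size))
        folds.append((train_idx, test_idx))
    return folds

def pinball_loss(y_true, y_pred, q=0.55):
    """Quantile (pinball) loss: under-prediction costs q, over-prediction costs 1 - q.

    With q = 0.55 the model is nudged slightly above the median.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)
```

Note the asymmetry: at q = 0.55, missing low by 2 EUR/MWh costs more (0.55 × 2) than missing high by the same amount (0.45 × 2), which is what pushes predictions toward the 55th percentile.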