XGBoost

Overview

XGBoost (Extreme Gradient Boosting) is the gradient boosting component in the v10.1 production model. In the current architecture, it receives 90 tabular features plus 64-dimensional LSTM embeddings (154 features total) and produces the final price forecast. In the legacy v4.3 equal-weight ensemble, XGBoost was one of three equal-weight GB models alongside HistGBM and LightGBM.

Hyperparameters

| Parameter | Value | Purpose |
| --- | --- | --- |
| `objective` | `reg:quantileerror` | Quantile loss function |
| `quantile_alpha` | 0.55 | Quantile target (55th percentile) |
| `n_estimators` | 500 | Maximum boosting iterations |
| `max_depth` | 8 | Maximum tree depth |
| `learning_rate` | 0.05 | Shrinkage per iteration |
| `reg_lambda` | 0.1 | L2 regularization on leaf values |
| `random_state` | 42 | Reproducibility |
| `tree_method` | `hist` | Histogram-based binning |
| `device` | `cuda` / `cpu` | GPU acceleration |
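The table corresponds to a parameter set along the following lines (a sketch only; unlisted defaults and the surrounding training code are assumptions):

```python
# Hypothetical sketch of the XGBoost configuration from the table above.
XGB_PARAMS = {
    "objective": "reg:quantileerror",  # pinball/quantile loss
    "quantile_alpha": 0.55,            # target the 55th percentile
    "n_estimators": 500,               # maximum boosting iterations
    "max_depth": 8,                    # maximum tree depth
    "learning_rate": 0.05,             # shrinkage per iteration
    "reg_lambda": 0.1,                 # L2 penalty on leaf values
    "random_state": 42,                # reproducibility
    "tree_method": "hist",             # histogram-based split finding
    "device": "cpu",                   # set to "cuda" when a GPU is available
}

# With the xgboost package installed, the model would be constructed as:
# model = xgboost.XGBRegressor(**XGB_PARAMS)
```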

Level-Wise Growth with Histogram Binning

XGBoost uses level-wise growth (same as HistGBM) but with its own implementation of histogram-based splitting:

```python
tree_method = "hist"
```

The hist method bins continuous features into discrete buckets before finding optimal splits, reducing the computational cost from O(n × features) to O(bins × features) per split evaluation.
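To illustrate the idea, here is a simplified stdlib sketch of quantile binning — not XGBoost's actual implementation, which uses weighted quantile sketches and a configurable `max_bin`:

```python
def quantile_bin_edges(values, n_bins=4):
    """Pick n_bins - 1 interior cut points at evenly spaced quantiles (toy version)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]

def to_bin(x, edges):
    """Map a raw value to its bin index: the number of edges it meets or exceeds."""
    return sum(x >= e for e in edges)

feature = [3.1, 0.2, 5.9, 4.4, 1.7, 2.8, 9.0, 6.3]
edges = quantile_bin_edges(feature, n_bins=4)
binned = [to_bin(x, edges) for x in feature]
# Split search now evaluates at most 3 candidate thresholds (the bin edges)
# instead of up to len(feature) - 1 raw thresholds: O(bins) vs O(n) per feature.
```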

GPU Acceleration

XGBoost supports CUDA-based GPU training:

```python
{"device": "cuda" if gpu_available else "cpu"}
```

Note the API difference: XGBoost uses device: "cuda" while LightGBM uses device: "gpu". Both achieve similar speedups (3–5×) for the training workload.
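A sketch of that dispatch as a helper function — `gpu_available` and how it is detected are assumptions, since the source shows only the parameter expression:

```python
def xgb_device_params(gpu_available: bool) -> dict:
    """Return the device portion of the XGBoost config (hypothetical helper).

    Note the cross-library naming difference: XGBoost expects "cuda",
    while the equivalent LightGBM setting would be {"device": "gpu"}.
    """
    return {"device": "cuda" if gpu_available else "cpu"}
```

For example, `xgb_device_params(False)` falls back cleanly to `{"device": "cpu"}` on hosts without a GPU.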

Regularization

XGBoost’s regularization differs from the other models:

| Regularization | HistGBM | LightGBM | XGBoost |
| --- | --- | --- | --- |
| L2 on leaves | Yes (`l2_regularization`) | Yes (`reg_lambda`) | Yes (`reg_lambda`) |
| L1 on leaves | No | No | Available (`reg_alpha`) |
| Max depth | Yes | Yes | Yes |
| Min samples/leaf | Yes | Yes (`min_child_samples`) | Available (`min_child_weight`) |
| Column subsampling | No | Available | Available |
| Row subsampling | No | Available | Available |

The combination of L1 and L2 regularization gives XGBoost a different effective hypothesis space, producing models that may differ from HistGBM and LightGBM even on the same data.
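The sparsity effect is visible in the closed-form optimal leaf value from XGBoost's second-order objective (standard result; here `reg_lambda` is $\lambda$, `reg_alpha` is $\alpha$, and $G$, $H$ are the sums of gradients and Hessians over the samples in a leaf). The L1 term soft-thresholds leaves with small gradient sums to exactly zero:

```latex
% Optimal leaf value w^* minimizing  G w + \tfrac{1}{2}(H + \lambda) w^2 + \alpha |w|
w^* =
\begin{cases}
  -\dfrac{G - \alpha}{H + \lambda} & \text{if } G > \alpha,\\[4pt]
  -\dfrac{G + \alpha}{H + \lambda} & \text{if } G < -\alpha,\\[4pt]
  0 & \text{otherwise.}
\end{cases}
```

With $\alpha = 0$ this reduces to the familiar L2-only value $w^* = -G/(H+\lambda)$.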

Ensemble Diversity Contribution

XGBoost’s value in the ensemble comes from:

  1. Different implementation: Even with similar hyperparameters, numerical differences in gradient computation, histogram construction, and split finding lead to slightly different trees
  2. Regularization flexibility: L1 + L2 regularization creates sparser leaf values than pure L2
  3. Proven robustness: Extensive deployment history across forecasting, ranking, and classification domains

Role in v10.1 (Current)

In the current architecture, XGBoost is the only gradient boosting component. It receives the full 154-feature input (90 tabular + 64 LSTM embeddings) and is trained with residual-from-baseline targeting — predicting the deviation from the weekly median price rather than raw EUR/MWh. See Ensemble Strategy for the full architecture.
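A minimal sketch of residual-from-baseline targeting, assuming the baseline is the weekly median price (helper names are hypothetical):

```python
import statistics

def residual_targets(prices, weekly_median):
    """Train-time: target the deviation from the weekly median, not the raw price."""
    return [p - weekly_median for p in prices]

def reconstruct_prices(residual_preds, weekly_median):
    """Inference-time: add the baseline back to recover EUR/MWh forecasts."""
    return [weekly_median + r for r in residual_preds]

week_prices = [81.0, 95.5, 72.3, 88.0, 90.1]   # toy EUR/MWh values
baseline = statistics.median(week_prices)
targets = residual_targets(week_prices, baseline)
```

Predicting residuals keeps the learning target centered near zero even when the absolute price level shifts between weeks.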

Legacy: Three-Model Equal-Weight Ensemble (v4.3)

Prior to v10.0, the EPF ensemble combined all three models with equal weights:

```python
ensemble_prediction = (histgbm + lightgbm + xgboost) / 3
```
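Sketched as a function (simplified to scalars; in production each term would be a full prediction vector):

```python
def equal_weight_ensemble(histgbm_pred, lightgbm_pred, xgboost_pred):
    """Legacy v4.3 combination: simple arithmetic mean of the three GB models."""
    return (histgbm_pred + lightgbm_pred + xgboost_pred) / 3

# e.g. three per-hour forecasts in EUR/MWh
blended = equal_weight_ensemble(84.0, 90.0, 87.0)
```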

Equal weighting was chosen for simplicity and robustness. Learned weights (e.g., optimized per horizon) risk overfitting on small calibration sets and introduce additional hyperparameters.

The ensemble consistently outperformed individual models because:

  • Individual model errors are partially uncorrelated
  • Averaging reduces variance without increasing bias
  • No single model dominated across all hours, horizons, and market conditions

This architecture was replaced by the LSTM-XGBoost hybrid in v10.0. See HistGBM and LightGBM for the other legacy ensemble members.

Feature Importance

XGBoost supports multiple importance methods:

  • Gain: Total improvement in loss from each feature’s splits
  • Weight: Number of times each feature appears in splits
  • Cover: Number of samples affected by each feature’s splits
  • SHAP values: Game-theoretic feature attribution (most accurate)

Training Process

Identical to the other ensemble members:

  1. Receive training data for a specific horizon group
  2. 5-fold TimeSeriesSplit cross-validation
  3. Train with quantile loss objective (q=0.55)
  4. Record per-fold metrics
  5. Collect out-of-fold residuals for conformal calibration
  6. Save model + metadata as joblib artifact
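Steps 2 and 3 can be sketched with stdlib code — a simplified stand-in for sklearn's `TimeSeriesSplit` (which keeps remainder samples in training) and for XGBoost's built-in quantile objective:

```python
def time_series_folds(n_samples, n_splits=5):
    """Expanding-window folds: each fold trains on all samples before its test block.

    Simplified sketch; sklearn's TimeSeriesSplit handles remainders differently.
    """
    fold_size = n_samples // (n_splits + 1)
    folds = []
    for k in range(1, n_splits + 1):
        train_idx = list(range(0, k * fold_size))
        test_idx = list(range(k * fold_size, (k + 1) * fold_size))
        folds.append((train_idx, test_idx))
    return folds

def pinball_loss(y_true, y_pred, q=0.55):
    """Quantile (pinball) loss: under-prediction costs q, over-prediction costs 1 - q.

    With q = 0.55 the model is nudged slightly above the median.
    """
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)
```

Note the asymmetry: at q = 0.55, missing low by 2 EUR/MWh costs more (0.55 × 2) than missing high by the same amount (0.45 × 2), which is what pushes predictions toward the 55th percentile.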