v10.0 — LSTM-XGBoost Hybrid (Scout)

Date: March 21, 2026 | Status: Scout — validated in v10.1

Why This Experiment Exists

After 93+ experiments, we identified that flattened lag columns (price_lag_1h, price_lag_24h, price_lag_168h) destroy temporal structure. Trees recover some of that structure via recursive splits but hit a slope ceiling at 0.71. The v8.0 tabular MLP proved that NNs can't replace trees on flat features. The v10.0 hypothesis: NNs should AUGMENT trees, not replace them — an LSTM processes raw price sequences to extract temporal patterns, and XGBoost uses those patterns as extra features alongside its existing tabular data.
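The flattening can be made concrete. A minimal sketch (hypothetical helper names) contrasting the three point-in-time lags the tree model sees with the ordered 168-hour sequence the LSTM consumes:

```python
# Hypothetical helpers: the tabular view vs. the sequential view of the
# same hourly price series. Index 0 = oldest hour, t = the origin hour.

def lag_features(prices, t):
    """Three point-in-time lags — the flattened tabular view."""
    return {
        "price_lag_1h": prices[t - 1],
        "price_lag_24h": prices[t - 24],
        "price_lag_168h": prices[t - 168],
    }

def lstm_window(prices, t, window=168):
    """The full 168-hour sequence ending at the origin — the LSTM view."""
    return prices[t - window:t]

prices = list(range(500))           # dummy hourly prices
t = 400
print(lag_features(prices, t))      # 3 scalars: temporal shape is gone
print(len(lstm_window(prices, t)))  # 168 ordered values: shape preserved
```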

Architecture

LSTM Encoder (pre-trained, 168h window)
├── Input: 168 hours of raw prices ending at origin (10:00 UTC)
│ (includes TODAY's full published prices — no gap)
├── Network: 2-layer LSTM, 64 hidden dim
└── Output: 64-dim embedding capturing temporal patterns
XGBoost (existing pipeline, augmented)
├── Input: 90 existing tabular features + 64 LSTM embeddings = 154 features
├── Training: 3+ years history, same CV, same hyperparams
└── Output: predicted price (with access to both temporal AND structural patterns)
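The augmentation step itself is a plain feature concatenation. A sketch under stated assumptions: `lstm_encode` is a stand-in for the pre-trained 2-layer encoder (the real pipeline would run the trained LSTM and pass `X` to the existing XGBoost training code):

```python
import random

random.seed(0)

def lstm_encode(price_window):
    """Stub for the pre-trained 2-layer LSTM encoder (64 hidden dim)."""
    return [random.gauss(0, 1) for _ in range(64)]

def build_features(price_windows, tabular_rows):
    """Concatenate 90 tabular features with 64 LSTM embeddings -> 154."""
    return [row + lstm_encode(win) for win, row in zip(price_windows, tabular_rows)]

# Dummy data: 5 origins, each with a 168h price window and 90 tabular features
price_windows = [[random.gauss(0, 1) for _ in range(168)] for _ in range(5)]
tabular_rows = [[random.gauss(0, 1) for _ in range(90)] for _ in range(5)]

X = build_features(price_windows, tabular_rows)
print(len(X), len(X[0]))   # 5 rows, 154 features each
```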

Data availability at prediction time

At the dayahead origin (10:00 UTC), the price series contains:

  • All historical prices (fully settled)
  • Today’s full 24h prices (OMIE D+1 results published ~12:00 CET the previous day)

So the LSTM’s 168h window includes today’s complete known price curve. The target is tomorrow 00:00-23:45. There is no gap between known and predicted prices.
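The alignment can be sketched as follows, assuming an hourly-indexed series (the real target grid is quarter-hourly, 00:00-23:45; hours are used here for brevity):

```python
from datetime import datetime, timedelta

def window_and_target(origin):
    """168h window ending at today 23:00; target = tomorrow's 24 hours."""
    today = origin.replace(hour=0, minute=0)
    window_end = today + timedelta(hours=23)   # last known hour: today 23:00
    window_hours = [window_end - timedelta(hours=h) for h in range(167, -1, -1)]
    target_hours = [today + timedelta(hours=24 + h) for h in range(24)]
    return window_hours, target_hours

origin = datetime(2026, 3, 18, 10, 0)   # 10:00 UTC dayahead origin
win, tgt = window_and_target(origin)
print(len(win), win[-1], tgt[0])        # 168 hours, ending today 23:00
# win[-1] + 1h == tgt[0]: no gap between known and predicted prices
```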

What each component learns

| Component | Learns | From | Strength |
|-----------|--------|------|----------|
| LSTM | Temporal patterns: momentum, regime shifts, weekly shape, today's curve | 168h raw price sequence | Sequential structure |
| XGBoost | Conditional interactions: weather × demand, commodity × hour, season | 3+ years tabular features | Long-term structural relationships |

What We Tested

5 experiments on 90-day DA backtest (2025-12-18 to 2026-03-18):

| # | Name | LSTM | Config | MAE | Bias | Slope | 80-130 | SpkRec | MaxPred |
|---|------|------|--------|-----|------|-------|--------|--------|---------|
| 1 | xgb-ref | No | d12+q0.55+pw3x+d365 | 12.48 | -6.47 | 0.716 | 19.7 | 68.9% | 139 |
| 2 | lstm-hybrid-res1w | Yes | d12+q0.55+res1w | 13.12 | -1.47 | 0.725 | 16.3 | 81.0% | 147 |
| 3 | lstm-hybrid-all | Yes | d12+q0.55+pw3x+d365+res1w | 13.29 | -1.49 | 0.726 | 16.6 | 81.5% | 146 |
| 4 | lstm-hybrid-pw3x | Yes | d12+q0.55+pw3x+d365 | 14.09 | -5.00 | 0.662 | 20.9 | 68.9% | 131 |
| 5 | lstm-hybrid-base | Yes | d12+q0.55 only | 14.24 | -3.79 | 0.654 | 20.2 | 66.7% | 146 |

Key Findings

1. LSTM + residual_1w = best structural metrics ever

| Metric | XGB-only (ref) | LSTM + res1w | Change |
|--------|----------------|--------------|--------|
| Bias | -6.47 | -1.47 | -77% |
| Slope | 0.716 | 0.725 | +1.3% |
| 80-130 MAE | 19.7 | 16.3 | -17% |
| Spike Recall | 68.9% | 81.0% | +12pp |
| MaxPred | 139 | 147 | +5.8% |

2. Price weighting conflicts with LSTM

Adding pw-3x to LSTM configs made them worse (14.09) than without (13.12). The LSTM embeddings already encode price regime information; price weighting creates redundant/conflicting signals.
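As an illustration of the conflict, a hypothetical reading of pw-3x as regime-based sample weighting — the 80 EUR threshold and the 3x factor are assumptions here, not the exact production definition:

```python
def price_weights(y, threshold=80.0, factor=3.0):
    """Hypothetical pw-3x: 3x sample weight above an assumed 80 EUR cutoff."""
    return [factor if price > threshold else 1.0 for price in y]

y_train = [35.0, 72.0, 95.0, 140.0]
print(price_weights(y_train))   # [1.0, 1.0, 3.0, 3.0]
# Passed to the booster as sample weights; the LSTM embedding already
# carries this regime signal, so the two sources pull against each other.
```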

3. The 5% MAE gap is from low-price accuracy

MAE 13.12 vs 12.48 — the LSTM adds useful signal for peaks but adds noise at baseload hours (0-80 EUR, 66% of data). The LSTM encoder was pre-trained on a simple next-hour prediction task; a task-aligned encoder (predict D+1 full day) should produce richer, more relevant embeddings.
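The two pre-training objectives differ only in how targets are built, which can be sketched as (hypothetical helper names; both slide the same 168h window over the hourly series):

```python
def next_hour_pairs(prices, window=168):
    """Current encoder objective: predict a single next hour."""
    return [(prices[i:i + window], prices[i + window])
            for i in range(len(prices) - window)]

def dayahead_pairs(prices, window=168, horizon=24):
    """Task-aligned objective: predict the full D+1 day (24 targets)."""
    return [(prices[i:i + window], prices[i + window:i + window + horizon])
            for i in range(len(prices) - window - horizon + 1)]

prices = list(range(300))        # dummy hourly series
x1, y1 = next_hour_pairs(prices)[0]
x2, y2 = dayahead_pairs(prices)[0]
print(len(x1), y1)               # 168 168
print(len(x2), len(y2))          # 168 24
```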

4. Why this differs from v8.0

| v8.0 (Failed) | v10.0 (Promising) |
|---------------|-------------------|
| MLP processes flat features | LSTM processes raw price sequence |
| Replaces XGBoost entirely | Augments XGBoost with extra features |
| Must learn ALL patterns | Only learns temporal patterns |
| Competes with trees | Complements trees |

Decision

Scouted — validated at 150-day scale in v10.1. The LSTM-XGBoost hybrid validates the core hypothesis: NNs can augment trees successfully. On the full 150-day validation window, the generic LSTM encoder (generic-res1w) achieves MAE 14.41 — an 8.5% improvement over v7.0 on the same period — confirming the approach is production-worthy.