v10.0 — LSTM-XGBoost Hybrid (Scout)
Date: March 21, 2026 | Status: Scout — validated in v10.1
Why This Experiment Exists
After 93+ experiments, we identified that flattened lag columns (price_lag_1h, price_lag_24h, price_lag_168h) destroy temporal structure. Trees recover some of it via recursive splits but hit a slope ceiling at 0.71. The v8.0 tabular MLP showed that NNs can't replace trees on flat features. The v10.0 hypothesis: NNs should AUGMENT trees, not replace them. An LSTM processes raw price sequences to extract temporal patterns, and XGBoost uses those patterns as extra features alongside its existing tabular data.
Architecture
```
LSTM Encoder (pre-trained, 168h window)
├── Input: 168 hours of raw prices ending at origin (10:00 UTC)
│          (includes TODAY's full published prices — no gap)
├── Network: 2-layer LSTM, 64 hidden dim
└── Output: 64-dim embedding capturing temporal patterns

XGBoost (existing pipeline, augmented)
├── Input: 90 existing tabular features + 64 LSTM embeddings = 154 features
├── Training: 3+ years history, same CV, same hyperparams
└── Output: predicted price (with access to both temporal AND structural patterns)
```
Data availability at prediction time
At the dayahead origin (10:00 UTC), the price series contains:
- All historical prices (fully settled)
- Today’s full 24h prices (OMIE D+1 results published ~12:00 CET the previous day)
So the LSTM’s 168h window includes today’s complete known price curve. The target is tomorrow 00:00-23:45. There is no gap between known and predicted prices.
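The no-gap property above can be sketched as a window builder. This is a minimal illustration, not the pipeline's actual API: `build_lstm_window` and the hourly `pd.Series` layout are assumptions, but the indexing shows the point — the window ends exactly at the 10:00 UTC origin, with today's published curve inside it.

```python
import numpy as np
import pandas as pd

def build_lstm_window(prices: pd.Series, origin: pd.Timestamp,
                      hours: int = 168) -> np.ndarray:
    """Return the `hours`-long raw price window ending at the origin.

    `prices` is assumed hourly-indexed (UTC) and already contains today's
    full published D+1 curve, so the window runs right up to the origin
    with no gap before tomorrow's target hours.
    """
    window = prices.loc[:origin].iloc[-hours:]  # label slice is inclusive
    if len(window) != hours:
        raise ValueError(f"expected {hours} hours, got {len(window)}")
    return window.to_numpy(dtype=np.float32)

# Toy example: hourly series running up to the 10:00 UTC origin.
idx = pd.date_range("2026-03-10", "2026-03-21 10:00", freq="h", tz="UTC")
prices = pd.Series(np.random.default_rng(0).uniform(20, 120, len(idx)), index=idx)
window = build_lstm_window(prices, pd.Timestamp("2026-03-21 10:00", tz="UTC"))
```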
What each component learns
| Component | Learns | From | Strength |
|---|---|---|---|
| LSTM | Temporal patterns: momentum, regime shifts, weekly shape, today’s curve | 168h raw price sequence | Sequential structure |
| XGBoost | Conditional interactions: weather × demand, commodity × hour, season | 3+ years tabular features | Long-term structural relationships |
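The division of labor above reduces to a simple wiring pattern: the LSTM's final hidden state is concatenated onto the tabular feature matrix before XGBoost trains. The sketch below assumes the stated dimensions (2 layers, 64 hidden, 90 tabular features); everything else (class name, taking the last layer's hidden state as the embedding) is an illustrative assumption.

```python
import numpy as np
import torch
import torch.nn as nn

class PriceEncoder(nn.Module):
    """2-layer LSTM over the 168h raw price window -> 64-dim embedding."""

    def __init__(self, hidden: int = 64, layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=layers, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 168) raw prices -> (batch, 168, 1) sequence input
        _, (h_n, _) = self.lstm(x.unsqueeze(-1))
        return h_n[-1]  # final hidden state of last layer: (batch, 64)

encoder = PriceEncoder().eval()
windows = torch.randn(32, 168)           # batch of 168h price windows
tabular = np.random.rand(32, 90)         # existing tabular features
with torch.no_grad():
    emb = encoder(windows).numpy()       # (32, 64) temporal embeddings
X = np.hstack([tabular, emb])            # (32, 154) -> fed to XGBoost
```

XGBoost then trains on `X` exactly as before; nothing in its CV or hyperparameters changes, which is what "same CV, same hyperparams" in the architecture refers to.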
What We Tested
5 experiments on 90-day DA backtest (2025-12-18 to 2026-03-18):
| # | Name | LSTM | Config | MAE | Bias | Slope | 80-130 | SpkRec | MaxPred |
|---|---|---|---|---|---|---|---|---|---|
| 1 | xgb-ref | No | d12+q0.55+pw3x+d365 | 12.48 | -6.47 | 0.716 | 19.7 | 68.9% | 139 |
| 2 | lstm-hybrid-res1w | Yes | d12+q0.55+res1w | 13.12 | -1.47 | 0.725 | 16.3 | 81.0% | 147 |
| 3 | lstm-hybrid-all | Yes | d12+q0.55+pw3x+d365+res1w | 13.29 | -1.49 | 0.726 | 16.6 | 81.5% | 146 |
| 4 | lstm-hybrid-pw3x | Yes | d12+q0.55+pw3x+d365 | 14.09 | -5.00 | 0.662 | 20.9 | 68.9% | 131 |
| 5 | lstm-hybrid-base | Yes | d12+q0.55 only | 14.24 | -3.79 | 0.654 | 20.2 | 66.7% | 146 |
Key Findings
1. LSTM + residual_1w = best structural metrics ever
| Metric | XGB-only (ref) | LSTM + res1w | Change |
|---|---|---|---|
| Bias | -6.47 | -1.47 | -77% |
| Slope | 0.716 | 0.725 | +1.3% |
| 80-130 MAE | 19.7 | 16.3 | -17% |
| Spike Recall | 68.9% | 81.0% | +12pp |
| MaxPred | 139 | 147 | +5.8% |
2. Price weighting conflicts with LSTM
Adding pw-3x to LSTM configs made them worse (14.09) than without (13.12). The LSTM embeddings already encode price regime information; price weighting creates redundant/conflicting signals.
3. The 5% MAE gap is from low-price accuracy
MAE 13.12 vs 12.48, roughly 5% worse: the LSTM adds useful signal at peaks but adds noise at baseload hours (0-80 EUR, 66% of the data). The LSTM encoder was pre-trained on a simple next-hour prediction task; a task-aligned encoder (predict the full D+1 day) should produce richer, more relevant embeddings.
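The difference between the two pre-training objectives can be made concrete with the target construction. This is a hedged sketch, not the actual training code: `make_targets` and the array layout are assumptions, but it shows how the next-hour task (what v10.0 used) and the D+1 full-day task (the proposed task-aligned objective) differ only in what the window is asked to predict.

```python
import numpy as np

def make_targets(prices: np.ndarray, window: int = 168, horizon: int = 24):
    """Build (window, target) pairs for the two pre-training tasks.

    next-hour: the single price following the window (v10.0's encoder).
    D+1 full day: the next `horizon` prices (task-aligned objective).
    `prices` is a 1-D hourly array; names here are illustrative.
    """
    X, y_next, y_day = [], [], []
    for t in range(window, len(prices) - horizon):
        X.append(prices[t - window:t])
        y_next.append(prices[t])             # next-hour target
        y_day.append(prices[t:t + horizon])  # full-day target
    return np.array(X), np.array(y_next), np.array(y_day)

prices = np.arange(400, dtype=np.float32)    # toy hourly price series
X, y_next, y_day = make_targets(prices)
```

A full-day target forces the encoder to summarize regime and weekly shape rather than just local momentum, which is the "richer, more relevant embeddings" argument above.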
4. Why this differs from v8.0
| v8.0 (Failed) | v10.0 (Promising) |
|---|---|
| MLP processes flat features | LSTM processes raw price sequence |
| Replaces XGBoost entirely | Augments XGBoost with extra features |
| Must learn ALL patterns | Only learns temporal patterns |
| Competes with trees | Complements trees |
Decision
Scouted — validated at 150-day scale in v10.1. The LSTM-XGBoost hybrid validates the core hypothesis: NNs can augment trees successfully. On the full 150-day validation window, the generic LSTM encoder (generic-res1w) achieves MAE 14.41 — an 8.5% improvement over v7.0 on the same period — confirming the approach is production-worthy.