v10.0 — LSTM-XGBoost Hybrid (Scout)

Date: March 21, 2026 | Status: Scout — validated in v10.1

Why This Experiment Exists

After 93+ experiments, we identified that flattened lag columns (price_lag_1h, price_lag_24h, price_lag_168h) destroy temporal structure. Trees recover some of that structure via recursive splits but hit a slope ceiling at 0.71. The v8.0 tabular MLP proved that NNs can't replace trees on flat features. The v10.0 hypothesis: NNs should AUGMENT trees, not replace them — an LSTM processes raw price sequences to extract temporal patterns, and XGBoost uses those patterns as extra features alongside its existing tabular data.
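The flattening can be made concrete. A minimal sketch (hypothetical helper names) contrasting the three point-in-time lags the tree model sees with the ordered 168-hour sequence the LSTM consumes:

```python
# Hypothetical helpers: the tabular view vs. the sequential view of the
# same hourly price series. Index 0 = oldest hour, t = the origin hour.

def lag_features(prices, t):
    """Three point-in-time lags — the flattened tabular view."""
    return {
        "price_lag_1h": prices[t - 1],
        "price_lag_24h": prices[t - 24],
        "price_lag_168h": prices[t - 168],
    }

def lstm_window(prices, t, window=168):
    """The full 168-hour sequence ending at the origin — the LSTM view."""
    return prices[t - window:t]

prices = list(range(500))           # dummy hourly prices
t = 400
print(lag_features(prices, t))      # 3 scalars: temporal shape is gone
print(len(lstm_window(prices, t)))  # 168 ordered values: shape preserved
```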

Architecture

LSTM Encoder (pre-trained, 168h window)
├── Input: 168 hours of raw prices ending at origin (10:00 UTC)
│ (includes TODAY's full published prices — no gap)
├── Network: 2-layer LSTM, 64 hidden dim
└── Output: 64-dim embedding capturing temporal patterns
XGBoost (existing pipeline, augmented)
├── Input: 90 existing tabular features + 64 LSTM embeddings = 154 features
├── Training: 3+ years history, same CV, same hyperparams
└── Output: predicted price (with access to both temporal AND structural patterns)
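The augmentation step itself is a plain feature concatenation. A sketch under stated assumptions: `lstm_encode` is a stand-in for the pre-trained 2-layer encoder (the real pipeline would run the trained LSTM and pass `X` to the existing XGBoost training code):

```python
import random

random.seed(0)

def lstm_encode(price_window):
    """Stub for the pre-trained 2-layer LSTM encoder (64 hidden dim)."""
    return [random.gauss(0, 1) for _ in range(64)]

def build_features(price_windows, tabular_rows):
    """Concatenate 90 tabular features with 64 LSTM embeddings -> 154."""
    return [row + lstm_encode(win) for win, row in zip(price_windows, tabular_rows)]

# Dummy data: 5 origins, each with a 168h price window and 90 tabular features
price_windows = [[random.gauss(0, 1) for _ in range(168)] for _ in range(5)]
tabular_rows = [[random.gauss(0, 1) for _ in range(90)] for _ in range(5)]

X = build_features(price_windows, tabular_rows)
print(len(X), len(X[0]))   # 5 rows, 154 features each
```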

Data availability at prediction time

At the dayahead origin (10:00 UTC), the price series contains:

  • All historical prices (fully settled)
  • Today’s full 24h prices (OMIE D+1 results published ~12:00 CET the previous day)

So the LSTM’s 168h window includes today’s complete known price curve. The target is tomorrow 00:00-23:45. There is no gap between known and predicted prices.
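The alignment can be sketched as follows, assuming an hourly-indexed series (the real target grid is quarter-hourly, 00:00-23:45; hours are used here for brevity):

```python
from datetime import datetime, timedelta

def window_and_target(origin):
    """168h window ending at today 23:00; target = tomorrow's 24 hours."""
    today = origin.replace(hour=0, minute=0)
    window_end = today + timedelta(hours=23)   # last known hour: today 23:00
    window_hours = [window_end - timedelta(hours=h) for h in range(167, -1, -1)]
    target_hours = [today + timedelta(hours=24 + h) for h in range(24)]
    return window_hours, target_hours

origin = datetime(2026, 3, 18, 10, 0)   # 10:00 UTC dayahead origin
win, tgt = window_and_target(origin)
print(len(win), win[-1], tgt[0])        # 168 hours, ending today 23:00
# win[-1] + 1h == tgt[0]: no gap between known and predicted prices
```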

What each component learns

| Component | Learns | From | Strength |
|-----------|--------|------|----------|
| LSTM | Temporal patterns: momentum, regime shifts, weekly shape, today's curve | 168h raw price sequence | Sequential structure |
| XGBoost | Conditional interactions: weather × demand, commodity × hour, season | 3+ years tabular features | Long-term structural relationships |

What We Tested

5 experiments on 90-day DA backtest (2025-12-18 to 2026-03-18):

| # | Name | LSTM | Config | MAE | Bias | Slope | 80-130 | SpkRec | MaxPred |
|---|------|------|--------|-----|------|-------|--------|--------|---------|
| 1 | xgb-ref | No | d12+q0.55+pw3x+d365 | 12.48 | -6.47 | 0.716 | 19.7 | 68.9% | 139 |
| 2 | lstm-hybrid-res1w | Yes | d12+q0.55+res1w | 13.12 | -1.47 | 0.725 | 16.3 | 81.0% | 147 |
| 3 | lstm-hybrid-all | Yes | d12+q0.55+pw3x+d365+res1w | 13.29 | -1.49 | 0.726 | 16.6 | 81.5% | 146 |
| 4 | lstm-hybrid-pw3x | Yes | d12+q0.55+pw3x+d365 | 14.09 | -5.00 | 0.662 | 20.9 | 68.9% | 131 |
| 5 | lstm-hybrid-base | Yes | d12+q0.55 only | 14.24 | -3.79 | 0.654 | 20.2 | 66.7% | 146 |

Key Findings

1. LSTM + residual_1w = best structural metrics ever

| Metric | XGB-only (ref) | LSTM + res1w | Change |
|--------|----------------|--------------|--------|
| Bias | -6.47 | -1.47 | -77% |
| Slope | 0.716 | 0.725 | +1.3% |
| 80-130 MAE | 19.7 | 16.3 | -17% |
| Spike Recall | 68.9% | 81.0% | +12pp |
| MaxPred | 139 | 147 | +5.8% |

2. Price weighting conflicts with LSTM

Adding pw-3x to LSTM configs made them worse (14.09) than without (13.12). The LSTM embeddings already encode price regime information; price weighting creates redundant/conflicting signals.
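As an illustration of the conflict, a hypothetical reading of pw-3x as regime-based sample weighting — the 80 EUR threshold and the 3x factor are assumptions here, not the exact production definition:

```python
def price_weights(y, threshold=80.0, factor=3.0):
    """Hypothetical pw-3x: 3x sample weight above an assumed 80 EUR cutoff."""
    return [factor if price > threshold else 1.0 for price in y]

y_train = [35.0, 72.0, 95.0, 140.0]
print(price_weights(y_train))   # [1.0, 1.0, 3.0, 3.0]
# Passed to the booster as sample weights; the LSTM embedding already
# carries this regime signal, so the two sources pull against each other.
```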

3. The 5% MAE gap is from low-price accuracy

MAE 13.12 vs 12.48 — the LSTM adds useful signal for peaks but adds noise at baseload hours (0-80 EUR, 66% of data). The LSTM encoder was pre-trained on a simple next-hour prediction task; a task-aligned encoder (predict D+1 full day) should produce richer, more relevant embeddings.
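The two pre-training objectives differ only in how targets are built, which can be sketched as (hypothetical helper names; both slide the same 168h window over the hourly series):

```python
def next_hour_pairs(prices, window=168):
    """Current encoder objective: predict a single next hour."""
    return [(prices[i:i + window], prices[i + window])
            for i in range(len(prices) - window)]

def dayahead_pairs(prices, window=168, horizon=24):
    """Task-aligned objective: predict the full D+1 day (24 targets)."""
    return [(prices[i:i + window], prices[i + window:i + window + horizon])
            for i in range(len(prices) - window - horizon + 1)]

prices = list(range(300))        # dummy hourly series
x1, y1 = next_hour_pairs(prices)[0]
x2, y2 = dayahead_pairs(prices)[0]
print(len(x1), y1)               # 168 168
print(len(x2), len(y2))          # 168 24
```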

4. Why this differs from v8.0

| v8.0 (Failed) | v10.0 (Promising) |
|---------------|-------------------|
| MLP processes flat features | LSTM processes raw price sequence |
| Replaces XGBoost entirely | Augments XGBoost with extra features |
| Must learn ALL patterns | Only learns temporal patterns |
| Competes with trees | Complements trees |

Decision

Scouted — validated at 150-day scale in v10.1. The LSTM-XGBoost hybrid validates the core hypothesis: NNs can augment trees successfully. On the full 150-day validation window, the generic LSTM encoder (generic-res1w) achieves MAE 14.41 — an 8.5% improvement over v7.0 on the same period — confirming the approach is production-worthy.