v10.2 — Feature Re-selection + Residual Baseline Variants
Date: March 22, 2026 | Status: Scouted — No Improvement. v10.1 task-aligned confirmed as promotion candidate.
Background
v10.1 identified the task-aligned LSTM encoder as the best model (MAE 15.73, bias −0.65, spike recall 24.1%) but noted a 1.32 MAE gap vs the generic-res1w model (14.41). The gap is concentrated in “boring” base-load hours where the generic encoder’s simpler representation is better calibrated.
v10.2 tests two hypotheses for closing this gap:
- H1: Feature re-selection with LSTM included. The v4.3 feature selection ran without LSTM features. With 64 LSTM embedding dims added, some tabular features may now be redundant; re-running selection should remove noise and reduce base-hour variance.
- H2: Smoother residual baseline. residual_1w uses a single price point from 7 days ago. A 4-week average or an exponentially weighted mean should provide a smoother, less noisy baseline, helping the model predict stable base-load hours.
Results (150-day Window 2025-10-24 to 2026-03-21)
| Experiment | MAE | Bias | Slope | MaxPred | SpkRec | BasMAE | PkMAE |
|---|---|---|---|---|---|---|---|
| v10.1-ref (task-aligned) | 15.73 | −0.65 | 0.669 | 209 | 24.1% | 14.03 | 16.95 |
| v102-ta-fs (H1: feature select) | 16.90 | −0.82 | 0.639 | 209 | 18.7% | 16.34 | 17.30 |
| v102-ta-4wewm (H2b: EWM 4w) | 18.05 | +3.87 | 0.592 | 175 | 14.6% | 16.80 | 18.94 |
| v102-ta-4w (H2a: 4-week mean) | 18.18 | +3.19 | 0.598 | 190 | 18.9% | 16.20 | 19.58 |
BasMAE = MAE over off-peak (base) hours 00:00–07:00 and 22:00–23:00. PkMAE = MAE over peak hours 08:00–21:00. MAE, bias, MaxPred, BasMAE, and PkMAE are in EUR/MWh.
All three experiments are worse than v10.1 on every metric except MaxPred, where v102-ta-fs ties the reference at 209.
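The hour-split columns can be reproduced roughly as below. This is a minimal sketch, assuming flat arrays of hourly actuals, predictions, and hour-of-day, and assuming bias is defined as mean(prediction − actual), so that a positive bias means overprediction. The function name window_metrics is hypothetical, and Slope, MaxPred, and SpkRec are left out because their exact definitions (for example the spike threshold) are not given here.

```python
import numpy as np

# Off-peak ("base") hours as defined for BasMAE: 00-07 and 22-23. All others are peak.
BASE_HOURS = list(range(0, 8)) + [22, 23]

def window_metrics(y_true, y_pred, hour):
    """MAE, bias, and the base/peak MAE split for hourly day-ahead forecasts.

    Bias is computed as mean(pred - actual) (assumed sign convention):
    positive values mean the model overpredicts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    is_base = np.isin(np.asarray(hour), BASE_HOURS)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "Bias": float(np.mean(err)),
        "BasMAE": float(np.mean(np.abs(err[is_base]))),
        "PkMAE": float(np.mean(np.abs(err[~is_base]))),
    }

# Toy day: overpredict base hours by 2 EUR/MWh, underpredict peak hours by 4.
hours = np.arange(24)
actual = np.full(24, 60.0)
pred = actual + np.where(np.isin(hours, BASE_HOURS), 2.0, -4.0)
m = window_metrics(actual, pred, hours)
print(m)
```

Because peak hours outnumber base hours 14 to 10, the overall MAE in this toy case is pulled toward the peak error, which is why the split columns are reported separately.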
Finding 1: Feature Selection Destroys LSTM Signal
H1 reduced from 171 features to ~76 (DA1 group) by dropping ~20 individual LSTM embedding dimensions identified as low-importance by permutation tests.
Results vs v10.1: MAE +1.17, bias reverted from −0.65 to −0.82 (back to v7.0 level), spike recall −5.4pp (24.1% → 18.7%), slope −0.030. Both base-hour and peak-hour MAE worsened.
Why it failed: The 64 LSTM embedding dimensions encode a dense latent representation of the price trajectory. No single dimension is independently powerful; together they describe the market regime, momentum, and shape. Permutation importance shuffles one feature at a time while keeping the others fixed, so it measures isolated importance. In a correlated latent space the unshuffled dims still carry most of the shuffled dim's information, so each dim looks dispensable on its own even though removing many of them at once degrades the representation: isolated importance ≠ collective importance. The “weak” dims that were pruned carry distributional information visible only in the context of the full embedding.
This is analogous to degrading a JPEG by deleting its “least important” DCT coefficients one at a time: each one seems dispensable on its own, but together they define the image's sharpness.
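The gap between isolated and collective importance can be reproduced on a toy problem. The sketch below is a hypothetical illustration, not the v10.2 pipeline: eight noisy copies of one latent signal stand in for the 64-dim embedding, ordinary least squares stands in for the production model, and permutation_importance is a hand-rolled helper. Shuffling one column at a time gives the classic isolated score; shuffling all columns of the block with the same row permutation gives a grouped score for the block as a whole.

```python
import numpy as np

def permutation_importance(predict, X, y, cols, rng, n_repeats=5):
    """Mean MSE increase when the columns in `cols` are shuffled together.

    cols=[j] gives classic single-feature (isolated) importance; passing a
    whole block applies one row permutation to every column in it, keeping
    the block's within-row correlation intact (grouped importance)."""
    base = np.mean((y - predict(X)) ** 2)
    deltas = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(X))
        Xp = X.copy()
        Xp[:, cols] = X[idx][:, cols]
        deltas.append(np.mean((y - predict(Xp)) ** 2) - base)
    return float(np.mean(deltas))

# Toy stand-in for an embedding block: 8 noisy, highly correlated views of z.
rng = np.random.default_rng(0)
n, k = 5000, 8
z = rng.normal(size=n)
X = z[:, None] + 0.5 * rng.normal(size=(n, k))
y = z + 0.3 * rng.normal(size=n)

# Ordinary least squares as a hypothetical downstream model.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(A):
    return A @ w

individual = [permutation_importance(predict, X, y, [j], rng) for j in range(k)]
grouped = permutation_importance(predict, X, y, list(range(k)), rng)
print(f"largest single-dim importance: {max(individual):.3f}")
print(f"sum of single-dim importances: {sum(individual):.3f}")
print(f"whole-block importance:        {grouped:.3f}")
```

Here the whole-block score should come out several times larger than the sum of the eight single-dim scores, which is the same pattern that made individual LSTM dims look safe to prune.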
Rule established: LSTM embedding dims must be kept as an atomic 64-dim block. Feature selection should run only on the tabular features, with all 64 LSTM dims force-included.
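A minimal sketch of that rule, assuming importance scores are available as a name-to-score mapping and that the embedding columns can be listed by name. The lstm_emb_NN column names, the example tabular feature names, and the min_importance threshold are all hypothetical; the point is only that the LSTM block bypasses the pruning step entirely.

```python
def select_features(importance, lstm_dims, min_importance):
    """Prune tabular features by importance; always keep the full LSTM block.

    importance: feature name -> score. Scores given for LSTM dims are ignored,
    because the embedding is treated as one atomic block."""
    lstm = list(lstm_dims)
    lstm_set = set(lstm)
    tabular_kept = [
        name
        for name, score in importance.items()
        if name not in lstm_set and score >= min_importance
    ]
    return tabular_kept + lstm

LSTM_DIMS = [f"lstm_emb_{i:02d}" for i in range(64)]  # hypothetical column names
scores = {
    "price_lag_24h": 0.8,  # hypothetical tabular features and scores
    "wind_forecast": 0.5,
    "weekday": 0.01,       # below threshold: pruned
    "lstm_emb_07": 0.0,    # a "weak" embedding dim: kept anyway
}
selected = select_features(scores, LSTM_DIMS, min_importance=0.05)
print(len(selected))
```

In this example the low-scoring tabular feature is dropped while every embedding dim, including the zero-scored one, survives.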
Finding 2: residual_1w is Optimal for Regime-Switching Markets
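H2a (4-week mean) and H2b (“EWM” with w1=0.4, w2=0.3, w3=0.2, w4=0.1, where wk weights the price from k weeks earlier; these weights decay linearly rather than exponentially) both caused strong positive bias (+3.19 and +3.87 EUR/MWh respectively) and higher MAE.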
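The three baselines compared here can be built from an hourly price series with plain lag arithmetic. This is a sketch under stated assumptions: the series is gap-free hourly (so 168 rows equal one week, ignoring DST changes), w1 applies to the most recent week, and the column names and the function residual_baselines are hypothetical. The residual target would then presumably be price minus the chosen baseline.

```python
import pandas as pd

HOURS_PER_WEEK = 168
# H2b weights for lags of 1..4 weeks, most recent week first.
RECENCY_WEIGHTS = [0.4, 0.3, 0.2, 0.1]

def residual_baselines(price: pd.Series) -> pd.DataFrame:
    """Candidate residual baselines, one row per delivery hour.

    Assumes `price` is a gap-free hourly series; the first weeks are NaN
    because their lookback extends before the start of the data."""
    lags = [price.shift(HOURS_PER_WEEK * k) for k in range(1, 5)]
    return pd.DataFrame({
        # residual_1w: the same hour exactly 7 days earlier
        "base_1w": lags[0],
        # H2a: unweighted mean of the same hour over the last 4 weeks
        "base_4w_mean": sum(lags) / 4,
        # H2b: recency-weighted mean over the same 4 lags
        "base_4w_weighted": sum(w * s for w, s in zip(RECENCY_WEIGHTS, lags)),
    })

# Linear ramp over 6 weeks: price at hour t equals t, so each lag is easy to check.
price = pd.Series(range(6 * HOURS_PER_WEEK), dtype=float)
b = residual_baselines(price)
print(b.iloc[-1])
```

On the ramp, the last hour's 1-week baseline trails the price by 168, the 4-week mean by 420, and the weighted mean by 336, which shows how far back each baseline effectively looks.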
Why it failed: The evaluation window covers a sharp regime transition: extreme price volatility in Oct–Nov 2025 (170–247 EUR/MWh) followed by normal winter prices in Dec–Mar 2026 (~50–80 EUR/MWh). A 4-week mean baseline computed for December dates still includes 1–3 weeks of crisis-era prices in its lookback. This inflates the baseline, and unless the model learns an equally large downward residual, baseline plus predicted residual overshoots on normal-price days, producing systematic overprediction.
The 1-week single-point baseline adapts to the current regime within 7 days. The 4-week baseline takes 28 days. In a market with abrupt crisis/recovery transitions, recency beats smoothing.
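A toy daily simulation makes the 7-day versus 28-day adaptation concrete. The crisis and normal levels (200 and 65) are illustrative values picked from inside the ranges quoted above, not measured data, the switch day is hypothetical, and the simulation ignores hour-of-day structure.

```python
CRISIS, NORMAL = 200.0, 65.0  # illustrative price levels, EUR/MWh
SWITCH = 60                   # hypothetical day index of the crisis -> normal transition
prices = [CRISIS] * SWITCH + [NORMAL] * 60

ONE_WEEK = [7]
FOUR_WEEK = [7, 14, 21, 28]

def baseline_excess(day, lags):
    """Unweighted baseline over the given day lags, minus the actual price on `day`."""
    base = sum(prices[day - lag] for lag in lags) / len(lags)
    return base - prices[day]

for offset in (0, 7, 14, 21, 28):
    day = SWITCH + offset
    print(
        f"day +{offset:2d}: 1w excess {baseline_excess(day, ONE_WEEK):7.2f}, "
        f"4w excess {baseline_excess(day, FOUR_WEEK):7.2f}"
    )
```

The 1-week excess drops to zero at day +7, while the 4-week mean still overstates the price by 101.25, 67.5, and 33.75 at days +7, +14, and +21 before clearing at +28.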
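In principle the weighted variant (w1=0.4 on the most recent week, only 0.1 on the oldest) down-weights stale crisis weeks, yet its bias (+3.87) is worse than the plain 4-week mean (+3.19): a single crisis week included with weight 0.3–0.4 distorts the baseline more than it would at the mean's 0.25, and some crisis week stays in the lookback for a full month after the transition.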
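The size of this distortion follows directly from the weighted baseline. As an illustrative formulation, write p_{t−168k} for the price k weeks before hour t and w_k for its weight, with the weights summing to 1. Suppose only lag k* still falls in the crisis at level p_c, while the other lags and the current price sit at the normal level p_n:

$$
\begin{aligned}
b_t &= \sum_{k=1}^{4} w_k\, p_{t-168k} = w_{k^*}\, p_c + (1 - w_{k^*})\, p_n \\
b_t - p_n &= w_{k^*}\,(p_c - p_n)
\end{aligned}
$$

With illustrative levels p_c = 200 and p_n = 65, a crisis week at weight 0.4 or 0.3 inflates the baseline by 54 or 40.5 EUR/MWh, versus 33.75 at the plain mean's weight of 0.25.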
Rule established: For the Spanish DA electricity market with episodic price crises, residual_1w is the best of the transform baselines tested. Multi-week smoothing may help in stable markets, but it was clearly harmful in this regime-switching window.
Promotion Decision
v10.1 task-aligned encoder (MAE 15.73, bias −0.65, spike recall 24.1%) remains the promotion candidate. See the v10.1 changelog and promotion plan.
Next Experiments
With feature selection and alternative baselines ruled out, the remaining approaches to close the base-hour MAE gap are:
- H3: Conditional ensemble with volatility gate — blend task-aligned (crisis periods) with base-xgb-nopw (calm periods) using a lightweight volatility predictor
- H4: Per-hour quantile schedule — q=0.50 for off-peak, q=0.55 for daytime, q=0.60 for peak risk hours
- H5: French DA electricity price as input feature — EPEX D+1 prices known at origin time, strong predictor via Spain-France interconnection arbitrage
- H6: Gas forward curve (1-month TTF) — captures market expectations of supply shocks before they appear in spot prices
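H3 could take the form of a soft gate rather than a hard switch. The sketch below is hypothetical: it assumes some volatility estimate is available per forecast hour (for instance a trailing standard deviation of recent prices, or the output of the proposed lightweight predictor), and threshold and width are placeholder tuning parameters, not values from any experiment. The gate is a logistic function of volatility, so the blend moves smoothly from the calm-period model (base-xgb-nopw) to the crisis-oriented one (task-aligned).

```python
import numpy as np

def gated_blend(pred_crisis, pred_calm, volatility, threshold, width):
    """Blend two forecasts with a logistic gate driven by a volatility estimate.

    gate -> 1 when volatility is well above `threshold` (trust the crisis model),
    gate -> 0 when it is well below (trust the calm-period model);
    `width` controls how sharp the transition is."""
    vol = np.asarray(volatility, dtype=float)
    gate = 1.0 / (1.0 + np.exp(-(vol - threshold) / width))
    return gate * np.asarray(pred_crisis, float) + (1.0 - gate) * np.asarray(pred_calm, float)

# Two hypothetical hours: one calm (volatility 5), one turbulent (volatility 60).
out = gated_blend([100.0, 100.0], [60.0, 60.0], [5.0, 60.0], threshold=25.0, width=5.0)
print(out)
```

A logistic gate avoids a discontinuity at the regime boundary; a hard switch (gate = 1 if volatility exceeds the threshold, else 0) is the limiting case as width shrinks toward zero.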