# Methodology Changelog
## Version History
### Version Tagging Convention
Every model experiment gets a version number. Status tags track the lifecycle:
| Tag | Meaning |
|---|---|
| (Production) | Currently deployed to production VM |
| (Scout) | Fast exploratory experiment (90-day, single model, DA-only) — not promotion-ready |
| (Validating) | Medium-scope validation (150-day, multiple models) — promotion candidate |
| (Rejected) | Evaluated and not promoted |
| (no tag) | Historical version, superseded |
### Model Versions
| Version | Date | Status | Description | DA MAE | Strat MAE |
|---|---|---|---|---|---|
| v10.2 | 2026-03-22 | Rejected | Feature re-selection + residual baseline variants — 3 experiments on 150-day window. Feature selection degraded LSTM model (MAE +1.17, SpkRec −5.4pp): permutation importance wrong for dense embeddings. 4-week baselines caused +3–4 EUR bias: regime-memory bleeds crisis prices into post-crisis forecasts. Two rules established. v10.1 confirmed as best. | 16.90 (ta-fs) | — |
| v10.1 | 2026-03-22 | Validating | Task-aligned LSTM encoder + ensemble validation — 18 experiments. Task-aligned encoder (64-dim, predicts the D+1 full curve) beats v7.0 on every metric on the 150-day window: MAE 15.73 (−0.03), bias −0.65 (best ever), MaxPred 209 (+21 EUR), SpkRec 24.1% (+7.9pp), CrisisMAE 27.16. | 15.73 (ta-lstm) | — |
| v10.0 | 2026-03-21 | Scout | LSTM-XGBoost hybrid — LSTM encoder (64-dim) provides temporal embeddings to XGBoost. lstm+res1w: best-ever bias (-1.47), slope (0.725), spike recall (81%), 80-130 MAE (16.3). Overall MAE 13.12 (5% behind XGB 12.48). Promising — needs encoder refinement. | 13.12 (hybrid) | — |
| v9.0 | 2026-03-21 | Scout | Quantile ensemble — 7 experiments: q10/q50/q90 and q25/q55/q75 with various weight combos. Best MAE 14.48 (narrow spread). Didn't beat the single q=0.55 model (12.69); averaging compressed predictions can't break the compression ceiling. | 14.48 (qens) | — |
| v8.0 | 2026-03-21 | Rejected | PyTorch neural network — 5 experiments: deep residual MLP with BatchNorm, dropout, cosine LR. Best MAE 28.13 (2.2× worse than XGB). Slope 0.39-0.55 (worse). Tabular NNs can’t beat trees on flattened features. | 28.13 (tft) | — |
| v7.2 | 2026-03-21 | Scout | Residual target transform — predict deviation from 1w baseline instead of raw EUR. Slope 0.71→0.75 (+5.5%), bias -51%, 80-130 MAE -26%, MaxPred 135→160. Structural thesis confirmed but overall MAE 12.92 (vs v7.0 12.69). | 12.92 (xgb) | — |
| v7.1 | 2026-03-20 | Rejected | MLP neural network — sklearn MLPRegressor MAE 37+ (3x worse than XGB). Slope 0.26-0.52 (worse, not better). Confirms compression is model-specific but sklearn MLP not viable. | 37.24 (mlp) | — |
| v7.0 | 2026-03-20 | Scout | Compression-breaking experiments — 17 experiments: price weighting, threshold tuning, Monday features. pw-3x-d365-t60 MAE 12.69 (-12.3% vs prod). Spike recall 76%. | 12.69 (xgb) | — |
| v6.3 | 2026-03-19 | Validating | 48-experiment deep trees validation — XGB d12+d365 MAE 13.20 (-8.8% vs prod). Compression analysis: slope 0.70, 80-130 EUR range drives 40% of error. | 13.20 (xgb) | — |
| v6.2 | 2026-03-18 | Scout | 24 structural scout experiments — deep trees winner | 12.55 (xgb) | — |
| v6.1 | 2026-03-18 | Scout | Structural diagnosis — range compression analysis + config scouts | 13.41 (xgb) | — |
| v5.1 | 2026-03-17 | Rejected | Winsorize 200 EUR + decay 365d — bias flipped positive to +11.92 | 15.34 (ens) | — |
| v5.0 | 2026-03-11 | Rejected | Peak/off-peak split — halved training data, DA MAE +10.1% worse | — | — |
| v5.0b | 2026-03-17 | Rejected | Log-transform target — compressed range, DA MAE +14.5% worse | — | — |
| v4.3 | 2026-03-05 | Production | Feature selection pipeline | 14.47 (ens) | 19.79 (ens) |
| v4.2 | 2026-03-04 | — | +24 crisis features | 14.95 (ens) | 20.36 |
| v4.1 | 2026-03-02 | — | Quantile loss + weather interactions | 13.42 (ens) | 20.47 |
| v3.1 | 2026-02-26 | — | MAE loss + asymmetric conformal | 14.85 | — |
| v3.0 | 2026-02-25 | — | Two-product system | — | — |
| v2.1 | 2026-02-24 | — | 15-minute resolution | — | — |
| v2.0 | 2026-02-17 | — | Multi-model ensemble | — | — |
| v1.1 | 2026-02-14 | — | Feature expansion | — | — |
| v1.0 | 2026-02-12 | — | Initial release | — | — |
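Several versions in the table above (v4.1's quantile loss, v9.0's quantile ensemble) rely on the asymmetry of the pinball loss at q=0.55. A minimal sketch of that loss (the function name and example prices are illustrative, not from the codebase; only the q=0.55 value comes from the changelog):

```python
import numpy as np

def pinball_loss(y_true, y_pred, q=0.55):
    # Quantile (pinball) loss: under-prediction of e EUR costs q*e,
    # over-prediction costs (1-q)*e. With q=0.55 the fitted model
    # targets the 55th percentile instead of the median, nudging
    # forecasts upward against range compression.
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * err, (q - 1.0) * err)))

# Under-predicting by 2 EUR is penalised more than over-predicting by 2 EUR:
low = pinball_loss([100.0], [98.0])    # 0.55 * 2 = 1.10
high = pinball_loss([100.0], [102.0])  # 0.45 * 2 = 0.90
```

At q=0.5 this reduces to half the MAE, which is why the v3.1 → v4.1 transition is a strict generalisation of the MAE loss.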
Current production: v4.3 — best strategic MAE (19.79). v7.0 scouts reached DA MAE 12.69 (−12.3% vs production). v10.0's LSTM-XGBoost hybrid achieves the best-ever structural metrics (bias −1.47, slope 0.725, spike recall 81%) but overall MAE 13.12 (5% behind XGB-only). 98+ total experiments across v6.1–v10.0.
## Architecture & Data Versions
Versions that changed system architecture, data pipelines, or UI without retraining models.
| Version | Date | Description | Impact |
|---|---|---|---|
| v3.3 | 2026-03-02 | Dashboard restructuring: Price Drivers promoted to top-level section | Cleaner navigation, 11 components deleted |
| v3.2 | 2026-03-02 | Commodity fix (yfinance) + oil prices + Price Drivers redesign | Fixed broken commodity pipeline, added oil data |
## Key Transitions
| Transition | Version | What Changed |
|---|---|---|
| Single → Ensemble | v2.0 | Added LightGBM + XGBoost alongside HistGBT |
| Unified → Two-Product | v3.0 | Split into day-ahead (D+1, 10:00 UTC) and strategic (D+2–D+7, 15:00 UTC) |
| MSE → MAE | v3.1 | Switched to absolute error loss; strategic bias -6.94 → -0.30 |
| MAE → Quantile | v4.1 | Quantile loss q=0.55 targets 55th percentile; DA MAE -18.2% |
| Manual → Selected | v4.3 | Two-stage feature selection (correlation + permutation importance) per horizon |
| Shallow → Deep Trees | v6.2 | max_depth 8→12 with regularization; DA MAE -13.3% in scouts |
| Equal → Price-Weighted | v7.0 | High-price samples weighted 3× during training; spike recall 56→76% |
| Raw → Residual Target | v7.2 | Predict deviation from weekly baseline; slope ceiling 0.71→0.75, bias -51% |
| Tabular → LSTM Hybrid | v10.0 | LSTM temporal embeddings augment XGBoost features; best-ever bias (-1.47), spike recall 81% |
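The Equal → Price-Weighted transition (v7.0) can be sketched as a per-sample weight vector passed to the tree model. The 60 EUR threshold and 3× factor are assumptions read off the `pw-3x-d365-t60` config name, and the function name is hypothetical:

```python
import numpy as np

def price_sample_weights(y, threshold=60.0, factor=3.0):
    # Hours priced above `threshold` EUR count `factor` times as much
    # during training, pushing the model to fit spikes better.
    # threshold=60 and factor=3 are guesses from the v7.0 config name.
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y)
    w[y > threshold] = factor
    return w

weights = price_sample_weights([35.0, 58.0, 95.0, 140.0])
# The weights would then be handed to the booster, e.g.
# model.fit(X, y, sample_weight=weights)
```

This is consistent with the reported effect: spike recall rose from 56% to 76% while overall MAE still improved.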
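The Raw → Residual Target transition (v7.2) changes what the model predicts, not how it predicts. A minimal sketch of the transform pair, assuming the "1w baseline" is the price observed at the same hour one week earlier (function names and example values are illustrative):

```python
import numpy as np

def to_residual(y, baseline_1w):
    # Training target: deviation from the same hour one week earlier,
    # instead of the raw EUR price.
    return np.asarray(y, dtype=float) - np.asarray(baseline_1w, dtype=float)

def from_residual(residual_pred, baseline_1w):
    # Invert the transform at prediction time: add the baseline back,
    # so the model only has to learn the deviation, not the level.
    return np.asarray(baseline_1w, dtype=float) + np.asarray(residual_pred, dtype=float)

y = np.array([72.0, 180.0])           # observed prices
baseline = np.array([65.0, 110.0])    # prices one week earlier
r = to_residual(y, baseline)          # [7.0, 70.0]
restored = from_residual(r, baseline) # recovers y exactly
```

Because the baseline already carries the price level, the model's predicted range is no longer bounded by the compressed raw-target distribution, which matches the reported MaxPred lift from 135 to 160 EUR.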
## Reading the Changelogs
Each version page documents:
- What changed: Technical details of the update
- Why: The problem or limitation being addressed
- Impact: Measurable accuracy improvements or capability additions
- Key files: Source code locations of the changes