Methodology Changelog

Version History

Version Tagging Convention

Every model experiment gets a version number; a status tag tracks its lifecycle:

| Tag | Meaning |
|---|---|
| (Production) | Currently deployed to production VM |
| (Scout) | Fast exploratory experiment (90-day, single model, DA-only) — not promotion-ready |
| (Validating) | Medium-scope validation (150-day, multiple models) — promotion candidate |
| (Rejected) | Evaluated and not promoted |
| (no tag) | Historical version, superseded |

Model Versions

| Version | Date | Status | Description | DA MAE | Strat MAE |
|---|---|---|---|---|---|
| v10.2 | 2026-03-22 | Rejected | Feature re-selection + residual baseline variants — 3 experiments on the 150-day window. Feature selection degraded the LSTM model (MAE +1.17, SpkRec −5.4pp): permutation importance is wrong for dense embeddings. 4-week baselines caused a +3–4 EUR bias: regime memory bleeds crisis prices into post-crisis forecasts. Two rules established. v10.1 confirmed as best. | 16.90 (ta-fs) | |
| v10.1 | 2026-03-22 | Promotion Candidate | Task-aligned LSTM encoder + ensemble validation — 18 experiments. The task-aligned encoder (64-dim, predicts the D+1 full curve) beats v7.0 on every metric on the 150-day window: MAE 15.73 (−0.03), bias −0.65 (best ever), MaxPred 209 (+21 EUR), SpkRec 24.1% (+7.9pp), CrisisMAE 27.16. | 15.73 (ta-lstm) | |
| v10.0 | 2026-03-21 | Scout | LSTM-XGBoost hybrid — a 64-dim LSTM encoder provides temporal embeddings to XGBoost. lstm+res1w: best-ever bias (−1.47), slope (0.725), spike recall (81%), 80–130 MAE (16.3). Overall MAE 13.12 (5% behind XGB 12.48). Promising — needs encoder refinement. | 13.12 (hybrid) | |
| v9.0 | 2026-03-21 | Scout | Quantile ensemble — 7 experiments: q10/q50/q90 and q25/q55/q75 with various weight combos. Best MAE 14.48 (narrow spread). Didn't beat the single q=0.55 model (12.69). Averaging compressed predictions can't break the compression ceiling. | 14.48 (qens) | |
| v8.0 | 2026-03-21 | Rejected | PyTorch neural network — 5 experiments: deep residual MLP with BatchNorm, dropout, and cosine LR schedule. Best MAE 28.13 (2.2× worse than XGB). Slope 0.39–0.55 (worse). Tabular NNs can't beat trees on flattened features. | 28.13 (tft) | |
| v7.2 | 2026-03-21 | Scout | Residual target transform — predict deviation from the 1-week baseline instead of raw EUR. Slope 0.71→0.75 (+5.5%), bias −51%, 80–130 MAE −26%, MaxPred 135→160. Structural thesis confirmed, but overall MAE 12.92 (vs v7.0's 12.69). | 12.92 (xgb) | |
| v7.1 | 2026-03-20 | Rejected | MLP neural network — sklearn MLPRegressor, MAE 37+ (3× worse than XGB). Slope 0.26–0.52 (worse, not better). Confirms compression is model-specific, but the sklearn MLP is not viable. | 37.24 (mlp) | |
| v7.0 | 2026-03-20 | Scout | Compression-breaking experiments — 17 experiments: price weighting, threshold tuning, Monday features. pw-3x-d365-t60: MAE 12.69 (−12.3% vs prod). Spike recall 76%. | 12.69 (xgb) | |
| v6.3 | 2026-03-19 | Validated | 48-experiment deep-trees validation — XGB d12+d365 MAE 13.20 (−8.8% vs prod). Compression analysis: slope 0.70; the 80–130 EUR range drives 40% of error. | 13.20 (xgb) | |
| v6.2 | 2026-03-18 | Scout | 24 structural scout experiments — deep trees winner | 12.55 (xgb) | |
| v6.1 | 2026-03-18 | Scout | Structural diagnosis — range-compression analysis + config scouts | 13.41 (xgb) | |
| v5.1 | 2026-03-17 | Rejected | Winsorize at 200 EUR + 365-day decay — bias flipped positive (+11.92) | 15.34 (ens) | |
| v5.0 | 2026-03-11 | Rejected | Peak/off-peak split — halved the training data; DA MAE +10.1% worse | | |
| v5.0b | 2026-03-17 | Rejected | Log-transform target — compressed the range; DA MAE +14.5% worse | | |
| v4.3 | 2026-03-05 | Production | Feature selection pipeline | 14.47 (ens) | 19.79 (ens) |
| v4.2 | 2026-03-04 | | +24 crisis features | 14.95 (ens) | 20.36 |
| v4.1 | 2026-03-02 | | Quantile loss + weather interactions | 13.42 (ens) | 20.47 |
| v3.1 | 2026-02-26 | | MAE loss + asymmetric conformal | 14.85 | |
| v3.0 | 2026-02-25 | | Two-product system | | |
| v2.1 | 2026-02-24 | | 15-minute resolution | | |
| v2.0 | 2026-02-17 | | Multi-model ensemble | | |
| v1.1 | 2026-02-14 | | Feature expansion | | |
| v1.0 | 2026-02-12 | | Initial release | | |
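The core move of the v10.x hybrid is concatenating the temporal encoder's embeddings onto the tabular feature matrix before tree training. The sketch below illustrates only that wiring: the real system uses a trained 64-dim LSTM encoder feeding XGBoost, while here the encoder is stubbed with a fixed random projection, scikit-learn's gradient boosting stands in for XGBoost, and all data, names, and dimensions are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def encode_history(price_window):
    """Stand-in for the 64-dim LSTM encoder: a fixed linear projection
    of each sample's trailing 30-day price window (illustrative only)."""
    proj = np.random.default_rng(42).normal(size=(price_window.shape[1], 64))
    return price_window @ proj                      # (n_samples, 64)

n = 500
tabular = rng.normal(size=(n, 20))                  # calendar/weather/fuel features
history = rng.normal(size=(n, 30))                  # trailing price window per sample
y = tabular[:, 0] + history[:, -1] + rng.normal(scale=0.1, size=n)

# hybrid step: temporal embeddings become extra columns for the tree model
X = np.hstack([tabular, encode_history(history)])   # 20 tabular + 64 embedding dims
model = GradientBoostingRegressor().fit(X, y)
```

In the production version the encoder would be trained first (v10.1's task-aligned variant predicts the D+1 full curve), then frozen while XGBoost trains on the augmented matrix.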

Current production: v4.3 — best strategic MAE (19.79). v7.0 scouts show MAE 12.69 (-12.3%). v10.0 LSTM-XGBoost hybrid achieves best-ever structural metrics (bias -1.47, slope 0.725, spike recall 81%) but MAE 13.12 (5% behind XGB-only). 98+ total experiments across v6.1–v10.0.

Architecture & Data Versions

Versions that changed system architecture, data pipelines, or UI without retraining models.

| Version | Date | Description | Impact |
|---|---|---|---|
| v3.3 | 2026-03-02 | Dashboard restructuring: Price Drivers promoted to top-level section | Cleaner navigation, 11 components deleted |
| v3.2 | 2026-03-02 | Commodity fix (yfinance) + oil prices + Price Drivers redesign | Fixed broken commodity pipeline, added oil data |

Key Transitions

| Transition | Version | What Changed |
|---|---|---|
| Single → Ensemble | v2.0 | Added LightGBM + XGBoost alongside HistGBT |
| Unified → Two-Product | v3.0 | Split into day-ahead (D+1, 10:00 UTC) and strategic (D+2–D+7, 15:00 UTC) |
| MSE → MAE | v3.1 | Switched to absolute error loss; strategic bias −6.94 → −0.30 |
| MAE → Quantile | v4.1 | Quantile loss q=0.55 targets the 55th percentile; DA MAE −18.2% |
| Manual → Selected | v4.3 | Two-stage feature selection (correlation + permutation importance) per horizon |
| Shallow → Deep Trees | v6.2 | max_depth 8→12 with regularization; DA MAE −13.3% in scouts |
| Equal → Price-Weighted | v7.0 | High-price samples weighted 3× during training; spike recall 56→76% |
| Raw → Residual Target | v7.2 | Predict deviation from weekly baseline; slope ceiling 0.71→0.75, bias −51% |
| Tabular → LSTM Hybrid | v10.0 | LSTM temporal embeddings augment XGBoost features; best-ever bias (−1.47), spike recall 81% |
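Two of the transitions above, quantile loss (v4.1) and 3× price weighting (v7.0), compose in a single training call. A hedged sketch with scikit-learn's quantile gradient boosting standing in for the production XGBoost setup; the synthetic data and the 60 EUR weighting threshold are illustrative assumptions, not the production config:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))
y = 80 + 30 * X[:, 0] + rng.normal(scale=5, size=400)   # synthetic EUR/MWh prices

# v7.0-style weighting: triple the training weight of high-price samples
# (60 EUR cutoff is an assumption, echoing the pw-3x-d365-t60 config name)
weights = np.where(y > 60, 3.0, 1.0)

# v4.1-style loss: pinball loss targeting the 55th percentile
model = GradientBoostingRegressor(loss="quantile", alpha=0.55)
model.fit(X, y, sample_weight=weights)
preds = model.predict(X[:3])
```

Weighting nudges the fit toward spike hours, while q=0.55 counters the systematic under-prediction that a symmetric loss produces on a right-skewed price distribution.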
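The Manual → Selected transition (v4.3) describes a two-stage filter: a cheap correlation screen, then permutation importance on the survivors. A minimal sketch on synthetic data; the 0.05 correlation cutoff, the top-5 keep, and the 200/100 split are assumed values for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 15))
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=300)

# stage 1: drop features with near-zero absolute correlation to the target
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
keep = np.flatnonzero(corr > 0.05)                       # cutoff is illustrative

# stage 2: rank survivors by permutation importance on a held-out split
model = GradientBoostingRegressor().fit(X[:200][:, keep], y[:200])
imp = permutation_importance(model, X[200:][:, keep], y[200:],
                             n_repeats=5, random_state=0)
selected = keep[np.argsort(imp.importances_mean)[::-1][:5]]
```

Per the v10.2 note, this recipe is trusted only for sparse tabular features; permutation importance misleads on dense embedding columns.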
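The Raw → Residual Target transition (v7.2) trains the model on the deviation from a trailing weekly baseline and adds the baseline back at prediction time. A minimal sketch on a synthetic series; the baseline definition here (mean of the previous 7 days, no look-ahead) is an assumption about what the "1w baseline" computes:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 400
prices = 100 + np.cumsum(rng.normal(size=n))     # synthetic daily price series
X = rng.normal(size=(n, 8))                      # placeholder feature matrix

# trailing 1-week baseline: mean of the previous 7 days (shorter at series start)
baseline = np.array([prices[max(0, t - 7):t].mean() if t else prices[0]
                     for t in range(n)])

# train on the residual, then reconstruct: forecast = baseline + predicted deviation
resid_model = GradientBoostingRegressor().fit(X[7:], (prices - baseline)[7:])
forecast = baseline[7:] + resid_model.predict(X[7:])
```

The baseline carries the price level, so the model only has to learn deviations; this is why v10.2 found that a longer 4-week baseline leaks crisis-regime levels into post-crisis forecasts.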

Reading the Changelogs

Each version page documents:

  • What changed: Technical details of the update
  • Why: The problem or limitation being addressed
  • Impact: Measurable accuracy improvements or capability additions
  • Key files: Source code locations of the changes