v5.0 — Peak/Off-Peak Split (Rejected)
Date: March 11, 2026 | Status: Rejected
Hypothesis
The Spanish OMIE market has structurally different price dynamics across the day. Peak hours (8–21h) are dominated by gas turbines and combined-cycle plants as the marginal price-setter — prices are sensitive to gas costs, demand levels, and the amount of renewable generation that displaces thermal capacity. Off-peak hours (0–7h, 22–23h) are largely set by nuclear and hydroelectric baseload with solar and wind surplus — prices depend on renewable availability and storage economics rather than fuel costs.
A single model trained on all hours simultaneously must learn both regimes and compromise between them. The hypothesis was that two specialized models — one trained exclusively on peak-hour price dynamics, one on off-peak — would each perform better within their respective regime than a general model forced to reconcile both.
What Changed
Peak/Off-Peak Split Training
Instead of one model per horizon group, two were trained:
- Peak model: Trained only on samples where the target hour falls in 8–21h
- Off-peak model: Trained only on samples where the target hour falls in 0–7h, 22–23h
At prediction time, the correct model is selected based on the target hour.
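The split and the prediction-time routing described above can be sketched as follows; this is a minimal illustration under assumed names (`PEAK_HOURS`, `split_samples`, `select_model` are hypothetical, not the actual code):

```python
# Hypothetical sketch of the peak/off-peak split. Hour boundaries follow
# the document: peak = 8-21h inclusive, off-peak = 0-7h and 22-23h.

PEAK_HOURS = set(range(8, 22))

def is_peak(hour: int) -> bool:
    """Classify a target hour into the peak (8-21h) or off-peak regime."""
    return hour in PEAK_HOURS

def split_samples(samples):
    """Partition training samples by the regime of their target hour."""
    peak = [s for s in samples if is_peak(s["target_hour"])]
    off_peak = [s for s in samples if not is_peak(s["target_hour"])]
    return peak, off_peak

def select_model(models: dict, target_hour: int):
    """At prediction time, route to the model trained on the matching regime."""
    return models["peak"] if is_peak(target_hour) else models["off_peak"]
```

Note that `is_peak` is the single source of truth for the regime boundary here; the train/serve-skew risk noted below arises exactly when training and serving define this boundary in two places.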
Configuration
| Parameter | Value |
|---|---|
| Loss function | Quantile (pinball) loss, τ = 0.55 |
| Ensemble | 3 models × 2 splits = 6 models per horizon group |
| Features | 46–67 (same as v4.3 after feature selection) |
| Feature selection | Yes |
| Peak-split | Yes (hours 8–21 vs 0–7, 22–23) |
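The quantile loss in the table is the standard pinball loss; a minimal sketch at τ = 0.55 (function name illustrative, not from the actual codebase):

```python
def pinball_loss(y_true: float, y_pred: float, tau: float = 0.55) -> float:
    """Pinball (quantile) loss targeting the tau-quantile.

    With tau = 0.55, under-prediction (error > 0) is penalised slightly
    more than over-prediction, nudging forecasts just above the median.
    """
    error = y_true - y_pred
    return tau * error if error >= 0 else (tau - 1) * error
```

For example, missing low by 2 units costs 0.55 × 2 = 1.1, while missing high by 2 units costs 0.45 × 2 = 0.9.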
Why It Was Rejected
Evaluation Reliability Issue
During validation, a model-selection error was discovered: the glob-based logic that identifies which trained model file to load for a given evaluation run matched incorrect files when peak-split variants were present alongside standard models. As a result, backtest results could not be trusted, and any comparison between peak-split and standard models was unreliable.
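A hypothetical illustration of this failure mode (the artifact names are invented for the example): a loose glob pattern matches both the standard artifact and its peak-split variants, so a naive "load the first match" step can silently pick the wrong file.

```python
from fnmatch import fnmatch

# Invented artifact names for illustration only.
artifacts = ["model_h1.pkl", "model_h1_peak.pkl", "model_h1_offpeak.pkl"]

# A loose pattern matches all three variants...
loose = [f for f in artifacts if fnmatch(f, "model_h1*.pkl")]
# ...so taking the first match may load a peak-split file instead
# of the standard model.

# An exact pattern disambiguates the standard model from its variants.
exact = [f for f in artifacts if fnmatch(f, "model_h1.pkl")]
```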
Rather than fixing the model selection logic and re-running the evaluation, the experiment was abandoned in favour of simpler approaches that didn’t require separate model variants.
Architectural Complexity
- Doubled the number of model artifacts (6 per horizon group instead of 3)
- Required model-selection logic at prediction time
- Increased training time ~2x
- Risk of train/serve skew if the peak-hour definition drifts
Uncertain Gain
Without reliable evaluation (due to the glob bug), there was no evidence that peak-split training improved accuracy enough to justify the added complexity. The experiment was abandoned rather than debugged, in favour of Sprint C’s simpler approach.
Lessons Learned
Simpler interventions first: Rather than splitting the model architecture into specialised peak and off-peak variants, the bias problem was better addressed through training data treatment and architectural improvements that don’t require separate model selection at inference time.