v5.0 — Peak/Off-Peak Split (Rejected)
Date: March 11, 2026 | Status: Rejected
Hypothesis
The Spanish OMIE market has structurally different price dynamics across the day. Peak hours (8–21h) are dominated by gas turbines and combined-cycle plants as the marginal price-setter — prices are sensitive to gas costs, demand levels, and the amount of renewable generation that displaces thermal capacity. Off-peak hours (0–7h, 22–23h) are largely set by nuclear and hydroelectric baseload with solar and wind surplus — prices depend on renewable availability and storage economics rather than fuel costs.
A single model trained on all hours simultaneously must learn both regimes and compromise between them. The hypothesis was that two specialized models — one trained exclusively on peak-hour price dynamics, one on off-peak — would each perform better within their respective regime than a general model forced to reconcile both.
What Changed
Peak/Off-Peak Split Training
Instead of one model per horizon group, two were trained:
- Peak model: Trained only on samples where the target hour falls in 8–21h
- Off-peak model: Trained only on samples where the target hour falls in 0–7h, 22–23h
At prediction time, the correct model is selected based on the target hour.
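The split and the prediction-time routing described above can be sketched as follows; this is a minimal illustration under assumed names (`PEAK_HOURS`, `split_samples`, `select_model` are hypothetical, not the actual code):

```python
# Hypothetical sketch of the peak/off-peak split. Hour boundaries follow
# the document: peak = 8-21h inclusive, off-peak = 0-7h and 22-23h.

PEAK_HOURS = set(range(8, 22))

def is_peak(hour: int) -> bool:
    """Classify a target hour into the peak (8-21h) or off-peak regime."""
    return hour in PEAK_HOURS

def split_samples(samples):
    """Partition training samples by the regime of their target hour."""
    peak = [s for s in samples if is_peak(s["target_hour"])]
    off_peak = [s for s in samples if not is_peak(s["target_hour"])]
    return peak, off_peak

def select_model(models: dict, target_hour: int):
    """At prediction time, route to the model trained on the matching regime."""
    return models["peak"] if is_peak(target_hour) else models["off_peak"]
```

Note that `is_peak` is the single source of truth for the regime boundary here; the train/serve-skew risk noted below arises exactly when training and serving define this boundary in two places.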
Configuration
| Parameter | Value |
|---|---|
| Loss function | Quantile (pinball) loss, τ = 0.55 |
| Ensemble | 3 models × 2 splits = 6 models per horizon group |
| Features | 46–67 (same as v4.3 after feature selection) |
| Feature selection | Yes |
| Peak-split | Yes (hours 8–21 vs 0–7, 22–23) |
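The quantile loss in the table is the standard pinball loss; a minimal sketch at τ = 0.55 (function name illustrative, not from the actual codebase):

```python
def pinball_loss(y_true: float, y_pred: float, tau: float = 0.55) -> float:
    """Pinball (quantile) loss targeting the tau-quantile.

    With tau = 0.55, under-prediction (error > 0) is penalised slightly
    more than over-prediction, nudging forecasts just above the median.
    """
    error = y_true - y_pred
    return tau * error if error >= 0 else (tau - 1) * error
```

For example, missing low by 2 units costs 0.55 × 2 = 1.1, while missing high by 2 units costs 0.45 × 2 = 0.9.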
Why It Was Rejected
Evaluation Reliability Issue
During validation, a model-selection error was discovered: the glob-based logic that identifies which trained model file to load for a given evaluation run matched incorrect files when peak-split variants were present alongside standard models. As a result, backtest results could not be trusted, and any comparison between peak-split and standard models was unreliable.
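A hypothetical illustration of this failure mode (the artifact names are invented for the example): a loose glob pattern matches both the standard artifact and its peak-split variants, so a naive "load the first match" step can silently pick the wrong file.

```python
from fnmatch import fnmatch

# Invented artifact names for illustration only.
artifacts = ["model_h1.pkl", "model_h1_peak.pkl", "model_h1_offpeak.pkl"]

# A loose pattern matches all three variants...
loose = [f for f in artifacts if fnmatch(f, "model_h1*.pkl")]
# ...so taking the first match may load a peak-split file instead
# of the standard model.

# An exact pattern disambiguates the standard model from its variants.
exact = [f for f in artifacts if fnmatch(f, "model_h1.pkl")]
```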
Rather than fixing the model selection logic and re-running the evaluation, the experiment was abandoned in favour of simpler approaches that didn’t require separate model variants.
Architectural Complexity
- Doubled the number of model artifacts (6 per horizon group instead of 3)
- Required model-selection logic at prediction time
- Increased training time ~2x
- Risk of train/serve skew if the peak-hour definition drifts
Uncertain Gain
Without reliable evaluation (due to the glob bug), there was no evidence that peak-split training improved accuracy enough to justify the added complexity. The experiment was abandoned rather than debugged, in favour of Sprint C’s simpler approach.
Lessons Learned
Simpler interventions first: Rather than splitting the model architecture into specialised peak and off-peak variants, the bias problem was better addressed through training data treatment and architectural improvements that don’t require separate model selection at inference time.