Phase 5 / v6.0 / Z3 Cross-Price Ablation
Date: 2026-04-14 → 2026-04-17
Status: ✅ Production (current)
Summary
A 4-day multi-country feature sprint across all four EPF countries (ES, PT, FR, DE). The sprint added three feature blocks — Z1 country-aware holidays, Z2 generation-forecast targets, Z4 solar elevation — on top of the v2.0 baseline, then ran a controlled ablation (Z3) to test whether the existing cross-country price features were still helping.
The headline result: cross-country price features hurt 3 of 4 countries. Production now gates them to DE only via EPF_CROSS_PRICE_COUNTRIES=DE.
Production promotions
145-day backtest, 2025-11-01 → 2026-03-25, per-country winners:
| Country | DA winner | DA MAE | Strategic winner | ST MAE | Cross-prices |
|---|---|---|---|---|---|
| ES | v12.0 ablation (beats v11.0 14.26) | 13.99 | v11.0 | 17.35 | No |
| PT | v6.0 ablation | 21.94 | v6.0 with Z3 | 24.67 | DA no / ST yes |
| FR | v6.0 ablation | 24.52 | v5.0 (ablation 28.61 close) | 28.47 | No |
| DE | v6.0 with Z3 | 27.64 | v6.0 with Z3 | 35.99 | Yes |
What changed in code
Three new feature blocks
- Z1 country-aware holidays: _get_country_holidays(country_code) in src/data/feature_engineering.py. Replaces the earlier hardcoded _is_spanish_holiday(). The per-country public holiday registry lives in src/countries.py.
- Z2 generation-forecast targets: wind + solar day-ahead forecasts from ENTSO-E, injected as forward-looking features. Code in src/models/direct_trainer.py::_target_extra_features().
- Z4 solar elevation: computed at a country-specific latitude (_COUNTRY_LATITUDE map). It gives the model a clean proxy for solar generation potential that is orthogonal to weather noise.
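To make the Z4 idea concrete, here is a minimal sketch of a solar elevation feature computed from latitude and UTC time. It is not the project's implementation: the function name, the latitude values, and the optional longitude argument are assumptions for illustration, and the declination formula is a common textbook approximation that ignores the equation of time.

```python
import math
from datetime import datetime, timezone

# Illustrative latitudes only; the real _COUNTRY_LATITUDE values are not shown in this page.
ILLUSTRATIVE_LATITUDE = {"ES": 40.0, "PT": 39.5, "FR": 46.5, "DE": 51.0}


def solar_elevation_deg(lat_deg: float, when_utc: datetime, lon_deg: float = 0.0) -> float:
    """Approximate solar elevation angle in degrees (negative means below the horizon)."""
    day_of_year = when_utc.timetuple().tm_yday
    # Approximate solar declination: about -23.44 deg at the December solstice.
    decl = math.radians(-23.44 * math.cos(2 * math.pi * (day_of_year + 10) / 365))
    # Local solar time approximated from UTC plus a longitude offset (15 deg per hour).
    solar_hour = when_utc.hour + when_utc.minute / 60 + lon_deg / 15
    hour_angle = math.radians(15 * (solar_hour - 12))
    lat = math.radians(lat_deg)
    sin_elev = (
        math.sin(lat) * math.sin(decl)
        + math.cos(lat) * math.cos(decl) * math.cos(hour_angle)
    )
    # Clamp against floating-point drift before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


if __name__ == "__main__":
    noon = datetime(2026, 6, 21, 12, 0, tzinfo=timezone.utc)
    for code, lat in ILLUSTRATIVE_LATITUDE.items():
        print(code, round(solar_elevation_deg(lat, noon), 1))
```

Because the feature depends only on the calendar, the clock, and a fixed latitude, it is known exactly for any forecast horizon, which is presumably what makes it a clean proxy compared with weather-derived inputs.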
Cross-price gating (Z3)
Cross-country price features (other-country DA prices at 24h/48h/168h lags plus spread-to-ES) are now conditional on EPF_CROSS_PRICE_COUNTRIES. Production default: DE. See cross-price gating decision.
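A minimal sketch of how such a gate could work, assuming the env var is a comma-separated list of country codes. The function names, the xprice_ column prefix, the nested-dict price layout, and the definition of spread-to-ES as "neighbour price minus ES price at the same lag" are all hypothetical choices made for this example, not the project's actual code.

```python
import os
from datetime import datetime, timedelta

LAGS_HOURS = (24, 48, 168)  # the lags named in the text


def cross_price_enabled(country, env=os.environ):
    """True if `country` appears in EPF_CROSS_PRICE_COUNTRIES (assumed comma-separated)."""
    raw = env.get("EPF_CROSS_PRICE_COUNTRIES", "")
    enabled = {c.strip().upper() for c in raw.split(",") if c.strip()}
    return country.upper() in enabled


def cross_price_features(country, ts, prices, env=os.environ):
    """Build lagged neighbour-price features for one target timestamp.

    prices: {country_code: {datetime: price}} (hypothetical layout).
    Returns an empty dict when the gate is closed for this country.
    """
    if not cross_price_enabled(country, env):
        return {}
    es_series = prices.get("ES", {})
    feats = {}
    for other, series in prices.items():
        if other == country:
            continue
        for lag in LAGS_HOURS:
            lagged_ts = ts - timedelta(hours=lag)
            value = series.get(lagged_ts)  # None when the lagged price is missing
            feats[f"xprice_{other.lower()}_lag{lag}h"] = value
            es_value = es_series.get(lagged_ts)
            # Assumed spread definition: neighbour minus ES at the same lag.
            if other != "ES" and value is not None and es_value is not None:
                feats[f"xprice_{other.lower()}_spread_es_lag{lag}h"] = value - es_value
    return feats
```

With EPF_CROSS_PRICE_COUNTRIES=DE, a call for ES, PT, or FR returns no cross-price columns at all, so the model for those countries never sees them during training.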
At inference time, direct_predictor.py aborts loudly if a joblib declares cross-price feature columns but the env var doesn’t include this country — a drift guard modelled after the v10.x LSTM zero-fill bug prevention.
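The guard described above might look roughly like the following. This is a sketch under stated assumptions: the exception class, the function name, and the xprice_ prefix used to recognise cross-price columns are hypothetical, and the real check in direct_predictor.py may inspect the joblib metadata differently.

```python
class CrossPriceDriftError(RuntimeError):
    """Raised when a saved model expects cross-price columns that are gated off."""


def check_cross_price_drift(country, model_columns, enabled_countries, prefix="xprice_"):
    """Abort if the model needs cross-price inputs that are disabled for `country`.

    model_columns: feature names stored with the trained model.
    enabled_countries: set parsed from EPF_CROSS_PRICE_COUNTRIES.
    prefix: hypothetical naming convention for cross-price columns.
    """
    cross_cols = sorted(c for c in model_columns if c.startswith(prefix))
    if cross_cols and country not in enabled_countries:
        raise CrossPriceDriftError(
            f"{country}: model declares {len(cross_cols)} cross-price column(s), "
            f"e.g. {cross_cols[0]!r}, but EPF_CROSS_PRICE_COUNTRIES does not include {country}"
        )
    return cross_cols
```

Failing fast here avoids the alternative of silently filling the missing columns with zeros, which is the failure mode the zero-fill reference points to.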
Why the Z3 ablation changed our mental model
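The v2.0 models (shipped 2026-04-11) all used cross-country prices. The assumption was that in a physically interconnected market, neighbour prices at short lags should carry information. This proved wrong for 3 of 4 countries. The deltas below are the change in MAE when the cross-price features are removed, so a negative value means the ablation improved the forecast: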
- ES (−5.71 MAE without Z3): Spain sets the Iberian price. PT/FR/DE prices are downstream effects, already partly encoded in ES’s own lag features. Adding them is correlated noise.
- FR (−0.62): France sits between Iberia and the German-dominated core; neighbour prices correlate with FR but with lags that depend on regime. Lag-based cross-features couldn’t handle the shifting coupling.
- PT (−0.58): PT tracks ES tightly in direction but not in magnitude. Cross-price lags made the model over-weight short-run neighbour differences.
- DE (+0.10, neutral): DE is the EPEX SPOT price-maker; neighbour prices effectively lag DE, carrying residual information.
Other findings from the sprint
PT expanded → hybrid15 was a bug fix (v6.0 DA −9.2%, ST −21%)
PT was using an “expanded” approach that did hourly→15-minute conversion wrong. Switching to hybrid15 with correct resolution handling gave PT a larger jump than any other change. The MAE delta confirms the approach was the dominant issue, not any feature choice.
Optuna on TimeSeriesSplit is rejected for DA
A v6.1 parallel Optuna sweep showed CV MAE −24% (DE DA) but backtest MAE +1.25. The TimeSeriesSplit folds contained 2022 crisis data (prices 400+ EUR/MWh) that doesn’t match the current regime. Optuna optimized for a regime we no longer live in. Future hyperparameter work uses last-2yr windows only.
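One way to restrict hyperparameter search to recent data is to build the time-ordered folds from the last two years only. The sketch below is an illustration, not the project's code: the function name, the 730-day window, and the expanding-window fold layout are assumptions.

```python
from datetime import date, timedelta


def recent_window_folds(dates, n_splits=3, window_days=730):
    """Expanding-window (train, test) folds using only the most recent `window_days`.

    Older observations, such as the 2022 crisis period, never enter any fold.
    """
    ordered = sorted(dates)
    cutoff = ordered[-1] - timedelta(days=window_days)
    recent = [d for d in ordered if d > cutoff]
    fold_size = len(recent) // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough recent data for the requested number of splits")
    folds = []
    for k in range(1, n_splits + 1):
        train = recent[: k * fold_size]                      # everything before the test block
        test = recent[k * fold_size : (k + 1) * fold_size]   # the next contiguous block
        folds.append((train, test))
    return folds


if __name__ == "__main__":
    start, end = date(2021, 1, 1), date(2026, 3, 25)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    for train, test in recent_window_folds(days):
        print(train[0], train[-1], "->", test[0], test[-1])
```

Each test block still comes strictly after its training block, so the ordering guarantee of a time-series split is kept while the crisis-era regime is excluded.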
The ES v11.0 recipe is robust across countries
Single-XGBoost depth=12, learning_rate=0.03, q=0.55, pw3x above 60 EUR, sample decay 365-day halflife — this recipe works well for all four countries when feature noise is controlled. Country-specific tuning is narrower than expected: mainly the target transform (residual_1w for all except FR) and the cross-price gate.
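Two ingredients of the recipe can be written down in a few lines. This sketch assumes that "sample decay 365-day halflife" means exponentially decaying sample weights by observation age, and that q=0.55 is the quantile level of a pinball (quantile) loss; both readings are assumptions, and the function names are hypothetical.

```python
def decay_weights(ages_days, halflife_days=365.0):
    """Sample weights that halve for every `halflife_days` of observation age."""
    return [0.5 ** (age / halflife_days) for age in ages_days]


def pinball_loss(y_true, y_pred, q=0.55):
    """Mean pinball loss at quantile level q.

    With q > 0.5, under-prediction (y_true > y_pred) costs more than
    over-prediction of the same size, nudging forecasts slightly upward.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


if __name__ == "__main__":
    print(decay_weights([0, 365, 730]))
    print(pinball_loss([10.0], [8.0]), pinball_loss([8.0], [10.0]))
```

Under these assumptions, a sample from one year ago counts half as much as one from today, and a 2 EUR under-forecast is penalised somewhat more than a 2 EUR over-forecast.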
Three process changes that came out of the sprint
- Feature additions must be tested per country. “Universal” features aren’t universal. Phase 5+ mandates per-country ablation before promotion.
- MAE alone is insufficient. Pair it with spread capture, spike recall, directional accuracy, and battery arbitrage capture before promoting. For DE, the Z3 model is best on both MAE and economic utility; for PT, the day-ahead MAE winner (ablation) differs from the strategic MAE winner (with Z3). Lesson captured in internal memory.
- Experiment tagging. Every new backtest is now automatically compared against ES v11.0, the country's LATEST, and the country's BEST via src/models/evaluation.py::get_baselines_for_country(). LATEST updates mechanically; promoting a new BEST requires user approval based on combined MAE and economic improvement.
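The LATEST/BEST rule in the last item can be sketched as a small pure function. Everything here is hypothetical: the Run dataclass, the single economic_capture score standing in for the economic metrics, and the update_baselines name are illustrations and do not reflect the actual evaluation.py API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    tag: str
    mae: float               # lower is better
    economic_capture: float  # higher is better; hypothetical aggregate of the economic metrics


def update_baselines(baselines, candidate, user_approved=False):
    """Return (updated_baselines, qualifies_for_best).

    LATEST always moves to the candidate. BEST moves only when the candidate
    improves both MAE and the economic score AND a user has approved it.
    """
    best = baselines["BEST"]
    qualifies = (
        candidate.mae < best.mae
        and candidate.economic_capture > best.economic_capture
    )
    updated = dict(baselines)          # leave the caller's mapping untouched
    updated["LATEST"] = candidate      # mechanical update
    if qualifies and user_approved:
        updated["BEST"] = candidate    # gated promotion
    return updated, qualifies
```

Returning the qualifies flag separately lets a report show "candidate for BEST, awaiting approval" without changing the stored baseline.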
Related
- v11.0 post-LSTM correction — prerequisite for this sprint; ES ancestor model
- Multi-Country v2.0 — the baseline that Phase 5 / v6.0 built on
- Cross-price gating decision — the design choice document
- Multi-country architecture — how countries are configured
- XGBoost model page — current production recipe per country