Phase 5 / v6.0 / Z3 Cross-Price Ablation

Date: 2026-04-14 → 2026-04-17
Status: ✅ Production (current)

Summary

A 4-day feature sprint across all four EPF countries (ES, PT, FR, DE). The sprint added three feature blocks (Z1 country-aware holidays, Z2 generation-forecast targets, Z4 solar elevation) on top of the v2.0 baseline, then ran a controlled ablation of the existing cross-country price block (Z3) to test whether those features were still helping.

The headline result: cross-country price features hurt 3 of 4 countries. Production now gates them to DE only via EPF_CROSS_PRICE_COUNTRIES=DE.

Production promotions

145-day backtest, 2025-11-01 → 2026-03-25, per-country winners:

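| Country | DA winner | DA MAE | Strategic winner | ST MAE | Cross-prices |
| --- | --- | --- | --- | --- | --- |
| ES | v12.0 ablation (beats v11.0 14.26) | 13.99 | v11.0 | 17.35 | No |
| PT | v6.0 ablation | 21.94 | v6.0 with Z3 | 24.67 | DA no / ST yes |
| FR | v6.0 ablation | 24.52 | v5.0 (ablation 28.61 close) | 28.47 | No |
| DE | v6.0 with Z3 | 27.64 | v6.0 with Z3 | 35.99 | Yes |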

What changed in code

Three new feature blocks

  • Z1 country-aware holidays: _get_country_holidays(country_code) in src/data/feature_engineering.py. Replaces the earlier hardcoded _is_spanish_holiday(). Per-country public holiday registry in src/countries.py.
  • Z2 generation-forecast targets: wind + solar day-ahead forecasts from ENTSO-E, injected as forward-looking features. Code in src/models/direct_trainer.py::_target_extra_features().
  • Z4 solar elevation: computed at a country-specific latitude (_COUNTRY_LATITUDE map). It gives the model a clean proxy for solar generation potential that’s orthogonal to weather noise.
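The Z4 idea can be illustrated with a minimal sketch of a solar elevation calculation. This is not the project's implementation: the function name, the simple cosine approximation for declination, and the latitude values are all assumptions made for illustration (the real _COUNTRY_LATITUDE values are not shown on this page).

```python
import math

# Hypothetical latitudes in degrees north, for illustration only.
COUNTRY_LATITUDE = {"ES": 40.4, "PT": 38.7, "FR": 46.6, "DE": 51.2}


def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle in degrees.

    Uses a common cosine approximation for solar declination and the
    standard spherical relation
        sin(elev) = sin(lat) sin(decl) + cos(lat) cos(decl) cos(hour_angle).
    `solar_hour` is local solar time in hours (12.0 = solar noon).
    """
    declination = math.radians(-23.44) * math.cos(
        2.0 * math.pi / 365.0 * (day_of_year + 10)
    )
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(latitude_deg)
    sin_elev = (
        math.sin(lat) * math.sin(declination)
        + math.cos(lat) * math.cos(declination) * math.cos(hour_angle)
    )
    # Clamp against floating-point drift before taking asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))
```

Because the value depends only on latitude, date, and time of day, it is deterministic and carries no weather forecast error, which is what makes it usable as a clean proxy.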

Cross-price gating (Z3)

Cross-country price features (other-country DA prices at 24h/48h/168h lags plus spread-to-ES) are now conditional on EPF_CROSS_PRICE_COUNTRIES. Production default: DE. See cross-price gating decision.
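As a rough sketch of how such a gate could work, the snippet below reads EPF_CROSS_PRICE_COUNTRIES and returns the cross-price column names for a country, or an empty list when the country is gated off. The comma-separated format of the variable, the function names, and the column naming scheme are assumptions for illustration, not the project's actual code.

```python
import os

LAGS_HOURS = (24, 48, 168)  # lags named in the text
ALL_COUNTRIES = ("ES", "PT", "FR", "DE")


def cross_price_countries(env=None):
    """Parse the gate variable into a set of country codes.

    Assumes a comma-separated list; falls back to "DE", the stated
    production default, when the variable is unset.
    """
    env = os.environ if env is None else env
    raw = env.get("EPF_CROSS_PRICE_COUNTRIES", "DE")
    return {code.strip().upper() for code in raw.split(",") if code.strip()}


def cross_price_columns(country, env=None):
    """Return hypothetical cross-price feature names for `country`."""
    if country not in cross_price_countries(env):
        return []  # gated off: no cross-price features are built
    others = [c for c in ALL_COUNTRIES if c != country]
    columns = [
        f"price_{other.lower()}_lag{lag}h" for other in others for lag in LAGS_HOURS
    ]
    if country != "ES":
        columns.append("spread_to_es")  # spread-to-ES only makes sense outside ES
    return columns
```

With the production default, cross_price_columns("DE") yields nine lag columns plus the spread, while every other country gets an empty list.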

At inference time, direct_predictor.py aborts loudly if a joblib declares cross-price feature columns but the env var doesn’t include this country — a drift guard modelled after the v10.x LSTM zero-fill bug prevention.
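The drift guard described above might look roughly like the following. The exception class, the function name, and the "xprice_" prefix used to recognise cross-price columns are hypothetical; the actual check in direct_predictor.py is not shown on this page.

```python
class CrossPriceDriftError(RuntimeError):
    """Raised when a saved model and the runtime gate disagree."""


def check_cross_price_drift(country, model_feature_columns, enabled_countries,
                            cross_price_prefix="xprice_"):
    """Abort if the model expects cross-price columns the gate will not provide.

    `model_feature_columns` stands in for the feature list stored with the
    joblib artifact; `enabled_countries` is the parsed value of
    EPF_CROSS_PRICE_COUNTRIES.
    """
    expected = [c for c in model_feature_columns if c.startswith(cross_price_prefix)]
    if expected and country not in enabled_countries:
        raise CrossPriceDriftError(
            f"{country}: model declares {len(expected)} cross-price columns "
            f"(e.g. {expected[0]!r}) but the country is not in "
            f"EPF_CROSS_PRICE_COUNTRIES={sorted(enabled_countries)}"
        )
```

Failing hard is the point of the design: silently zero-filling the missing columns would let a model trained with cross-prices run on inputs it never saw during training.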

Why the Z3 ablation changed our mental model

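The v2.0 models (shipped 2026-04-11) all used cross-country prices. The assumption was that in a physically interconnected market, neighbour prices at short lags should carry information. This proved wrong for 3 of 4 countries. The deltas below are the change in MAE when the cross-price features are removed, so a negative value means the model improved without them: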

  • ES (−5.71 MAE without Z3): Spain sets the Iberian price. PT/FR/DE prices are downstream effects, already partly encoded in ES’s own lag features. Adding them is correlated noise.
  • FR (−0.62): France sits between Iberia and the German-dominated core; neighbour prices correlate with FR but with lags that depend on regime. Lag-based cross-features couldn’t handle the shifting coupling.
  • PT (−0.58): PT tracks ES tightly in direction but not in magnitude. Cross-price lags made the model over-weight short-run neighbour differences.
  • DE (+0.10, neutral): DE is the EPEX SPOT price-maker; neighbour prices effectively lag DE, carrying residual information.

Other findings from the sprint

PT expanded → hybrid15 was a bug fix (v6.0 DA −9.2%, ST −21%)

PT was using an “expanded” approach that handled the hourly→15-minute conversion incorrectly. Switching to hybrid15 with correct resolution handling gave PT a larger improvement than any other change. The size of the MAE delta indicates that the approach, not any feature choice, was the dominant issue.

Optuna on TimeSeriesSplit is rejected for DA

A v6.1 parallel Optuna sweep improved CV MAE by 24% (DE DA) but worsened backtest MAE by 1.25. The TimeSeriesSplit folds contained 2022 crisis data (prices above 400 EUR/MWh) that doesn’t match the current regime, so Optuna optimized for a regime we no longer live in. Future hyperparameter work uses last-2yr windows only.
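One way to restrict tuning to a trailing window is sketched below: drop everything older than two years before building the time-ordered folds. The function names and the hand-rolled expanding folds are illustrative assumptions rather than the project's tuning code; the sketch assumes `timestamps` is sorted chronologically.

```python
from datetime import timedelta


def trailing_window(timestamps, years=2):
    """Return indices of samples within `years` of the latest timestamp."""
    cutoff = max(timestamps) - timedelta(days=365 * years)
    return [i for i, ts in enumerate(timestamps) if ts > cutoff]


def expanding_folds(indices, n_splits=3):
    """Split time-ordered indices into expanding train / next-block validation folds."""
    fold_size = len(indices) // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    folds = []
    for k in range(1, n_splits + 1):
        train = indices[: k * fold_size]                    # everything before the block
        valid = indices[k * fold_size : (k + 1) * fold_size]  # the next block in time
        folds.append((train, valid))
    return folds
```

Applying trailing_window first means no fold can contain 2022 crisis prices once the latest timestamp is in 2026, so the CV objective is scored only on the recent regime.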

The ES v11.0 recipe is robust across countries

Single-XGBoost depth=12, learning_rate=0.03, q=0.55, pw3x above 60 EUR, sample decay 365-day halflife — this recipe works well for all four countries when feature noise is controlled. Country-specific tuning is narrower than expected: mainly the target transform (residual_1w for all except FR) and the cross-price gate.
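The weighting part of the recipe can be sketched as a per-sample weight. This reads "pw3x above 60 EUR" as a 3x weight for samples priced above 60 EUR and "sample decay 365-day halflife" as exponential decay by sample age; both readings, and the function name, are assumptions for illustration.

```python
def sample_weight(age_days, price_eur, halflife_days=365.0,
                  price_threshold=60.0, price_multiplier=3.0):
    """Combine recency decay with a boost for high-price samples.

    The weight halves every `halflife_days` of sample age and is multiplied
    by `price_multiplier` when the price exceeds `price_threshold`.
    """
    decay = 0.5 ** (age_days / halflife_days)
    boost = price_multiplier if price_eur > price_threshold else 1.0
    return decay * boost
```

For example, a one-year-old sample at 50 EUR gets weight 0.5, while a fresh sample at 70 EUR gets weight 3.0. Weights like these would typically be passed to the trainer through its sample-weight argument.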

Three process changes that came out of the sprint

  1. Feature additions must be tested per country. “Universal” features aren’t universal. Phase 5+ mandates per-country ablation before promotion.
  2. MAE alone is insufficient. Pair it with spread capture, spike recall, directional accuracy, and battery arbitrage capture before promoting. DE with Z3 is best on both MAE and economic utility, whereas PT’s best day-ahead MAE (ablation) and best strategic MAE (with Z3) come from different variants. Lesson captured in internal memory.
  3. Experiment tagging. Every new backtest now auto-compares against ES v11.0, the country LATEST, and the country BEST via src/models/evaluation.py::get_baselines_for_country(). LATEST updates are mechanical; BEST requires user approval based on a combined MAE and economic improvement.
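The comparison and promotion rule in the last item could be sketched as follows. The return shape of get_baselines_for_country() is not documented here, so the baselines dict, both function names, and the reading of "combined" as "both must improve" are assumptions.

```python
def compare_to_baselines(candidate_mae, baselines):
    """Return candidate-minus-baseline MAE deltas; negative means the candidate wins.

    `baselines` maps a label (e.g. "ES v11.0", "LATEST", "BEST") to its MAE.
    """
    return {name: round(candidate_mae - mae, 2) for name, mae in baselines.items()}


def may_promote_to_best(mae_delta_vs_best, economic_delta_vs_best, user_approved):
    """BEST changes only with lower MAE, higher economic utility, and sign-off."""
    return mae_delta_vs_best < 0 and economic_delta_vs_best > 0 and user_approved
```

Using the ES day-ahead figures from the promotions table, compare_to_baselines(13.99, {"ES v11.0": 14.26}) gives a delta of -0.27 against the ES v11.0 baseline.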