Cross-Country Price Gating (Z3 Ablation)
TL;DR
On 2026-04-17, the Phase 5 v6.0 feature sprint finished with a controlled ablation (Z3) that removed cross-country price features. Removing them improved MAE for ES, FR, and PT day-ahead, and was neutral for DE. Production now gates Z3 to DE only via the EPF_CROSS_PRICE_COUNTRIES=DE env var.
This was the single most impactful result of the 4-day sprint, and it reversed the working assumption we held while shipping multi-country in M1–M5.
Background
The initial PT/FR/DE models (v2.0, shipped 2026-04-11) included a block of “cross-price” features on every country: day-ahead prices from the other three countries at lags of 24h / 48h / 168h, plus ES-as-reference spread features. The intuition was straightforward: the four countries are physically interconnected, MIBEL couples ES+PT tightly, and European market coupling propagates spikes across borders within hours. If the model can see neighbour prices, it should get a lift.
v2.0 shipped and looked fine in isolation. But when the v6.0 sprint added three more feature blocks (Z1 country-aware holidays, Z2 generation-forecast targets, Z4 solar elevation), MAE did not improve — and in fact regressed for ES and FR. That made the cross-price block a suspect: maybe it was contributing noise that previously went unnoticed in the v2.0 baseline.
We ran a controlled ablation: keep Z1/Z2/Z4, remove Z3, retrain each country. Same hyperparameters otherwise.
Results (145-day backtest, 2025-11-01 → 2026-03-25)
| Country | MAE with Z3 (v6.0) | MAE without Z3 (ablation) | Δ MAE (without − with) | Interpretation |
|---|---|---|---|---|
| ES | 19.70 | 13.99 | −5.71 | ES sets the Iberian price; PT/FR/DE prices are downstream effects, not leading indicators. Including them adds noise to what should be a clean signal. |
| FR | 25.14 | 24.52 | −0.62 | FR is less physically coupled to Iberia than PT is. Cross-prices added noise without offsetting information gain. |
| PT | 22.52 | 21.94 | −0.58 | Surprising — PT tracks ES tightly in direction, but the magnitudes diverge often enough that cross-price lags introduce noise rather than signal for MAE. |
| DE | 27.64 | 27.74 | +0.10 | Neutral. DE benefits slightly from seeing its European neighbours, which is consistent with EPEX SPOT’s tight coupling. |
ES ablation (13.99) beats v11.0 (14.26) on the same window — the Z1/Z2/Z4 features have real value once the Z3 noise is removed.
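The Δ MAE column can be reproduced directly from the two MAE columns. The snippet below is a small arithmetic check, not part of the pipeline; the dictionary layout and the `delta_mae` helper are illustrative names, and the numbers are copied from the table above.

```python
# MAE values copied from the results table (same 145-day backtest window).
mae = {
    "ES": {"with_z3": 19.70, "without_z3": 13.99},
    "FR": {"with_z3": 25.14, "without_z3": 24.52},
    "PT": {"with_z3": 22.52, "without_z3": 21.94},
    "DE": {"with_z3": 27.64, "without_z3": 27.74},
}


def delta_mae(row):
    """Ablation minus baseline; a negative value means removing Z3 helped."""
    return round(row["without_z3"] - row["with_z3"], 2)


deltas = {country: delta_mae(row) for country, row in mae.items()}

# Relative change, in percent of the with-Z3 MAE.
relative = {c: 100 * deltas[c] / mae[c]["with_z3"] for c in mae}

for c in mae:
    print(f"{c}: delta={deltas[c]:+.2f}  relative={relative[c]:+.1f}%")
```

In relative terms the ES improvement is roughly 29% of the with-Z3 MAE, while the FR, PT, and DE changes are all within a few percent.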
Why this happened
The failure mode is well understood in forecasting once you look for it. Cross-country price features at lags of 24h / 48h / 168h tell the model what prices were a day, two days, or a week earlier in another market. For ES specifically:
- ES sets the MIBEL clearing price, which determines the PT price.
- So “PT yesterday” is essentially a slightly-noised copy of “ES yesterday”.
- The model already has ES-yesterday in its own lag features.
- Adding PT-yesterday is adding correlated noise, not new information.
For PT the failure is more subtle: PT does follow ES tightly in direction, but daily magnitude divergence (when PT’s renewables over/under-produce relative to ES) means that anchoring PT’s prediction to a lagged PT price + cross-price signal leads the model to over-weight the recent neighbour difference. Removing the block lets the model rely more on PT’s own lags and weather/generation features.
For FR the story is about physical coupling: EPEX SPOT France sits between the German-dominated core of European coupling and the Iberian MIBEL. FR prices correlate with both, but with lags that differ depending on what’s driving them. Lag-based cross-features struggle with this shifting-regime coupling.
Germany is the exception because EPEX SPOT Germany is the price-maker for a wide swathe of central Europe — neighbour prices lag German prices, so “what was FR yesterday” carries a residual information signal about where DE is heading.
Implementation
Feature gating is an env-var toggle applied at both training and inference time:
# Production (default)
EPF_CROSS_PRICE_COUNTRIES=DE
At training time (src/models/direct_trainer.py), the cross-price feature block is only added when the country being trained is in the list. At inference time (src/models/direct_predictor.py), the predictor refuses to run if the joblib’s feature list contains cross-price columns but the env var doesn’t include this country. This is a drift guard against silent feature absence (the same category of bug as the retracted v10.x LSTM zero-fill).
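A minimal sketch of how such a gate and drift guard could look is shown below. It is not the code in direct_trainer.py or direct_predictor.py. The function names, the `cross_price_` column prefix, the comma-separated parsing, and the fallback to DE when the variable is unset are all assumptions made for illustration; only the env var name comes from this document.

```python
import os

ENV_VAR = "EPF_CROSS_PRICE_COUNTRIES"
CROSS_PRICE_PREFIX = "cross_price_"  # hypothetical column-name prefix


def cross_price_countries():
    """Parse the env var into a set of upper-case country codes.

    Assumptions of this sketch: comma-separated values, and a fallback
    to "DE" when the variable is unset.
    """
    raw = os.environ.get(ENV_VAR, "DE")
    return {code.strip().upper() for code in raw.split(",") if code.strip()}


def select_features(base_features, cross_features, country):
    """Training side: add the cross-price block only for gated-in countries."""
    if country.upper() in cross_price_countries():
        return list(base_features) + list(cross_features)
    return list(base_features)


def assert_no_feature_drift(model_features, country):
    """Inference side: refuse to run when the saved model expects
    cross-price columns but the env var does not enable this country."""
    expects_cross = any(f.startswith(CROSS_PRICE_PREFIX) for f in model_features)
    if expects_cross and country.upper() not in cross_price_countries():
        raise RuntimeError(
            f"Model for {country} was trained with cross-price features, "
            f"but {ENV_VAR} does not include {country}."
        )
```

Raising instead of silently filling the missing columns is the point of the guard: a mismatch between the training-time and inference-time settings stops the run rather than producing forecasts from absent features.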
What this means for model development
Three concrete process changes landed because of this:
- Feature additions must be tested per country. “Universal” features aren’t universal — the assumption that “more signal can only help” was wrong for 3 of 4 countries. Phase 5+ mandates per-country ablation.
- MAE alone is insufficient for promotion. DE with Z3 is the MAE best and the economic-utility best; PT without Z3 is the MAE best but PT with Z3 is the strategic economic-utility best. Pair MAE with spread capture / spike recall / directional accuracy / battery arbitrage capture before making a call. See memory/feedback_mae_alone_lies.md in the internal repo.
- Optuna on TimeSeriesSplit is rejected. A parallel v6.1 Optuna sweep showed CV MAE −24% but backtest MAE +1.25: the TimeSeriesSplit folds contained 2022 crisis data that doesn’t match the current regime. Future tuning uses the last-2yr window only.
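To make the "MAE alone is insufficient" point concrete, here is a toy comparison of two hypothetical forecasts on made-up prices. The metric definitions are assumptions of this sketch and may differ from the ones used internally: directional accuracy is measured against the previous actual value, and spike recall uses a fixed price threshold.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def directional_accuracy(actual, predicted):
    """Share of steps where the forecast moves in the same direction as
    the actual price, both measured from the previous actual value."""
    hits = 0
    steps = 0
    for prev, a, p in zip(actual, actual[1:], predicted[1:]):
        steps += 1
        if (a - prev) * (p - prev) > 0:
            hits += 1
    return hits / steps


def spike_recall(actual, predicted, threshold):
    """Share of actual spikes (>= threshold) that the forecast also flags."""
    spikes = [i for i, a in enumerate(actual) if a >= threshold]
    if not spikes:
        return None  # undefined when the window has no spikes
    caught = sum(1 for i in spikes if predicted[i] >= threshold)
    return caught / len(spikes)


# Made-up numbers for illustration only.
actual = [50, 60, 55, 120, 70]
model_a = [52, 57, 58, 95, 72]    # tight on normal hours, misses the spike
model_b = [40, 70, 45, 125, 85]   # noisier overall, catches the spike

for name, pred in (("A", model_a), ("B", model_b)):
    print(
        name,
        f"MAE={mae(actual, pred):.1f}",
        f"dir_acc={directional_accuracy(actual, pred):.2f}",
        f"spike_recall={spike_recall(actual, pred, threshold=100)}",
    )
```

In this toy case model A wins on MAE while model B is the only one that flags the spike, which is the kind of disagreement a MAE-only promotion rule would hide.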
Where this lives
- Code: src/config.py (env var), src/data/feature_engineering.py (_cross_price_features), src/models/direct_trainer.py + direct_predictor.py (gating)
- Internal analysis: docs/analysis/MULTI_COUNTRY_SOTA.md
- Changelog: v11.0 post-LSTM correction for the multi-country production state summary