Cross-Country Price Gating (Z3 Ablation)

TL;DR

On 2026-04-17, the Phase 5 v6.0 feature sprint finished with a controlled ablation that removed the cross-country price feature block (Z3). Removing it improved day-ahead MAE for ES (−5.71), FR (−0.62), and PT (−0.58), and was neutral for DE (+0.10). Production now restricts Z3 to DE only via the EPF_CROSS_PRICE_COUNTRIES=DE env var.

This was the single most impactful result of the 4-day sprint, and it reversed the working assumption we held while shipping multi-country in M1–M5.

Background

The initial PT/FR/DE models (v2.0, shipped 2026-04-11) included a block of “cross-price” features for every country: day-ahead prices from the other three countries at lags of 24h / 48h / 168h, plus ES-as-reference spread features. The intuition was straightforward: the four countries are physically interconnected, MIBEL couples ES+PT tightly, and European market coupling propagates spikes across borders within hours. If the model can see neighbour prices, it should get a lift.

v2.0 shipped and looked fine in isolation. But when the v6.0 sprint added three more feature blocks (Z1 country-aware holidays, Z2 generation-forecast targets, Z4 solar elevation), MAE did not improve; it regressed for ES and FR. That made the cross-price block (Z3) a suspect: it might have been contributing noise that went unnoticed in the v2.0 baseline.

We ran a controlled ablation: keep Z1/Z2/Z4, remove Z3, retrain each country. Same hyperparameters otherwise.

Results (145-day backtest, 2025-11-01 → 2026-03-25)

Country	MAE with Z3 (v6.0)	MAE without Z3 (ablation)	Δ MAE (without − with)	Interpretation
ES	19.70	13.99	−5.71	ES sets the Iberian price; PT/FR/DE prices are downstream effects, not leading indicators. Including them adds noise to what should be a clean signal.
FR	25.14	24.52	−0.62	FR is less physically coupled to Iberia than PT is. Cross-prices added noise without offsetting information gain.
PT	22.52	21.94	−0.58	Surprising — PT tracks ES tightly in direction, but the magnitudes diverge often enough that cross-price lags introduce noise rather than signal for MAE.
DE	27.64	27.74	+0.10	Effectively neutral: removing Z3 raises DE MAE by only 0.10. The slight benefit from seeing its European neighbours is consistent with EPEX SPOT’s tight coupling.

The ES ablation model (13.99) beats v11.0 (14.26) on the same window, which suggests the Z1/Z2/Z4 features have real value once the Z3 noise is removed.
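The Δ MAE column can be reproduced directly from the two MAE columns. The sketch below copies the table values and applies a deliberately simple, hypothetical rule (keep Z3 only where removing it does not lower MAE); the actual production decision also weighed the economic-utility metrics discussed later, so treat the rule as an illustration rather than the promotion logic.

```python
# MAE values copied from the results table (145-day backtest).
RESULTS = {
    "ES": {"with_z3": 19.70, "without_z3": 13.99},
    "FR": {"with_z3": 25.14, "without_z3": 24.52},
    "PT": {"with_z3": 22.52, "without_z3": 21.94},
    "DE": {"with_z3": 27.64, "without_z3": 27.74},
}


def delta_mae(row):
    """MAE without Z3 minus MAE with Z3; negative means the ablation helped."""
    return round(row["without_z3"] - row["with_z3"], 2)


def countries_keeping_z3(results):
    """Hypothetical rule: keep Z3 only where removing it did not lower MAE."""
    return sorted(c for c, row in results.items() if delta_mae(row) >= 0)


DELTAS = {country: delta_mae(row) for country, row in RESULTS.items()}
KEEP = countries_keeping_z3(RESULTS)

if __name__ == "__main__":
    for country, delta in DELTAS.items():
        print(f"{country}: {delta:+.2f}")
    print("keep Z3 for:", ",".join(KEEP))
```

Under this rule only DE keeps the block, which matches the EPF_CROSS_PRICE_COUNTRIES=DE setting.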

Why this happened

The failure mode is well understood in forecasting once you look for it. Cross-country price features at lags of 24h / 48h / 168h tell the model what prices were a day, two days, or a week earlier in another market. For ES specifically:

ES sets the MIBEL clearing price, which determines the PT price.
So “PT yesterday” is essentially a slightly-noised copy of “ES yesterday”.
The model already has ES-yesterday in its own lag features.
Adding PT-yesterday is adding correlated noise, not new information.
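The chain of reasoning above can be illustrated with synthetic numbers. The sketch below is not the project's data: it assumes PT's lagged price is ES's lagged price plus small independent noise, and that today's ES price is the lagged ES price plus an independent shock. Under those assumptions the two lag features are almost perfectly correlated, and the part of PT-yesterday that ES-yesterday does not already explain carries essentially no information about the target.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Synthetic prices, chosen only to mimic the argument in the text.
es_lag24 = [rng.gauss(60.0, 20.0) for _ in range(2000)]
pt_lag24 = [p + rng.gauss(0.0, 2.0) for p in es_lag24]   # noised copy of ES
es_today = [p + rng.gauss(0.0, 10.0) for p in es_lag24]  # target

# What PT-yesterday adds beyond ES-yesterday.
pt_extra = [pt - es for pt, es in zip(pt_lag24, es_lag24)]

LAG_CORR = pearson(es_lag24, pt_lag24)    # close to 1: redundant feature
EXTRA_CORR = pearson(pt_extra, es_today)  # close to 0: no new signal

if __name__ == "__main__":
    print(f"corr(ES lag, PT lag)      = {LAG_CORR:.3f}")
    print(f"corr(PT extra, ES target) = {EXTRA_CORR:.3f}")
```

The independence of the extra component is built into the synthetic data, so this shows what the argument implies rather than proving it holds for real MIBEL prices.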

For PT the failure is more subtle. PT follows ES tightly in direction, but daily magnitudes diverge when PT’s renewables over- or under-produce relative to ES. Anchoring PT’s prediction to a lagged PT price plus a cross-price signal leads the model to over-weight the most recent neighbour difference. Removing the block lets the model rely more on PT’s own lags and weather/generation features.

For FR the story is about physical coupling: EPEX SPOT France sits between the German-dominated core of European coupling and the Iberian MIBEL. FR prices correlate with both, but with lags that differ depending on what’s driving them. Lag-based cross-features struggle with this shifting-regime coupling.

Germany is the exception, though only marginally (removing Z3 costs DE 0.10 MAE). EPEX SPOT Germany is the price-maker for a wide swathe of central Europe and neighbour prices lag German prices, so “what was FR yesterday” appears to carry a residual information signal about where DE is heading.

Implementation

Feature gating is an env-var toggle applied at both training and inference time:

# Production (default)
EPF_CROSS_PRICE_COUNTRIES=DE

At training time (src/models/direct_trainer.py), the cross-price feature block is added only when the country being trained is in the list. At inference time (src/models/direct_predictor.py), the predictor refuses to run if the saved joblib’s feature list contains cross-price columns but the env var doesn’t include the country being predicted. This drift guard prevents silent feature absence, the same category of bug as the retracted v10.x LSTM zero-fill.
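A minimal sketch of how such a gate and drift guard could look. Only the env var name comes from this write-up; the function names, the `cross_price_` column prefix, and the fallback to DE when the variable is unset are assumptions for illustration, not the code in direct_trainer.py or direct_predictor.py.

```python
import os

ENV_VAR = "EPF_CROSS_PRICE_COUNTRIES"  # name taken from this document
CROSS_PRICE_PREFIX = "cross_price_"    # hypothetical column-name prefix


def cross_price_countries(env=None):
    """Parse the comma-separated env var into a set of upper-case codes.

    Assumption: an unset variable falls back to the production value, DE.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "DE")
    return {code.strip().upper() for code in raw.split(",") if code.strip()}


def training_columns(country, base_cols, cross_cols, env=None):
    """Training side: add the cross-price block only for gated-in countries."""
    if country.upper() in cross_price_countries(env):
        return list(base_cols) + list(cross_cols)
    return list(base_cols)


def check_feature_drift(country, model_features, env=None):
    """Inference side: fail loudly instead of silently zero-filling.

    Raises if the saved model expects cross-price columns that the current
    env var setting would not produce for this country.
    """
    expects_cross = any(f.startswith(CROSS_PRICE_PREFIX) for f in model_features)
    if expects_cross and country.upper() not in cross_price_countries(env):
        raise RuntimeError(
            f"{country}: model expects cross-price features but "
            f"{ENV_VAR} does not include this country"
        )
```

Passing `env` explicitly keeps the helpers testable without mutating the process environment; in a real pipeline the default `os.environ` lookup would be used.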

What this means for model development

Three concrete process changes landed because of this:

Feature additions must be tested per country. “Universal” features aren’t universal — the assumption that “more signal can only help” was wrong for 3 of 4 countries. Phase 5+ mandates per-country ablation.
MAE alone is insufficient for promotion. DE with Z3 is best on both MAE and economic utility; PT without Z3 is best on MAE, but PT with Z3 is best on strategic economic utility. Pair MAE with spread capture / spike recall / directional accuracy / battery arbitrage capture before making a call. See memory/feedback_mae_alone_lies.md in the internal repo.
Optuna on TimeSeriesSplit is rejected. A parallel v6.1 Optuna sweep improved CV MAE by 24% but worsened backtest MAE by 1.25, because the TimeSeriesSplit folds contained 2022 crisis data that doesn’t match the current regime. Future tuning uses the last-2yr window only.
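The second change, pairing MAE with other metrics, can be illustrated with a toy example. The numbers below are invented, and the definition of directional accuracy used here (does the forecast move the same way as the actual price relative to the previous actual price) is one common choice assumed for illustration, not necessarily the project's definition.

```python
def mae(actual, pred):
    """Mean absolute error over aligned sequences."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(actual, pred):
    """Share of steps where the forecast moves the same way as the actual
    price, both measured against the previous actual price."""
    hits = [
        sign(pred[t] - actual[t - 1]) == sign(actual[t] - actual[t - 1])
        for t in range(1, len(actual))
    ]
    return sum(hits) / len(hits)


# Toy series: actual prices alternate between 50 and 60.
ACTUAL = [50, 60, 50, 60, 50]
OVERSHOOT = [50, 72, 38, 72, 38]  # right direction, too large a move
WRONG_WAY = [50, 48, 62, 48, 62]  # wrong direction every step

if __name__ == "__main__":
    for name, pred in (("overshoot", OVERSHOOT), ("wrong way", WRONG_WAY)):
        print(name, mae(ACTUAL, pred), directional_accuracy(ACTUAL, pred))
```

Both forecasts have the same MAE, yet one always calls the direction correctly and the other never does, which is the kind of difference a battery-arbitrage or spread-capture metric would expose and MAE would hide.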

Where this lives

Code: src/config.py (env var), src/data/feature_engineering.py (_cross_price_features), src/models/direct_trainer.py + direct_predictor.py (gating)
Internal analysis: docs/analysis/MULTI_COUNTRY_SOTA.md
Changelog: see the v11.0 post-LSTM correction entry for the multi-country production state summary