Horizon-Adaptive Calibration

Overview

Forecast uncertainty grows with the prediction horizon: a D+1 forecast is less uncertain than a D+7 forecast. The EPF system maintains a separate calibration distribution for each horizon bucket, so confidence interval widths adapt to the uncertainty profile of each prediction distance.

Why Horizon Matters

Consider two forecasts from the same model:

D+1 hour 14 (28 hours ahead):
  Conditions well-constrained by recent data, weather forecast reliable
  Typical error: ±3-5 EUR/MWh
  90% CI width: ~15 EUR/MWh

D+7 hour 14 (172 hours ahead):
  Conditions uncertain, weather forecast degraded
  Typical error: ±8-12 EUR/MWh
  90% CI width: ~35 EUR/MWh

A single global calibration would assign the same interval width to both forecasts, leaving it too wide for D+1 (overconservative), too narrow for D+7 (overconfident), or both.

Bucket-Specific Calibration

Each horizon bucket maintains its own residual distribution:

Conformal calibrator:
├── DA1 residuals: [-4.2, 1.5, -2.1, 3.8, ...]  (14-25h ahead)
├── DA2 residuals: [-5.1, 2.3, -3.0, 5.2, ...]  (26-37h ahead)
├── S1 residuals:  [-6.8, 3.1, -4.2, 7.5, ...]  (33-56h ahead)
├── S2 residuals:  [-8.2, 4.5, -5.1, 9.8, ...]  (57-80h ahead)
├── S3 residuals:  [-9.1, 5.2, -6.0, 11.3, ...] (81-104h ahead)
├── S4 residuals:  [-9.8, 5.8, -6.5, 12.1, ...] (105-128h ahead)
└── S5 residuals:  [-11.5, 7.2, -8.0, 15.3, ...] (129-176h ahead)

How Widths Grow

The 90% CI quantiles (5th and 95th percentile of residuals) naturally widen across buckets:

Bucket	5th percentile (EUR/MWh)	95th percentile (EUR/MWh)	CI width (EUR/MWh)
DA1	-6.5	+8.2	14.7
DA2	-7.8	+10.5	18.3
S1	-9.2	+13.1	22.3
S2	-10.8	+15.5	26.3
S3	-11.5	+17.2	28.7
S4	-12.1	+18.0	30.1
S5	-13.8	+21.5	35.3

(Values are illustrative)

The monotonic widening from DA1 to S5 creates the characteristic “uncertainty funnel” visible on multi-day forecast charts.
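As a minimal sketch of how the 5th and 95th percentiles could be derived from a bucket's residuals, the snippet below uses only the Python standard library. The function name `ci_bounds_90` and the residual values are hypothetical and synthetic; they are not taken from the EPF system.

```python
from statistics import quantiles

def ci_bounds_90(residuals):
    """Return (5th percentile, 95th percentile) of a residual sample."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%;
    # the first and last are the bounds of a 90% interval.
    cuts = quantiles(residuals, n=20, method="inclusive")
    return cuts[0], cuts[-1]

# Synthetic residuals in EUR/MWh (illustrative only).
residuals_by_bucket = {
    "DA1": [float(x) for x in range(-10, 11)],   # -10, -9, ..., 10
    "S5": [2.0 * x for x in range(-10, 11)],     # -20, -18, ..., 20
}

for bucket, res in residuals_by_bucket.items():
    lo, hi = ci_bounds_90(res)
    print(f"{bucket}: 5th={lo:+.1f}  95th={hi:+.1f}  width={hi - lo:.1f}")
```

With these synthetic inputs the S5 width comes out at twice the DA1 width, which mirrors the widening pattern in the table on a toy scale.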

Horizon-to-Bucket Mapping

Each forecast hour is mapped to its bucket:

bucket_for_horizon = {
    14: "DA1", 15: "DA1", ..., 25: "DA1",
    26: "DA2", 27: "DA2", ..., 37: "DA2",
    33: "S1",  34: "S1",  ..., 56: "S1",
    # ...
    129: "S5", 130: "S5", ..., 176: "S5",
}

At prediction time, each hour’s forecast is paired with its bucket’s residual distribution to compute the appropriate interval width.
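The mapping and the pairing step can be sketched as follows. The bucket ranges are copied from the listing earlier in this page, where the DA2 and S1 ranges overlap for hours 33 to 37; this sketch assumes later entries overwrite earlier ones, as they would in a Python dict literal, so those hours resolve to S1. The helper names `build_bucket_map` and `interval_for` are hypothetical, and the quantiles are the illustrative values from the table above.

```python
# Inclusive hours-ahead range per bucket, as listed earlier on this page.
BUCKET_RANGES = [
    ("DA1", 14, 25),
    ("DA2", 26, 37),
    ("S1", 33, 56),
    ("S2", 57, 80),
    ("S3", 81, 104),
    ("S4", 105, 128),
    ("S5", 129, 176),
]

def build_bucket_map(ranges):
    """Expand (name, first_hour, last_hour) ranges into an hour -> bucket dict."""
    mapping = {}
    for name, first, last in ranges:
        for hour in range(first, last + 1):
            mapping[hour] = name  # assumption: later ranges win on overlap
    return mapping

bucket_for_horizon = build_bucket_map(BUCKET_RANGES)

def interval_for(point_forecast, hours_ahead, quantiles_by_bucket):
    """Shift the point forecast by the bucket's residual quantiles."""
    q_lo, q_hi = quantiles_by_bucket[bucket_for_horizon[hours_ahead]]
    # Residuals are actual - predicted, so actual = predicted + residual.
    return point_forecast + q_lo, point_forecast + q_hi

# Illustrative 5th/95th percentiles (EUR/MWh) for two buckets.
quantiles_by_bucket = {"DA1": (-6.5, 8.2), "S5": (-13.8, 21.5)}

for hours in (20, 150):
    lo, hi = interval_for(50.0, hours, quantiles_by_bucket)
    print(f"{hours}h ahead ({bucket_for_horizon[hours]}): [{lo:.1f}, {hi:.1f}]")
```

Because the residual quantiles are asymmetric, the resulting interval is not centered on the point forecast.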

Calibration from Live Predictions

In addition to cross-validation residuals, the calibrator can be rebuilt from live prediction data:

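def build_calibrator_from_predictions(predictions_df, min_samples=168):
    """Rebuild the calibrator from live predictions; return None if data is insufficient."""
    # Keep only rows where both the realized and the forecast price are known.
    valid = predictions_df.dropna(subset=["actual_price", "predicted_price"])

    if len(valid) < min_samples:
        return None  # insufficient data

    # Residual convention: actual minus predicted (positive = under-forecast).
    residuals = valid["actual_price"] - valid["predicted_price"]
    hours_ahead = compute_hours_ahead(valid)

    calibrator = ConformalCalibrator()
    # horizon_buckets is not defined in this snippet; presumably it is the
    # hour -> bucket mapping (bucket_for_horizon) shown earlier.
    calibrator.fit(residuals, hours_ahead, horizon_buckets)
    return calibrator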

This allows the intervals to adapt to current model accuracy rather than relying solely on historical CV residuals.
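The internals of `ConformalCalibrator` are not shown here. As a rough illustration of what a `fit` step with that call shape might do, the hypothetical `SketchCalibrator` below groups residuals by bucket and reads off empirical quantiles. It assumes `horizon_buckets` is an hour-to-bucket dict and uses plain lists instead of pandas objects; it is a stand-in, not the actual class.

```python
from collections import defaultdict
from statistics import quantiles

class SketchCalibrator:
    """Hypothetical stand-in for ConformalCalibrator (not the real class)."""

    def __init__(self):
        self.residuals_by_bucket = {}

    def fit(self, residuals, hours_ahead, horizon_buckets):
        """Group residuals by the bucket of their forecast horizon."""
        grouped = defaultdict(list)
        for residual, hours in zip(residuals, hours_ahead):
            bucket = horizon_buckets.get(hours)
            if bucket is not None:  # ignore horizons outside every bucket
                grouped[bucket].append(residual)
        self.residuals_by_bucket = dict(grouped)
        return self

    def interval_90(self, prediction, hours_ahead, horizon_buckets):
        """Return a 90% interval from the bucket's 5th/95th residual percentiles."""
        res = self.residuals_by_bucket[horizon_buckets[hours_ahead]]
        cuts = quantiles(res, n=20, method="inclusive")
        return prediction + cuts[0], prediction + cuts[-1]

# Synthetic example: two horizons, each with 21 evenly spaced residuals.
horizon_buckets = {20: "DA1", 150: "S5"}
residuals = [float(x) for x in range(-10, 11)] + [2.0 * x for x in range(-10, 11)]
hours = [20] * 21 + [150] * 21

cal = SketchCalibrator().fit(residuals, hours, horizon_buckets)
print(cal.interval_90(50.0, 20, horizon_buckets))
print(cal.interval_90(50.0, 150, horizon_buckets))
```

Storing the raw residuals rather than precomputed quantiles keeps the sketch flexible: other confidence levels can be read from the same data.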

Minimum Sample Requirements

Reliable quantile estimation requires a minimum number of residuals per bucket:

Confidence Level	Minimum Samples	Reason
50% CI	50+	25th/75th percentiles need moderate sample
90% CI	168+	5th/95th percentiles are tail estimates, need more data

With fewer samples, quantile estimates are noisy and can produce intervals that are erratically wide or narrow. The system requires at least 168 residuals (one full week of hourly predictions) before generating intervals; in build_calibrator_from_predictions this threshold is checked against the total number of valid rows.
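To see why tail quantiles need more data: with 168 residuals, only about 8 observations (5% of 168) lie beyond each 90% bound, while with 50 residuals that drops to 2 or 3. A per-bucket readiness check could be sketched as below. The function name `bucket_readiness` is hypothetical, and applying the thresholds to each bucket separately is an assumption of this sketch.

```python
# Minimum residual counts per confidence level (percent), from the table above.
MIN_SAMPLES = {50: 50, 90: 168}

def bucket_readiness(residuals_by_bucket, confidence_pct=90):
    """Map each bucket to True if it has enough residuals for the given CI."""
    required = MIN_SAMPLES[confidence_pct]
    return {
        bucket: len(residuals) >= required
        for bucket, residuals in residuals_by_bucket.items()
    }

# Synthetic sample sizes: DA1 has 200 residuals, S5 only 60.
sample = {"DA1": [0.0] * 200, "S5": [0.0] * 60}

print(bucket_readiness(sample, confidence_pct=90))
print(bucket_readiness(sample, confidence_pct=50))
```

In this example S5 would qualify for a 50% interval but not yet for a 90% interval.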

Adaptive Behavior

As the system accumulates more live prediction data, the calibrator can be periodically refreshed to reflect current conditions:

After retraining: the new model may have different error characteristics → recalibrate
After market shift: the volatility regime has changed → recalibrate with recent residuals
Routine maintenance: intervals should stay aligned with recent accuracy → recalibrate monthly
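The three triggers above could be combined into a single decision helper, sketched below. The function name, the flag names, and the 30-day reading of "monthly" are assumptions for illustration; how the system actually detects a retrain or a market shift is not specified here.

```python
from datetime import date

def needs_recalibration(last_calibrated, today, model_retrained=False,
                        regime_shift=False, max_age_days=30):
    """Return the reason to recalibrate, or None if the calibrator is current."""
    if model_retrained:
        return "retrained"       # new model, possibly different error profile
    if regime_shift:
        return "market_shift"    # refit using recent residuals only
    if (today - last_calibrated).days >= max_age_days:
        return "routine"         # assumed monthly cadence of 30 days
    return None

print(needs_recalibration(date(2024, 1, 1), date(2024, 1, 10)))
print(needs_recalibration(date(2024, 1, 1), date(2024, 2, 5)))
print(needs_recalibration(date(2024, 1, 1), date(2024, 1, 10), model_retrained=True))
```

Returning the reason rather than a bare boolean lets the caller choose which residual window to use, for example recent residuals only after a market shift.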