Metrics

Primary Metrics

MAE (Mean Absolute Error)

The primary evaluation metric, measured in EUR/MWh:

MAE = (1/n) × Σ |actual_price - predicted_price|

MAE represents the average magnitude of forecast errors, regardless of direction. An MAE of 3.5 EUR/MWh means predictions are, on average, 3.50 EUR off from the actual price.

Why primary: MAE is robust to outliers, directly interpretable in market terms, and consistent with the MAE loss function used for training.
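The formula above can be sketched in a few lines of plain Python (the price values are made-up illustration data, not real market prices):

```python
# Minimal MAE computation over paired hourly prices (EUR/MWh).
def mae(actual, predicted):
    """Mean absolute error: average |actual - predicted| over all hours."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [52.0, 48.5, 61.2, 70.0]
predicted = [50.0, 50.0, 58.2, 73.5]
print(mae(actual, predicted))  # average of 2.0, 1.5, 3.0, 3.5 -> 2.5
```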

RMSE (Root Mean Squared Error)

The secondary metric, also in EUR/MWh:

RMSE = √[(1/n) × Σ (actual_price - predicted_price)²]

RMSE penalizes large errors more heavily than MAE. The ratio RMSE/MAE indicates error distribution shape:

  • RMSE/MAE ≈ 1.0: Errors are uniform (all similar magnitude)
  • RMSE/MAE ≈ 1.25: Normal distribution of errors (typical)
  • RMSE/MAE > 1.5: Heavy-tailed errors (occasional large misses)
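A small sketch of RMSE and the RMSE/MAE ratio, using the same illustrative data as above:

```python
import math

def mae(actual, predicted):
    """Mean absolute error in EUR/MWh."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean squared deviation."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

actual = [52.0, 48.5, 61.2, 70.0]
predicted = [50.0, 50.0, 58.2, 73.5]
ratio = rmse(actual, predicted) / mae(actual, predicted)
print(round(ratio, 3))  # close to 1.0: errors here are fairly uniform
```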

MAPE (Mean Absolute Percentage Error)

A normalized metric expressed as a percentage:

MAPE = (100/n) × Σ |actual_price - predicted_price| / |actual_price|

Important: MAPE is only computed for hours where |actual_price| > 1 EUR/MWh. Near-zero prices create division-by-near-zero artifacts that inflate MAPE to meaningless values. Hours with |price| ≤ 1 are excluded from the calculation.
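The exclusion rule can be folded directly into the computation, as in this sketch (toy data; the `min_abs_price` threshold matches the 1 EUR/MWh cutoff described above):

```python
def mape(actual, predicted, min_abs_price=1.0):
    """MAPE in percent, skipping hours where |actual| <= min_abs_price EUR/MWh."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if abs(a) > min_abs_price]
    return 100.0 / len(pairs) * sum(abs(a - p) / abs(a) for a, p in pairs)

# The 0.5 EUR/MWh hour is excluded, so only two hours enter the average.
print(mape([50.0, 0.5, 100.0], [45.0, 10.0, 90.0]))  # (10% + 10%) / 2 = 10.0
```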

Mean Error (ME / Bias)

The directional error metric:

ME = (1/n) × Σ (predicted_price - actual_price)
  • ME > 0: Systematic overprediction
  • ME < 0: Systematic underprediction
  • ME ≈ 0: No directional bias

ME is used by the bias correction system to detect and compensate for systematic errors.
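A minimal sketch of ME and a simple additive correction; the actual bias correction system may be more sophisticated, and the data here is illustrative:

```python
def mean_error(actual, predicted):
    """Signed bias: positive means the model overpredicts on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

actual = [50.0, 60.0, 55.0]
predicted = [53.0, 61.0, 57.0]
bias = mean_error(actual, predicted)        # (3 + 1 + 2) / 3 = 2.0
corrected = [p - bias for p in predicted]   # shift predictions down by the bias
print(bias, corrected)
```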

Skill Scores

Skill scores benchmark model accuracy against naive baselines:

Skill = 1 - (MAE_model / MAE_baseline)
Score interpretation:

  • 1.0: Perfect forecast (zero error)
  • > 0: Model outperforms the baseline
  • 0.0: Model equals the baseline
  • < 0: Model is worse than the baseline
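The skill formula is a one-liner; this sketch shows the interpretation boundaries:

```python
def skill(mae_model, mae_baseline):
    """Skill relative to a baseline: 1 is perfect, 0 matches the baseline."""
    return 1.0 - mae_model / mae_baseline

print(round(skill(3.5, 5.0), 3))  # model beats the baseline -> 0.3
print(skill(5.0, 5.0))            # model equals the baseline -> 0.0
```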

Naive Persistence

prediction(t) = actual_price(t - 24h)

Tomorrow’s price at hour H = today’s price at hour H. This captures the strong daily autocorrelation in electricity prices.

Naive Weekly

prediction(t) = actual_price(t - 168h)

Next week’s price at hour H = this week’s price at hour H. This captures weekly seasonality (weekday vs weekend patterns).
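Both baselines are lag lookups into the actual price series, as this sketch on a toy series shows:

```python
# Hourly price series indexed by hour; a persistence forecast just
# shifts the series by the lag (24 h for daily, 168 h for weekly).
def naive_forecast(prices, t, lag_hours):
    """prediction(t) = actual_price(t - lag_hours)."""
    return prices[t - lag_hours]

prices = list(range(200))  # toy series: price at hour t is simply t
print(naive_forecast(prices, 190, 24))   # daily persistence  -> 166
print(naive_forecast(prices, 190, 168))  # weekly persistence -> 22
```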

Metric Dimensions

Metrics are computed across multiple dimensions to reveal performance patterns:

By Hour of Day

Reveals the daily accuracy profile. Typical patterns:

  • Night hours (0–6): Lowest MAE (stable, predictable prices)
  • Morning ramp (7–9): Higher MAE (demand uncertainty)
  • Peak hours (10–14): Highest MAE (solar variability, demand peak)
  • Evening (18–21): Moderate MAE (demand decline, solar dropout)

By Day of Week

Weekend prices are more stable (lower industrial demand) and typically easier to predict. Monday morning is often the hardest due to the weekend-to-weekday transition.

By Horizon

Accuracy degrades with horizon — D+1 predictions are more accurate than D+7 predictions. The degradation rate indicates how quickly forecast skill decays.

By Model

Comparing HistGradientBoosting, LightGBM, and XGBoost reveals which models handle which conditions best. The ensemble typically outperforms all individual models.

Economic Quality Metrics

Beyond statistical accuracy, EPF tracks metrics that directly measure forecast value for trading decisions. These include price shape correlation (Corr-f in 3 variants), error quality (Cov-e), direction accuracy, spike recall, and spread capture.

See Economic Quality Metrics for detailed definitions, formulas, and interpretation guidelines.

API Access

All metrics are available through the evaluation endpoints:

  • /api/v1/evaluation/summary — Aggregated MAE, RMSE, MAPE per model
  • /api/v1/evaluation/timeline — Daily MAE evolution
  • /api/v1/evaluation/dimensions — Breakdown by hour, weekday, horizon
  • /api/v1/evaluation/comparison — Side-by-side model comparison (includes Corr-f)
  • /api/v1/evaluation/economic-metrics — Per-model economic quality metrics
  • /api/v1/evaluation/economic-timeline — Daily economic metric time series
  • /api/v1/evaluation/deviation-scatter — Forecast vs actual deviation data
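A sketch of querying one of these endpoints with the standard library; the base URL is a hypothetical deployment address, and any authentication the API requires is omitted:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical; replace with your deployment

def get_evaluation_summary():
    """Fetch aggregated MAE / RMSE / MAPE per model from the summary endpoint."""
    url = f"{BASE_URL}/api/v1/evaluation/summary"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# summary = get_evaluation_summary()  # requires a running API server
```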