Economic Quality Metrics

Why Accuracy Isn’t Enough

Traditional forecast metrics — MAE, RMSE, MAPE — measure how close predictions are to actual prices. But trading profits depend on shape and timing, not absolute closeness.

Consider two forecasts for a single day:

Forecast A: Predicts a flat price of 50 EUR/MWh for all hours. MAE = 8 EUR/MWh.
Forecast B: Predicts 30 EUR/MWh at night, 70 EUR/MWh at noon. MAE = 12 EUR/MWh.

Forecast A has lower MAE, but Forecast B correctly identifies when prices are cheap and expensive. A BESS operator following Forecast B would charge at night and discharge at noon — generating real arbitrage revenue. Following Forecast A generates zero revenue because it sees no spread.

Economic quality metrics capture this distinction: how useful is the forecast for making money?
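
The two-forecast comparison can be reproduced with a toy day. The four actual prices below are hypothetical, picked only so that the MAE values match the example (8 and 12 EUR/MWh). The rule "buy at the cheapest forecast hour, sell at the most expensive" is also a simplifying assumption, not the system's dispatch logic.

```python
from statistics import fmean

# Hypothetical actual prices (EUR/MWh): two night hours, then two noon hours.
actual = [42.0, 42.0, 58.0, 58.0]
forecast_a = [50.0, 50.0, 50.0, 50.0]  # flat forecast
forecast_b = [30.0, 30.0, 70.0, 70.0]  # shaped forecast


def mae(actual, forecast):
    """Mean absolute error in EUR/MWh."""
    return fmean(abs(a - f) for a, f in zip(actual, forecast))


def arbitrage_revenue(actual, forecast):
    """Realised revenue per MWh from one cycle chosen by the forecast."""
    if max(forecast) == min(forecast):
        return 0.0  # the forecast sees no spread, so no trade is placed
    buy = forecast.index(min(forecast))
    sell = forecast.index(max(forecast))
    return actual[sell] - actual[buy]


print(mae(actual, forecast_a), arbitrage_revenue(actual, forecast_a))  # 8.0 0.0
print(mae(actual, forecast_b), arbitrage_revenue(actual, forecast_b))  # 12.0 16.0
```

The forecast with the worse MAE is the only one that produces a trade, which is the gap the metrics below are meant to measure.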

Correlation Metrics

Corr-f (Raw) — Price Correlation

The Pearson correlation between actual and predicted prices:

Corr-f (Raw) = Pearson(actual_prices, predicted_prices)

Score	Interpretation
≥ 0.80	Good — forecast tracks price movements well
≥ 0.60	Acceptable — captures broad patterns
< 0.60	Poor — forecast misses key movements

Limitation: Raw correlation is dominated by the overall price level, which shifts from day to day. A forecast that only tracks which days are expensive and which are cheap will score well on Corr-f (Raw) without providing useful within-day differentiation.

Corr-f (Deviation) — Shape Correlation

The primary trading metric. Removes the daily mean from both actual and predicted prices, then correlates the deviations:

For each day d:
  actual_dev(h)    = actual(h) - mean(actual on day d)
  forecast_dev(h)  = forecast(h) - mean(forecast on day d)

Corr-f (Deviation) = Pearson(all actual_dev, all forecast_dev)

This isolates within-day shape accuracy: does the forecast correctly identify which hours are above and below the daily average? This is exactly what BESS operators need — they profit from the spread between cheap and expensive hours, regardless of the overall price level.

Score	Interpretation
≥ 0.80	Strong shape prediction — high trading value
≥ 0.60	Moderate shape prediction — some trading signal
< 0.60	Weak shape prediction — limited trading value

Corr-f (First Difference) — Directional Correlation

Correlates hour-to-hour price changes rather than absolute levels:

Δactual(h) = actual(h) - actual(h-1)
Δforecast(h) = forecast(h) - forecast(h-1)

Corr-f (First Diff) = Pearson(Δactual, Δforecast)

This measures whether the forecast gets the transitions right: when prices ramp up in the morning, does the forecast also ramp up? When prices drop in the afternoon, does the forecast drop?

Error Quality

Cov-e — Error Covariance

Measures whether forecast errors are random or systematically correlated with price levels:

errors = predicted - actual
Cov-e = Pearson(actual_prices, errors)

Value	Interpretation
≈ 0	Ideal — errors are random, no systematic pattern
< -0.10	Systematic underestimate of high prices (bad for traders)
> +0.10	Systematic overestimate of high prices

A negative Cov-e means the forecast consistently underestimates peaks — the most expensive hours are exactly where the error is largest. This is particularly harmful for trading because it means the forecast undervalues the best opportunities.

Trading Metrics

Direction Accuracy

The percentage of hour-to-hour transitions where the forecast correctly predicts the direction of price movement:

For each consecutive hour pair:
  actual_direction = sign(actual(h) - actual(h-1))
  forecast_direction = sign(forecast(h) - forecast(h-1))

Direction Accuracy = count(same direction) / total pairs × 100%

Score	Interpretation
≥ 60%	Meaningful directional signal
50–60%	Marginal — barely better than random
< 50%	No directional information (no better than a coin flip)
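
A short sketch of the direction comparison, with hypothetical prices. Treating an unchanged price as its own direction (sign 0), which only matches another unchanged price, is an assumption that follows from comparing the two `sign` values literally.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_accuracy(actual, forecast):
    """Percentage of consecutive-hour pairs where both series move the same way."""
    pairs = range(1, len(actual))
    same = sum(
        sign(actual[h] - actual[h - 1]) == sign(forecast[h] - forecast[h - 1])
        for h in pairs
    )
    return 100.0 * same / len(pairs)


# Hypothetical hourly prices (EUR/MWh); the forecast misses the final drop.
actual = [40.0, 35.0, 50.0, 70.0, 60.0]
forecast = [45.0, 42.0, 48.0, 65.0, 66.0]

print(direction_accuracy(actual, forecast))  # 75.0
```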

Spike Recall

Measures the forecast’s ability to identify high-value hours — the top 10% most expensive hours in the actual data:

actual_spikes = hours where actual price is in the top 10% (at or above the 90th percentile of actual prices)
forecast_spikes = hours where predicted price is in the top 10% (at or above the 90th percentile of predicted prices)

Spike Recall = |actual_spikes ∩ forecast_spikes| / |actual_spikes| × 100%

A spike recall of 68% means the forecast correctly identifies 68 out of every 100 actual peak-price hours. These are the hours where BESS discharge (selling energy) generates the most revenue, making spike recall directly relevant to profitability.

Score	Interpretation
≥ 50%	Good — catches most high-value opportunities
30–50%	Moderate — misses significant opportunities
< 30%	Poor — near-random spike identification
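
The set intersection behind spike recall can be sketched like this. Selecting the top 10% by rank (rather than by an interpolated percentile threshold) and the handling of ties are assumptions; the 20 hourly prices are hypothetical.

```python
def top_fraction_indices(prices, fraction=0.10):
    """Indices of the highest-priced hours making up the given fraction."""
    k = max(1, round(len(prices) * fraction))
    ranked = sorted(range(len(prices)), key=lambda i: prices[i], reverse=True)
    return set(ranked[:k])


def spike_recall(actual, forecast, fraction=0.10):
    """Percentage of actual spike hours that the forecast also flags as spikes."""
    actual_spikes = top_fraction_indices(actual, fraction)
    forecast_spikes = top_fraction_indices(forecast, fraction)
    return 100.0 * len(actual_spikes & forecast_spikes) / len(actual_spikes)


# Hypothetical prices (EUR/MWh) for 20 hours, so the top 10% is 2 hours.
actual = [30, 28, 27, 26, 25, 27, 35, 45, 50, 48,
          44, 40, 38, 37, 39, 45, 55, 80, 95, 60]
forecast = [32, 30, 28, 27, 27, 29, 36, 44, 48, 47,
            45, 42, 40, 38, 40, 46, 54, 70, 85, 75]

print(spike_recall(actual, forecast))  # 50.0
```

Here the forecast flags hours 18 and 19 while the actual spikes are hours 17 and 18, so one of the two actual spikes is caught.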

Spread Capture

The ultimate BESS-relevant metric: what fraction of the theoretical maximum daily spread does a strategy based on the forecast actually capture?

For each day:
  1. Theoretical max spread:
     - Sort actual prices
     - Charge during cheapest 4 hours, discharge during most expensive 4 hours
     - theoretical_spread = mean(top_4) - mean(bottom_4)

  2. Forecast-guided spread:
     - Sort forecast prices
     - Charge during cheapest 4 hours (by forecast), discharge during most expensive 4 (by forecast)
     - Look up actual prices at those hours
     - forecast_spread = mean(actual at forecast top 4) - mean(actual at forecast bottom 4)

  Spread Capture = forecast_spread / theoretical_spread × 100%

For 15-minute resolution, the system uses 16 charge/discharge slots (4 hours × 4 slots per hour) instead of 4.
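
The per-day procedure above can be sketched as a single function. The function name, the `slots` parameter, and the sample day are hypothetical; how the system aggregates the daily values over a longer period is not described here, so the sketch stops at one day.

```python
from statistics import fmean


def spread_capture(actual, forecast, slots=4):
    """Spread Capture (%) for one day; pass slots=16 for 15-minute data."""
    def ranked(prices):
        # Indices ordered from cheapest to most expensive.
        return sorted(range(len(prices)), key=lambda i: prices[i])

    by_actual, by_forecast = ranked(actual), ranked(forecast)

    # Best possible spread: hours picked with perfect knowledge of actual prices.
    theoretical = (fmean(actual[i] for i in by_actual[-slots:])
                   - fmean(actual[i] for i in by_actual[:slots]))
    # Realised spread: hours picked by the forecast, valued at actual prices.
    realised = (fmean(actual[i] for i in by_forecast[-slots:])
                - fmean(actual[i] for i in by_forecast[:slots]))
    return 100.0 * realised / theoretical


# Hypothetical hourly prices (EUR/MWh) for one day.
actual = [30, 28, 26, 25, 27, 32, 45, 60, 65, 55, 48, 42,
          40, 41, 44, 50, 58, 70, 85, 80, 66, 52, 40, 34]
forecast = [31, 29, 28, 27, 26, 30, 43, 58, 66, 57, 50, 44,
            41, 40, 43, 49, 56, 68, 78, 76, 64, 53, 42, 35]

print(round(spread_capture(actual, forecast), 1))  # 99.5
```

In this example the forecast picks the same four charging hours as the optimum and swaps only one discharge hour (hour 8 instead of hour 20), so almost the full spread is captured.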

Score	Interpretation
≥ 70%	Strong — captures most available arbitrage
50–70%	Moderate — useful but leaves significant value on the table
< 50%	Weak — forecast-guided dispatch barely outperforms random

Thresholds Summary

Metric	Good	Acceptable	Poor
Corr-f (Raw)	≥ 0.80	0.60–0.80	< 0.60
Corr-f (Deviation)	≥ 0.80	0.60–0.80	< 0.60
Corr-f (First Diff)	≥ 0.60	0.40–0.60	< 0.40
|Cov-e|	< 0.10	0.10–0.20	> 0.20
Direction Accuracy	≥ 60%	50–60%	< 50%
Spike Recall	≥ 50%	30–50%	< 30%
Spread Capture	≥ 70%	50–70%	< 50%
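
For automated reporting, the summary table can be encoded as a small lookup. This is only a sketch: apart from `corr_f_raw`, the metric keys are hypothetical names, and the treatment of values that sit exactly on a boundary follows one reading of the table.

```python
# Thresholds transcribed from the summary table: (good_from, acceptable_from).
HIGHER_IS_BETTER = {
    "corr_f_raw": (0.80, 0.60),
    "corr_f_deviation": (0.80, 0.60),
    "corr_f_first_diff": (0.60, 0.40),
    "direction_accuracy": (60.0, 50.0),
    "spike_recall": (50.0, 30.0),
    "spread_capture": (70.0, 50.0),
}


def rate(metric, value):
    """Return 'Good', 'Acceptable' or 'Poor' for a metric value."""
    if metric == "cov_e":
        # Judged on magnitude: closer to zero is better.
        magnitude = abs(value)
        if magnitude < 0.10:
            return "Good"
        return "Acceptable" if magnitude <= 0.20 else "Poor"
    good_from, acceptable_from = HIGHER_IS_BETTER[metric]
    if value >= good_from:
        return "Good"
    return "Acceptable" if value >= acceptable_from else "Poor"


print(rate("corr_f_deviation", 0.85))  # Good
print(rate("cov_e", -0.15))            # Acceptable
print(rate("spread_capture", 45.0))    # Poor
```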

API Access

All economic metrics are available through the evaluation API:

/api/v1/evaluation/economic-metrics — Per-model economic quality metrics (all 7 metrics). Supports period_days, context, approach, run_mode, and tag filters.
/api/v1/evaluation/economic-timeline — Daily economic metric time series for a specific model. Requires model_name. Uses 7-day rolling window for Corr-f deviation.
/api/v1/evaluation/deviation-scatter — Point-level forecast vs actual deviations (daily mean removed). Requires model_name.
/api/v1/evaluation/comparison — Now includes corr_f_raw per model alongside MAE, RMSE, and bias.
/api/v1/evaluation/insights — Now includes best_corr_f_model and best_corr_f_value.
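
As a rough illustration, request URLs for these endpoints could be assembled as shown below. The host name, the helper function, and the example parameter values are hypothetical; the sketch also assumes the filters are passed as query-string parameters, and it leaves out authentication and the response format because neither is described here.

```python
from urllib.parse import urlencode

BASE_URL = "https://forecast.example.com"  # hypothetical host


def evaluation_url(endpoint, **params):
    """Build a URL for an evaluation endpoint, skipping unset parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/api/v1/evaluation/{endpoint}"
    return f"{url}?{query}" if query else url


# period_days and model_name are parameter names listed above; values are made up.
print(evaluation_url("economic-metrics", period_days=30))
print(evaluation_url("economic-timeline", model_name="my_model"))
```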

See the API Access page for information on programmatic access to these metrics.