Economic Quality Metrics
Why Accuracy Isn’t Enough
Traditional forecast metrics — MAE, RMSE, MAPE — measure how close predictions are to actual prices. But trading profits depend on shape and timing, not absolute closeness.
Consider two forecasts for a single day:
- Forecast A: Predicts a flat price of 50 EUR/MWh for all hours. MAE = 8 EUR/MWh.
- Forecast B: Predicts 30 EUR/MWh at night, 70 EUR/MWh at noon. MAE = 12 EUR/MWh.
Forecast A has lower MAE, but Forecast B correctly identifies when prices are cheap and expensive. A BESS operator following Forecast B would charge at night and discharge at noon — generating real arbitrage revenue. Following Forecast A generates zero revenue because it sees no spread.
Economic quality metrics capture this distinction: how useful is the forecast for making money?
Correlation Metrics
Corr-f (Raw) — Price Correlation
The Pearson correlation between actual and predicted prices:

    Corr-f (Raw) = Pearson(actual_prices, predicted_prices)

| Score | Interpretation |
|---|---|
| ≥ 0.80 | Good — forecast tracks price movements well |
| ≥ 0.60 | Acceptable — captures broad patterns |
| < 0.60 | Poor — forecast misses key movements |
Limitation: Raw correlation is dominated by differences in the overall price level between days. A forecast that only tracks each day's average level can score well on Corr-f (Raw) without telling cheap hours from expensive hours within a day.
Corr-f (Deviation) — Shape Correlation
The primary trading metric. Removes the daily mean from both actual and predicted prices, then correlates the deviations:

    For each day d:
        actual_dev(h)   = actual(h) - mean(actual on day d)
        forecast_dev(h) = forecast(h) - mean(forecast on day d)

    Corr-f (Deviation) = Pearson(all actual_dev, all forecast_dev)

This isolates within-day shape accuracy: does the forecast correctly identify which hours are above and below the daily average? This is exactly what BESS operators need: they profit from the spread between cheap and expensive hours, regardless of the overall price level.
| Score | Interpretation |
|---|---|
| ≥ 0.80 | Strong shape prediction — high trading value |
| ≥ 0.60 | Moderate shape prediction — some trading signal |
| < 0.60 | Weak shape prediction — limited trading value |
Corr-f (First Difference) — Directional Correlation
Correlates hour-to-hour price changes rather than absolute levels:

    Δactual(h)   = actual(h) - actual(h-1)
    Δforecast(h) = forecast(h) - forecast(h-1)

    Corr-f (First Diff) = Pearson(Δactual, Δforecast)

This measures whether the forecast gets the transitions right: when prices ramp up in the morning, does the forecast also ramp up? When prices drop in the afternoon, does the forecast drop?
Error Quality
Cov-e — Error Covariance
Measures whether forecast errors are random or systematically correlated with price levels:

    errors = predicted - actual
    Cov-e  = Pearson(actual_prices, errors)

| Value | Interpretation |
|---|---|
| ≈ 0 | Ideal — errors are random, no systematic pattern |
| < -0.10 | Systematic underestimate of high prices (bad for traders) |
| > +0.10 | Systematic overestimate of high prices |
A negative Cov-e means the forecast consistently underestimates peaks — the most expensive hours are exactly where the error is largest. This is particularly harmful for trading because it means the forecast undervalues the best opportunities.
Trading Metrics
Direction Accuracy
The percentage of hours where the forecast correctly predicts the direction of price movement:

    For each consecutive hour pair:
        actual_direction   = sign(actual(h) - actual(h-1))
        forecast_direction = sign(forecast(h) - forecast(h-1))

    Direction Accuracy = count(same direction) / total pairs × 100%

| Score | Interpretation |
|---|---|
| ≥ 60% | Meaningful directional signal |
| 50–60% | Marginal — barely better than random |
| ≤ 50% | No directional information (coin flip) |
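
The calculation can be sketched in Python as follows. One detail is an assumption of this sketch rather than something stated above: a flat hour (no price change) counts as a match only when both series are flat. The function names and prices are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def direction_accuracy(actual, forecast):
    """Percentage of consecutive-hour pairs where both series move the same way."""
    total = len(actual) - 1
    matches = 0
    for h in range(1, len(actual)):
        actual_direction = sign(actual[h] - actual[h - 1])
        forecast_direction = sign(forecast[h] - forecast[h - 1])
        if actual_direction == forecast_direction:
            matches += 1
    return 100.0 * matches / total


# Illustrative (made-up) prices: the forecast misses only the first move.
actual = [40, 35, 50, 70, 65, 45]
forecast = [42, 44, 55, 60, 58, 50]
accuracy = direction_accuracy(actual, forecast)
print(f"Direction Accuracy: {accuracy:.0f}%")
```

Here four of the five hour-to-hour moves agree, giving 80%.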
Spike Recall
Measures the forecast’s ability to identify high-value hours — the top 10% most expensive hours in the actual data:

    actual_spikes   = hours where the actual price is in the top 10% of actual prices
    forecast_spikes = hours where the predicted price is in the top 10% of predicted prices

    Spike Recall = |actual_spikes ∩ forecast_spikes| / |actual_spikes| × 100%

A spike recall of 68% means the forecast correctly identifies 68 out of every 100 actual peak-price hours. These are the hours where BESS discharge (selling energy) generates the most revenue, making spike recall directly relevant to profitability.
| Score | Interpretation |
|---|---|
| ≥ 50% | Good — catches most high-value opportunities |
| 30–50% | Moderate — misses significant opportunities |
| < 30% | Poor — near-random spike identification |
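
A minimal Python sketch of spike recall is shown below. It selects the top 10% by rank rather than by a percentile cut-off, which is an assumption; the two approaches can differ when prices are tied at the threshold. Function names and prices are hypothetical.

```python
def top_fraction_indices(prices, fraction=0.10):
    """Indices of the highest-priced hours making up `fraction` of the series."""
    k = max(1, round(len(prices) * fraction))
    ranked = sorted(range(len(prices)), key=lambda i: prices[i], reverse=True)
    return set(ranked[:k])


def spike_recall(actual, forecast, fraction=0.10):
    """Share (%) of actual spike hours that the forecast also ranks as spikes."""
    actual_spikes = top_fraction_indices(actual, fraction)
    forecast_spikes = top_fraction_indices(forecast, fraction)
    return 100.0 * len(actual_spikes & forecast_spikes) / len(actual_spikes)


# Illustrative (made-up) 20-hour series, so the top 10% is 2 hours.
actual = [40, 38, 36, 35, 37, 45, 60, 80, 75, 55,
          50, 48, 47, 52, 58, 70, 95, 120, 90, 65]
forecast = [42, 40, 39, 38, 40, 47, 58, 85, 70, 57,
            52, 50, 49, 53, 60, 68, 80, 100, 82, 62]
recall = spike_recall(actual, forecast)
print(f"Spike Recall: {recall:.0f}%")
```

The forecast ranks the highest actual hour correctly but swaps the second spike for a morning hour, so it recovers one of two spike hours (50%).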
Spread Capture
The ultimate BESS-relevant metric: what fraction of the theoretical maximum daily spread does a strategy based on the forecast actually capture?

    For each day:

    1. Theoretical max spread:
       - Sort actual prices
       - Charge during the cheapest 4 hours, discharge during the most expensive 4 hours
       - theoretical_spread = mean(top_4) - mean(bottom_4)

    2. Forecast-guided spread:
       - Sort forecast prices
       - Charge during the cheapest 4 hours (by forecast), discharge during the most expensive 4 (by forecast)
       - Look up actual prices at those hours
       - forecast_spread = mean(actual at forecast top 4) - mean(actual at forecast bottom 4)

    Spread Capture = forecast_spread / theoretical_spread × 100%

For 15-minute resolution, the system uses 16 charge/discharge slots (4 hours × 4 slots per hour) instead of 4.
| Score | Interpretation |
|---|---|
| ≥ 70% | Strong — captures most available arbitrage |
| 50–70% | Moderate — useful but leaves significant value on the table |
| < 50% | Weak — forecast-guided dispatch barely outperforms random |
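
The two-step procedure above can be sketched for a single day in Python. This is an illustration under stated assumptions: the function name `spread_capture` and its `slots` parameter are hypothetical, the prices are made up, and how the real system aggregates across days or breaks ties is not covered.

```python
from statistics import mean


def spread_capture(actual, forecast, slots=4):
    """Spread capture (%) for one day of equal-length price series."""
    hours = range(len(actual))
    by_actual = sorted(hours, key=lambda h: actual[h])
    by_forecast = sorted(hours, key=lambda h: forecast[h])

    def realized_spread(order):
        # Charge in the first `slots` hours of the ordering and discharge
        # in the last `slots`, always settling at actual prices.
        charge, discharge = order[:slots], order[-slots:]
        return mean(actual[h] for h in discharge) - mean(actual[h] for h in charge)

    theoretical_spread = realized_spread(by_actual)
    forecast_spread = realized_spread(by_forecast)
    return 100.0 * forecast_spread / theoretical_spread


# Illustrative (made-up) hourly prices for one day, in EUR/MWh.
actual = [35, 32, 30, 28, 31, 38, 50, 65, 70, 60, 55, 50,
          48, 46, 49, 55, 66, 85, 95, 90, 75, 60, 48, 40]
forecast = [36, 33, 31, 30, 35, 34, 48, 62, 78, 63, 56, 52,
            50, 47, 49, 54, 64, 80, 88, 84, 72, 61, 50, 42]
capture = spread_capture(actual, forecast)
print(f"Spread Capture: {capture:.1f}%")
```

In this example the forecast picks one wrong charge hour and one wrong discharge hour, realizing a spread of 53 EUR/MWh against a theoretical 56 EUR/MWh, or about 94.6%. For 15-minute data the same function would be called with `slots=16`.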
Thresholds Summary
| Metric | Good | Acceptable | Poor |
|---|---|---|---|
| Corr-f (Raw) | ≥ 0.80 | 0.60–0.80 | < 0.60 |
| Corr-f (Deviation) | ≥ 0.80 | 0.60–0.80 | < 0.60 |
| Corr-f (First Diff) | ≥ 0.60 | 0.40–0.60 | < 0.40 |
| \|Cov-e\| | < 0.10 | 0.10–0.20 | > 0.20 |
| Direction Accuracy | ≥ 60% | 50–60% | < 50% |
| Spike Recall | ≥ 50% | 30–50% | < 30% |
| Spread Capture | ≥ 70% | 50–70% | < 50% |
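
For automated reporting, the table can be encoded as a small lookup. The sketch below is one possible encoding, not part of the product: the metric keys and the function name `rate` are hypothetical, and values that fall exactly on a boundary are assigned to the better band, which is an assumption.

```python
# (good, acceptable) lower bounds for metrics where higher is better,
# copied from the summary table. Percentages are given as plain numbers.
HIGHER_IS_BETTER = {
    "corr_f_raw": (0.80, 0.60),
    "corr_f_deviation": (0.80, 0.60),
    "corr_f_first_diff": (0.60, 0.40),
    "direction_accuracy": (60.0, 50.0),
    "spike_recall": (50.0, 30.0),
    "spread_capture": (70.0, 50.0),
}


def rate(metric, value):
    """Return 'Good', 'Acceptable', or 'Poor' for a metric value."""
    if metric == "cov_e":
        # Cov-e is judged on its magnitude, and lower is better.
        magnitude = abs(value)
        if magnitude < 0.10:
            return "Good"
        return "Acceptable" if magnitude <= 0.20 else "Poor"
    good, acceptable = HIGHER_IS_BETTER[metric]
    if value >= good:
        return "Good"
    return "Acceptable" if value >= acceptable else "Poor"


print(rate("corr_f_deviation", 0.85))
print(rate("spread_capture", 55.0))
print(rate("cov_e", -0.25))
```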
API Access
All economic metrics are available through the evaluation API:
- `/api/v1/evaluation/economic-metrics`: Per-model economic quality metrics (all 7 metrics). Supports `period_days`, `context`, `approach`, `run_mode`, and `tag` filters.
- `/api/v1/evaluation/economic-timeline`: Daily economic metric time series for a specific model. Requires `model_name`. Uses a 7-day rolling window for Corr-f deviation.
- `/api/v1/evaluation/deviation-scatter`: Point-level forecast vs actual deviations (daily mean removed). Requires `model_name`.
- `/api/v1/evaluation/comparison`: Now includes `corr_f_raw` per model alongside MAE, RMSE, and bias.
- `/api/v1/evaluation/insights`: Now includes `best_corr_f_model` and `best_corr_f_value`.
See the API Access page for information on programmatic access to these metrics.
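
As a rough illustration, a request URL for the economic-metrics endpoint could be assembled as below. Only the path and the filter names come from this page; the host, the helper name, and the value formats (for example an integer `period_days`) are assumptions, and authentication is not shown.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://example.invalid"  # hypothetical host; replace with the real one


def economic_metrics_url(base_url=BASE_URL, **filters):
    """Build the economic-metrics URL, dropping filters that are None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = urljoin(base_url, "/api/v1/evaluation/economic-metrics")
    return f"{url}?{query}" if query else url


# Example: metrics over the last 30 days, with no tag filter applied.
url = economic_metrics_url(period_days=30, tag=None)
print(url)
```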