Model Evaluation
The Evaluation page provides five analysis tabs for assessing forecast performance. Each tab displays a Summary Panel at the top with auto-generated insight badges (color-coded as positive, warning, negative, or info) that highlight the key takeaways from the data.
Tab 1: Comparison
Summary Panel
Auto-generated badges showing the current MAE assessment, the comparison against the previous period, and data coverage for the hybrid15 ensemble in the live context.
Approach Comparison Chart
Side-by-side ensemble accuracy across resolution approaches (hourly, expanded, pure15, hybrid15) over a selectable date range with MAE confidence bands (±½ MAE inner, ±1 MAE outer). Date range presets: 1D, 1W, 1M, 3M, 6M, 1Y, or custom.
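The band widths follow directly from the historical MAE. A minimal sketch of the idea (function and variable names are illustrative, not taken from the app):

```python
def mae(actual, predicted):
    """Mean absolute error over paired series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def confidence_bands(predictions, historical_mae):
    """Inner (±½ MAE) and outer (±1 MAE) bands around each prediction."""
    return [
        {
            "inner": (p - historical_mae / 2, p + historical_mae / 2),
            "outer": (p - historical_mae, p + historical_mae),
        }
        for p in predictions
    ]

m = mae([100.0, 110.0, 95.0], [102.0, 107.0, 96.0])  # (2 + 3 + 1) / 3 = 2.0
bands = confidence_bands([104.0], m)
```

The bands shift with the chart's date range because the MAE is recomputed over whatever window is selected.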
Model Comparison Chart
Individual model predictions (HistGBT, LightGBM, XGBoost, ensemble) overlaid against actual prices for the selected approach.
Tab 2: Accuracy
Summary Panel
Insight badges across 5 categories: Forecast Quality (MAE assessment, trend, and skill vs persistence), Bias Pattern (over-/underestimation and its trend), Error Hotspots (worst hour, most biased model, best Corr-f), Naive Baseline (persistence and weekly comparisons), and Horizon Degradation (strategic/all run modes only).
This is the main analysis tab; its panels are organized into collapsible sections with contextual subtitles:
Section 1: Approach Comparison
- Approach Comparison Table: Ensemble metrics (MAE, RMSE, bias) for each resolution approach side-by-side. Identifies the best-performing approach.
- Approach Timeline: Daily MAE across approaches for comparison.
- Approach Horizon: MAE by forecast day (D+1 through D+7).
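Horizon degradation like this reduces to bucketing absolute errors on forecast distance. A rough sketch, assuming each record carries its horizon in days (the app's actual record shape may differ):

```python
from collections import defaultdict

def mae_by_horizon(records):
    """Average absolute error per forecast horizon (e.g. 1 = D+1 .. 7 = D+7).

    `records` is assumed to be an iterable of (horizon_days, actual, predicted)
    tuples; this shape is illustrative.
    """
    errors = defaultdict(list)
    for horizon, actual, predicted in records:
        errors[horizon].append(abs(actual - predicted))
    return {h: sum(errs) / len(errs) for h, errs in sorted(errors.items())}

result = mae_by_horizon([(1, 100.0, 98.0), (1, 90.0, 94.0), (2, 100.0, 93.0)])
# D+1: (2 + 4) / 2 = 3.0; D+2: 7.0
```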
Section 2: Error Patterns
- Error by Hour: MAE by hour of day (identifies peak-hour challenges).
- Error by Session: MAE grouped by time-of-day sessions (night, morning, afternoon, evening).
- Error by Weekday: MAE by day of week.
- Error Calendar Heatmap: Calendar view of daily MAE values.
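The hourly and session groupings above are the same aggregation with different keys. A sketch of the session variant (the session boundaries here are an assumption, not the app's documented cut-offs):

```python
from collections import defaultdict

def session_of(hour):
    """Map hour-of-day to a session label (boundaries are illustrative)."""
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def mae_by_session(hourly_errors):
    """hourly_errors: iterable of (hour, absolute_error) pairs."""
    buckets = defaultdict(list)
    for hour, err in hourly_errors:
        buckets[session_of(hour)].append(err)
    return {s: sum(v) / len(v) for s, v in buckets.items()}

result = mae_by_session([(3, 2.0), (8, 4.0), (9, 6.0), (20, 1.0)])
# night: 2.0, morning: (4 + 6) / 2 = 5.0, evening: 1.0
```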
Section 3: Model Breakdown
- Insights Bar: Automated summary showing the best model, worst hour, worst forecast horizon, most biased model, and best correlation model (Corr-f).
- Model Comparison Table: All models compared on MAE, RMSE, MAPE, bias, Corr-f, and skill scores. Includes trend indicators and a collapsible per-horizon breakdown.
- Calibration Panel: Confidence band coverage — compares nominal (50%, 90%) vs actual.
- Accuracy Timeline: Daily MAE over time for the selected model.
- Accuracy Scatter: Model MAE vs naive baseline MAE per day.
- Error by Horizon: MAE at each forecast distance (0-7 days).
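The skill scores and the baseline scatter both measure the model against a naive forecast. A common formulation, sketched here under the assumption that skill is defined relative to MAE, is 1 − MAE_model / MAE_baseline:

```python
def mae(actual, predicted):
    """Mean absolute error over paired series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def skill_score(actual, model_pred, baseline_pred):
    """1 - MAE_model / MAE_baseline: positive means the model beats the
    baseline, zero means no improvement, negative means it is worse."""
    return 1.0 - mae(actual, model_pred) / mae(actual, baseline_pred)

actual      = [50.0, 60.0, 55.0]
model       = [52.0, 59.0, 54.0]  # MAE = (2 + 1 + 1) / 3
persistence = [48.0, 50.0, 60.0]  # e.g. previous day's prices as the forecast
s = skill_score(actual, model, persistence)
```

In the scatter, days plotting below the diagonal (model MAE < naive MAE) correspond to a positive skill score.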
Tab 3: Economic
Evaluates forecast quality for trading decisions. See Economic Quality Metrics for metric definitions.
Summary Panel
Insight badges across 4 categories: Price Tracking (Corr-f, within-day shape quality), Error Pattern (Cov-e, direction accuracy), Trading Value (spread capture, spike recall), and Best Models (which models lead on tracking vs trading).
- Approach Economic Table: Per-approach comparison of economic metrics.
- Economic Radar Chart: Multi-axis radar plot comparing economic metric profiles across approaches.
- Economic Metrics Table: All models compared on Corr-f (3 variants), Cov-e, direction accuracy, spike recall, and spread capture. Color-coded thresholds.
- Economic Timeline: Daily chart of a selected economic metric over time with model selector and metric toggle buttons.
- Deviation Scatter: Forecast vs actual price deviations (daily mean removed). Points clustering along the diagonal indicate correct within-day shape prediction.
- Spread Capture Chart: Daily BESS arbitrage efficiency as a percentage of the theoretical maximum. Bars color-coded: green (≥70%), yellow (50–70%), red (below 50%).
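The Deviation Scatter's de-meaning step is simple to reproduce: subtract each day's mean from both series, leaving only the within-day shape. Correlating the resulting deviations quantifies how tightly points hug the diagonal (a sketch of the idea, not necessarily the app's exact Corr-f definition):

```python
def remove_daily_mean(series):
    """Subtract the day's mean so only the within-day shape remains."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]

def shape_correlation(actual_day, forecast_day):
    """Pearson correlation of the de-meaned series; 1.0 = perfect shape match."""
    a = remove_daily_mean(actual_day)
    f = remove_daily_mean(forecast_day)
    num = sum(x * y for x, y in zip(a, f))
    den = (sum(x * x for x in a) * sum(y * y for y in f)) ** 0.5
    return num / den

# Forecast offset by a constant: the level is wrong but the shape is perfect.
corr = shape_correlation([10.0, 20.0, 30.0], [15.0, 25.0, 35.0])
```

Removing the daily mean is what makes this a trading-oriented view: a constant level error is irrelevant for intraday arbitrage, while a wrong shape is costly.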
Tab 4: Experiments
Compares backtest experiment results to identify the best-performing model version.
Summary Panel
Insight badges across 3 categories: Best Experiment (best tag, its MAE, and the delta vs the other selected tags), vs Baseline (comparison against the production tag), and Bias Comparison (lowest absolute bias).
- Experiment Guide (collapsible): Explains what experiment tags are, links to backtesting methodology, and shows a table of all available tags with descriptions. The currently deployed production tag is marked with ★.
- Experiment Summary Table: Side-by-side comparison of selected experiment tags showing MAE, RMSE, bias, Corr-f, and skill scores.
- Experiment Comparison Chart: Time-series comparison of daily MAE across selected experiments.
Tab 5: Features
Summary Panel
Insight badges across 3 categories: Top Features (top 3 drivers with importance %), Category Mix (dominant and secondary feature categories), and Concentration (top-5 concentration % with warning if >70%).
Feature Importance
Ranked list of features by model importance, categorized by type (price, time, weather, demand, commodity).
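The top-5 concentration badge in the Summary Panel presumably reduces to a share-of-total calculation like this (the importance values below are made up for illustration):

```python
def top_k_concentration(importances, k=5):
    """Share of total importance held by the k most important features."""
    vals = sorted(importances, reverse=True)
    return sum(vals[:k]) / sum(vals)

c = top_k_concentration([0.30, 0.25, 0.15, 0.10, 0.08, 0.07, 0.05])
# top-5 share = 0.88 -> above the 0.70 warning threshold
```

A concentration above 70% suggests the model leans heavily on a handful of drivers, which is why the badge flags it as a warning.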
Feature-Wise Error Analysis
Ensemble MAE binned by feature value: renewable share, residual demand, wind generation, price level, and total demand. Each feature is split into Low/Medium/High bins, identifying the conditions under which the model struggles.
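A minimal sketch of this binning (the tercile cut points are illustrative; the app's actual bin edges may differ):

```python
from collections import defaultdict

def bin_label(value, low_cut, high_cut):
    """Assign Low/Medium/High from two cut points."""
    if value <= low_cut:
        return "Low"
    if value <= high_cut:
        return "Medium"
    return "High"

def mae_by_feature_bin(rows, low_cut, high_cut):
    """rows: iterable of (feature_value, absolute_error) pairs."""
    buckets = defaultdict(list)
    for value, err in rows:
        buckets[bin_label(value, low_cut, high_cut)].append(err)
    return {b: sum(v) / len(v) for b, v in buckets.items()}

# e.g. renewable share vs ensemble error, cut at 0.3 / 0.6
result = mae_by_feature_bin(
    [(0.1, 2.0), (0.5, 5.0), (0.8, 9.0), (0.9, 11.0)], 0.3, 0.6
)
# Low: 2.0, Medium: 5.0, High: (9 + 11) / 2 = 10.0
```

A rising MAE from Low to High bins (as in this toy example for renewable share) would indicate that forecast errors grow in high-renewable conditions.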
Shared Controls
| Control | Options | Description |
|---|---|---|
| Context | Live / Backtest / All | Filter by prediction source |
| Period | 30–90 days | Evaluation window for Accuracy/Features tabs |
| Approach | Hourly / Expanded / Pure15 / Hybrid15 | Resolution strategy filter |
| Run Mode | Day-ahead / Strategic / All | Product type filter |
| Experiment Tag | Available backtest tags | Select experiments to compare (Comparison/Experiments tabs) |
| Production Tag | ★ indicator | Marks the currently deployed model version |