Model Evaluation

The Evaluation page provides five analysis tabs for assessing forecast performance. Each tab displays a Summary Panel at the top with auto-generated insight badges (color-coded positive/warning/negative/info) that highlight key takeaways from the data.

Tab 1: Comparison

Summary Panel

Auto-generated badges showing current MAE assessment, comparison vs previous period, and data coverage for the hybrid15 ensemble in live context.

Approach Comparison Chart

Side-by-side ensemble accuracy across resolution approaches (hourly, expanded, pure15, hybrid15) over a selectable date range with MAE confidence bands (±½ MAE inner, ±1 MAE outer). Date range presets: 1D, 1W, 1M, 3M, 6M, 1Y, or custom.
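The band construction can be sketched as follows. This is a minimal illustration of the ±½ MAE / ±1 MAE convention described above; the function and variable names are illustrative, not the app's API.

```python
# Sketch: MAE-based confidence bands around a forecast series.
# Inner band = prediction ± 0.5·MAE, outer band = prediction ± 1·MAE,
# mirroring the chart's convention.
def mae(actual, predicted):
    # Mean absolute error over the evaluation window
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mae_bands(predicted, mae_value):
    inner = [(p - 0.5 * mae_value, p + 0.5 * mae_value) for p in predicted]
    outer = [(p - mae_value, p + mae_value) for p in predicted]
    return inner, outer

actual    = [52.0, 48.5, 61.2, 55.0]   # illustrative prices
predicted = [50.0, 49.0, 58.0, 57.0]
m = mae(actual, predicted)             # 1.925 for these numbers
inner, outer = mae_bands(predicted, m)
```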

Model Comparison Chart

Individual model predictions (HistGBT, LightGBM, XGBoost, ensemble) overlaid against actual prices for the selected approach.

Tab 2: Accuracy

Summary Panel

Insight badges across 5 categories: Forecast Quality (MAE assessment + trend + skill vs persistence), Bias Pattern (over/underestimation + trend), Error Hotspots (worst hour, worst bias model, best Corr-f), Naive Baseline (persistence and weekly comparisons), and Horizon Degradation (strategic/all only).

This is the main analysis tab. Its panels are organized into collapsible sections with contextual subtitles:

Section 1: Approach Comparison

  • Approach Comparison Table: Ensemble metrics (MAE, RMSE, bias) for each resolution approach side-by-side. Identifies the best-performing approach.
  • Approach Timeline: Daily MAE across approaches for comparison.
  • Approach Horizon: MAE by forecast day (D+1 through D+7).

Section 2: Error Patterns

  • Error by Hour: MAE by hour of day (identifies peak-hour challenges)
  • Error by Session: MAE grouped by time-of-day sessions (night, morning, afternoon, evening)
  • Error by Weekday: MAE by day of week
  • Error Calendar Heatmap: Calendar view of daily MAE values
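The hour and session groupings above amount to aggregating absolute errors by time-of-day buckets. A minimal sketch, assuming tercile-style session boundaries (the exact cutoffs the app uses are not documented here):

```python
import pandas as pd

# Sketch of grouping absolute errors into time-of-day sessions.
# The session boundaries below are illustrative assumptions.
df = pd.DataFrame({
    "hour":      [2, 8, 14, 19, 3, 9, 15, 20],
    "abs_error": [1.0, 3.0, 2.0, 4.0, 1.5, 2.5, 2.5, 3.5],
})

def session(hour):
    if hour < 6:
        return "night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

df["session"] = df["hour"].map(session)
mae_by_session = df.groupby("session")["abs_error"].mean()
```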

Section 3: Model Breakdown

  • Insights Bar: Automated summary showing the best model, worst hour, worst forecast horizon, most biased model, and best correlation model (Corr-f).
  • Model Comparison Table: All models compared: MAE, RMSE, MAPE, bias, Corr-f, and skill scores. Includes trend indicators and collapsible per-horizon breakdown.
  • Calibration Panel: Confidence band coverage — compares nominal (50%, 90%) vs actual.
  • Accuracy Timeline: Daily MAE over time for the selected model.
  • Accuracy Scatter: Model MAE vs naive baseline MAE per day.
  • Error by Horizon: MAE at each forecast distance (0-7 days).
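The skill scores referenced in this section compare model error to a naive baseline. A minimal sketch, assuming the common definition skill = 1 − MAE(model) / MAE(naive), with persistence (yesterday's price repeated) as the baseline; names are illustrative:

```python
# Sketch: skill score vs a naive persistence baseline.
# Positive skill means the model beats the baseline; 0 means no better.
def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def skill_score(actual, model_pred, naive_pred):
    return 1.0 - mae(actual, model_pred) / mae(actual, naive_pred)

actual = [50.0, 55.0, 60.0]
model  = [51.0, 54.0, 61.0]   # model MAE = 1.0
naive  = [48.0, 50.0, 55.0]   # persistence MAE = 4.0
s = skill_score(actual, model, naive)  # → 0.75
```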

Tab 3: Economic

Evaluates forecast quality for trading decisions. See Economic Quality Metrics for metric definitions.

Summary Panel

Insight badges across 4 categories: Price Tracking (Corr-f, within-day shape quality), Error Pattern (Cov-e, direction accuracy), Trading Value (spread capture, spike recall), and Best Models (which models lead on tracking vs trading).

  • Approach Economic Table: Per-approach comparison of economic metrics.
  • Economic Radar Chart: Multi-axis radar plot comparing economic metric profiles across approaches.
  • Economic Metrics Table: All models compared on Corr-f (3 variants), Cov-e, direction accuracy, spike recall, and spread capture. Color-coded thresholds.
  • Economic Timeline: Daily chart of a selected economic metric over time with model selector and metric toggle buttons.
  • Deviation Scatter: Forecast vs actual price deviations (daily mean removed). Points clustering along the diagonal indicate correct within-day shape prediction.
  • Spread Capture Chart: Daily BESS arbitrage efficiency as a percentage of the theoretical maximum. Bars color-coded: green (≥70%), yellow (50–70%), red (below 50%).
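Spread capture can be sketched as follows: trade on the forecast (buy the cheapest forecast hour, sell the dearest), then measure the spread actually earned against the perfect-foresight spread. This is a simplified one-cycle illustration that ignores efficiency losses and volume constraints; the app's exact dispatch model may differ.

```python
# Sketch: daily spread capture for a simple one-cycle BESS arbitrage.
# Capture % = spread realized by trading on the forecast, divided by
# the theoretical maximum spread from perfect foresight.
def spread_capture(actual, forecast):
    buy  = min(range(len(forecast)), key=lambda h: forecast[h])
    sell = max(range(len(forecast)), key=lambda h: forecast[h])
    realized = actual[sell] - actual[buy]       # spread actually earned
    theoretical = max(actual) - min(actual)     # perfect-foresight spread
    return 100.0 * realized / theoretical

actual   = [30.0, 25.0, 70.0, 60.0]
forecast = [32.0, 24.0, 65.0, 72.0]  # buys hour 1, sells hour 3
pct = spread_capture(actual, forecast)  # 35/45 ≈ 77.8% → green (≥70%)
```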

Tab 4: Experiments

Compares backtest experiment results to identify the best-performing model version.

Summary Panel

Insight badges across 3 categories: Best Experiment (best tag, MAE, delta vs other selected), vs Baseline (comparison against production tag), and Bias Comparison (lowest absolute bias).

  • Experiment Guide (collapsible): Explains what experiment tags are, links to backtesting methodology, and shows a table of all available tags with descriptions. The currently deployed production tag is marked with ★.
  • Experiment Summary Table: Side-by-side comparison of selected experiment tags showing MAE, RMSE, bias, Corr-f, and skill scores.
  • Experiment Comparison Chart: Time-series comparison of daily MAE across selected experiments.

Tab 5: Features

Summary Panel

Insight badges across 3 categories: Top Features (top 3 drivers with importance %), Category Mix (dominant and secondary feature categories), and Concentration (top-5 concentration % with warning if >70%).

Feature Importance

Ranked list of features by model importance, categorized by type (price, time, weather, demand, commodity).

Feature-Wise Error Analysis

Ensemble MAE binned by feature values: renewable share, residual demand, wind generation, price level, and total demand. Each split into Low/Medium/High bins. Identifies conditions where the model struggles.
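The binning described above can be sketched with quantile splits. The tercile (`qcut`) cutoffs here are an assumption — the app may use different bin edges — and the data is illustrative:

```python
import pandas as pd

# Sketch: bin a feature into Low/Medium/High terciles and compute
# the mean absolute error within each bin.
df = pd.DataFrame({
    "renewable_share": [0.1, 0.2, 0.3, 0.5, 0.6, 0.8],
    "abs_error":       [1.0, 1.2, 2.0, 2.4, 3.0, 3.8],
})
df["bin"] = pd.qcut(df["renewable_share"], 3, labels=["Low", "Medium", "High"])
mae_by_bin = df.groupby("bin", observed=True)["abs_error"].mean()
# Rising MAE from Low to High flags conditions where the model struggles.
```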

Shared Controls

| Control | Options | Description |
| --- | --- | --- |
| Context | Live / Backtest / All | Filter by prediction source |
| Period | 30–90 days | Evaluation window for the Accuracy and Features tabs |
| Approach | Hourly / Expanded / Pure15 / Hybrid15 | Resolution strategy filter |
| Run Mode | Day-ahead / Strategic / All | Product type filter |
| Experiment Tag | Available backtest tags | Select experiments to compare (Comparison and Experiments tabs) |
| Production Tag | ★ indicator | Marks the currently deployed model version |