Hyperparameter Tuning
Overview
Model hyperparameters (learning rate, tree depth, regularization strength, etc.) significantly affect forecast accuracy. The EPF system uses Optuna, a Bayesian optimization framework, to efficiently search the hyperparameter space for each model type.
Why Optuna?
Traditional approaches like grid search or random search are inefficient:
- Grid search: Evaluates every combination → exponential in the number of parameters
- Random search: Better coverage but still wastes evaluations on poor regions
- Bayesian optimization (Optuna): Learns which regions are promising and focuses search there
Optuna uses a Tree-structured Parzen Estimator (TPE) to build a probabilistic model of the objective function, suggesting parameter combinations that are likely to improve on the best result so far.
Objective Function
The optimization objective is the mean cross-validated MAE across 5-fold TimeSeriesSplit:
```
objective = mean(MAE_fold_1, MAE_fold_2, ..., MAE_fold_5)
```

This ensures tuned parameters generalize across different time periods rather than overfitting to a single validation window.
Search Spaces
HistGradientBoosting
| Parameter | Range | Scale |
|---|---|---|
| max_iter | 200–1500 | Linear |
| max_depth | 4–12 | Linear |
| learning_rate | 0.01–0.2 | Log |
| min_samples_leaf | 5–50 | Linear |
| l2_regularization | 0.01–10.0 | Log |
| max_bins | 128–255 | Linear |
LightGBM
| Parameter | Range | Scale |
|---|---|---|
| n_estimators | 200–1500 | Linear |
| max_depth | 4–12 | Linear |
| learning_rate | 0.01–0.2 | Log |
| min_child_samples | 5–50 | Linear |
| reg_lambda | 0.01–10.0 | Log |
| reg_alpha | 0.001–1.0 | Log |
| num_leaves | 20–200 | Linear |
| subsample | 0.6–1.0 | Linear |
| colsample_bytree | 0.6–1.0 | Linear |
XGBoost
| Parameter | Range | Scale |
|---|---|---|
| n_estimators | 200–1500 | Linear |
| max_depth | 4–12 | Linear |
| learning_rate | 0.01–0.2 | Log |
| reg_lambda | 0.01–10.0 | Log |
| reg_alpha | 0.001–1.0 | Log |
| subsample | 0.6–1.0 | Linear |
| colsample_bytree | 0.6–1.0 | Linear |
Log-Scale Parameters
Parameters like learning_rate and reg_lambda are searched on a logarithmic scale because:
- The difference between 0.01 and 0.02 is more impactful than between 0.19 and 0.20
- Log scale provides uniform coverage across orders of magnitude
- Prevents the search from spending too many trials in the high end of the range
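A small stdlib-only illustration (not from the EPF code) of the coverage argument: sampling uniformly over [0.01, 10] almost never lands in the lowest decade, while log-uniform sampling gives each decade equal mass.

```python
import random

random.seed(0)
n = 100_000
linear = [random.uniform(0.01, 10.0) for _ in range(n)]        # linear scale
log_uniform = [10 ** random.uniform(-2, 1) for _ in range(n)]  # log scale, 0.01..10

# Fraction of samples landing in the lowest decade [0.01, 0.1):
frac_linear = sum(x < 0.1 for x in linear) / n      # ≈ 0.009 (almost never)
frac_log = sum(x < 0.1 for x in log_uniform) / n    # ≈ 1/3 (one decade of three)
```

With linear sampling, small learning rates like 0.01–0.05 would be explored in under 1% of trials, even though that is often where the optimum lies.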
Pruning
Optuna supports early pruning of unpromising trials. If a trial’s first two CV folds produce an MAE much worse than the current best, the remaining folds are skipped:
```
Trial 47: fold 1 MAE = 12.5, fold 2 MAE = 11.8
Current best: 3.8 MAE
→ Pruned (no chance of beating best)
```

This significantly reduces tuning time: typically 30–50% of trials are pruned.
Tuning Workflow
1. Define search space for model type
2. Create Optuna study (minimize objective)
3. For each trial (50–200 trials):
   a. Optuna suggests a parameter combination
   b. Train model with 5-fold TimeSeriesSplit
   c. Compute mean CV MAE
   d. Report result to Optuna
   e. Optuna updates its model of the objective function
4. Extract best parameters
5. Retrain final model with best parameters on full training data
6. Save model + tuned parameters as artifact

Default vs Tuned Parameters
The EPF system ships with carefully chosen default parameters that work well across typical market conditions:
```
# Defaults (good starting point)
{"max_depth": 8, "learning_rate": 0.05, "n_estimators": 500}
```

Optuna tuning typically improves MAE by 3–8% over defaults, with the largest gains coming from:
- Learning rate + iterations: Finding the optimal trade-off between slow learning (many iterations) and fast learning (fewer iterations)
- Regularization: Matching L2 strength to the noise level in the data
- Tree complexity: Adjusting depth and leaf count to the signal-to-noise ratio
When to Retune
Hyperparameters should be retuned when:
- Market structure changes significantly (new regulations, plant closures)
- The feature set is updated (new features added, old features removed)
- Model drift persists after retraining with current parameters
- Seasonal performance differences suggest one set of parameters doesn’t fit all conditions
Regular retuning (quarterly or after major system changes) keeps parameters aligned with current data characteristics.