v7.1 — MLP Neural Network (Rejected)
Date: March 20, 2026 | Status: Rejected
Why This Experiment Exists
The v6.3 validation and v7.0 compression-breaking campaigns established that tree-based models hit a regression slope ceiling at 0.70-0.72 — predictions capture only 70% of actual price variation due to leaf-node averaging. The hypothesis was that neural networks, which don’t have this structural constraint, could break through the ceiling.
Critical question: Is the compression ceiling model-specific (tree architecture) or data-specific (insufficient features)?
What We Tested
sklearn’s MLPRegressor wrapped in a Pipeline with:
- `SimpleImputer(strategy="median")` — handles NaN in commodity features (30% missing)
- `StandardScaler` — normalizes all features to mean=0, std=1 (critical for gradient-based optimization)
- `MLPRegressor` with early stopping, various architectures and learning rates
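An illustrative reconstruction of that pipeline (the hidden sizes and learning rate match experiment 1, mlp-base; the toy data below is a stand-in for the real 360K-row feature matrix, and `max_iter` is kept small for the sketch):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaN commodity features
    ("scale", StandardScaler()),                   # mean=0, std=1 per feature
    ("mlp", MLPRegressor(
        hidden_layer_sizes=(256, 128, 64),
        learning_rate_init=0.001,
        early_stopping=True,   # holds out 10% of training data for validation
        max_iter=50,           # small here; real runs trained far longer
        random_state=42,
    )),
])

# Toy data with ~30% of cells missing, mimicking the commodity features
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
X[rng.random(X.shape) < 0.3] = np.nan
y = np.nansum(X[:, :3], axis=1) + rng.normal(0, 0.1, 400)

pipe.fit(X, y)
preds = pipe.predict(X)
```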
Experiments
| # | Name | Architecture | LR | MAE | Slope | Range | MaxPred | Spike Rec |
|---|---|---|---|---|---|---|---|---|
| 1 | mlp-base | (256,128,64) | 0.001 | 37.24 | 0.524 | 1.079 | 207 | 33.0% |
| 2 | mlp-lr01 | (256,128,64) | 0.01 | 38.75 | 0.337 | 0.657 | 129 | 10.8% |
| 3 | mlp-deep | (256,128,64,32) | 0.001 | 38.96 | 0.263 | 0.517 | 147 | 12.9% |
For reference — best XGBoost (pw-3x-d365-t60): MAE 12.69, Slope 0.707, MaxPred ~135
What Failed and Why
The MLP is roughly 3x worse than XGBoost on MAE and trails on every metric except MaxPred. Each configuration failed in one of two ways:
- Overfits wildly (mlp-base): Range ratio 1.079 means predictions oscillate MORE than actual prices. MaxPred 207 confirms NNs CAN output extreme values, but the predictions are poorly calibrated — high variance, low accuracy.
- Collapses to near-mean (mlp-deep, mlp-lr01): Slope 0.26-0.34 means the model learned to predict ~50-60 EUR for everything. Bias of -34 to -37 confirms near-constant output.
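For reference, the compression diagnostics quoted throughout can be computed along these lines. The exact definitions in the original evaluation harness are not shown here, so treat these as assumptions: slope as the OLS slope of predictions regressed on actuals, range ratio as the spread of predictions over the spread of actuals.

```python
import numpy as np

def compression_diagnostics(y_true, y_pred):
    """Assumed definitions of the compression metrics; may differ in
    detail from the original evaluation harness."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    slope = np.polyfit(y_true, y_pred, 1)[0]       # <1.0: compressed toward the mean
    range_ratio = np.ptp(y_pred) / np.ptp(y_true)  # >1.0: swings wider than reality
    bias = y_pred.mean() - y_true.mean()           # large negative: near-constant low output
    return {"slope": round(float(slope), 3),
            "range_ratio": round(float(range_ratio), 3),
            "bias": round(float(bias), 2),
            "max_pred": float(y_pred.max())}

# A model that halves every deviation from the mean shows slope 0.5
y_true = np.array([20.0, 40.0, 60.0, 80.0, 100.0])
y_pred = 60.0 + 0.5 * (y_true - 60.0)
diag = compression_diagnostics(y_true, y_pred)
```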
Root causes:
- sklearn MLPRegressor offers limited optimizer control. The L-BFGS solver processes the entire dataset per iteration; Adam does use mini-batches (default batch_size=200) but has no warmup or scheduling. On 360K samples × 100+ features, training was extremely slow (2-4 hours per experiment) and prone to poor convergence.
- No batch normalization or dropout. Without these regularization techniques, the network either memorizes noise (overfitting) or fails to learn meaningful patterns (underfitting).
- 100+ features with different information densities. Price lags are highly informative while commodity ratios are noisy. The MLP treats all scaled features equally, diluting signal with noise.
- Single-threaded CPU training. Each experiment took 2-4 hours vs 30 min for XGBoost, making iterative tuning impractical.
What We Learned
The compression ceiling IS model-specific, not data-specific
The mlp-base experiment proved this: MaxPred 207 (vs XGBoost’s 135) confirms that the neural network CAN output extreme values. The features contain enough signal for 200+ EUR predictions. Trees can’t make these predictions because of leaf averaging; NNs can but sklearn’s optimizer can’t find the right weights.
sklearn MLPRegressor is the wrong tool
The failure is implementation-specific, not architecture-specific:
| Capability | sklearn MLP | PyTorch MLP |
|---|---|---|
| Mini-batch training | Partial (adam/sgd only; lbfgs is full-batch) | Yes |
| Batch normalization | No | Yes |
| Dropout regularization | No | Yes |
| Learning rate scheduling | Limited (invscaling/adaptive, sgd solver only) | Yes (cosine, warmup, etc.) |
| GPU acceleration | No | Yes |
| Custom loss functions | No | Yes |
| Training speed (360K rows) | 2-4 hours | ~15-30 min (estimated) |
A PyTorch implementation could address every failure mode we observed.
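A minimal sketch of what that implementation could look like (assumed design, not the eventual v7.x code; names, sizes, and the toy data are illustrative): mini-batches via DataLoader, BatchNorm + Dropout regularization, a cosine LR schedule, and a swappable loss.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_mlp(n_features, hidden=(256, 128, 64), p_drop=0.1):
    # Linear -> BatchNorm -> ReLU -> Dropout per hidden layer
    layers, prev = [], n_features
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(p_drop)]
        prev = h
    layers.append(nn.Linear(prev, 1))
    return nn.Sequential(*layers)

# Toy data standing in for the 360K x 100+ feature matrix
torch.manual_seed(0)
X = torch.randn(1024, 16)
y = X[:, :4].sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

model = make_mlp(16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10)
loss_fn = nn.MSELoss()  # swappable for a custom (e.g. spike-weighted) loss

for epoch in range(10):
    for xb, yb in loader:          # mini-batches, unlike sklearn's lbfgs path
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()
    sched.step()                   # cosine decay once per epoch
```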
Decision
Rejected — sklearn MLPRegressor is not viable for electricity price forecasting at this dataset scale. The architecture concept (neural networks for decompression) remains valid but requires PyTorch infrastructure.
Next steps:
- Option A: Deploy best XGBoost config (pw-3x-d365-t60, MAE 12.69) as production v7.1
- Option B: Build PyTorch MLP infrastructure (~200 lines) for proper NN experiments
- Option C: Pursue hybrid XGB+NN approach once PyTorch is available