
v7.1 — MLP Neural Network (Rejected)

Date: March 20, 2026 | Status: Rejected

Why This Experiment Exists

The v6.3 validation and v7.0 compression-breaking campaigns established that tree-based models hit a regression slope ceiling at 0.70-0.72 — predictions capture only 70% of actual price variation due to leaf-node averaging. The hypothesis was that neural networks, which don’t have this structural constraint, could break through the ceiling.

Critical question: Is the compression ceiling model-specific (tree architecture) or data-specific (insufficient features)?

What We Tested

sklearn’s MLPRegressor wrapped in a Pipeline with:

  • SimpleImputer(strategy="median") — handles NaN in commodity features (30% missing)
  • StandardScaler — normalizes all features to mean=0, std=1 (critical for gradient-based optimization)
  • MLPRegressor with early stopping, various architectures and learning rates
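The three stages above can be sketched as follows. Hyperparameters shown match the mlp-base run (architecture (256,128,64), learning rate 0.001); the remaining settings (random seed, early-stopping details) are assumptions, not the recorded configuration.

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs in commodity features
    ("scale", StandardScaler()),                   # mean=0, std=1 for gradient descent
    ("mlp", MLPRegressor(
        hidden_layer_sizes=(256, 128, 64),         # mlp-base architecture
        learning_rate_init=0.001,                  # mlp-base learning rate
        early_stopping=True,                       # holds out 10% as a validation split
        random_state=42,                           # assumed, for reproducibility
    )),
])
```

Because imputation and scaling live inside the Pipeline, they are fit on training folds only, avoiding leakage when cross-validating.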

Experiments

| # | Name     | Architecture    | LR    | MAE   | Slope | Range | MaxPred | Spike Rec |
|---|----------|-----------------|-------|-------|-------|-------|---------|-----------|
| 1 | mlp-base | (256,128,64)    | 0.001 | 37.24 | 0.524 | 1.079 | 207     | 33.0%     |
| 2 | mlp-lr01 | (256,128,64)    | 0.01  | 38.75 | 0.337 | 0.657 | 129     | 10.8%     |
| 3 | mlp-deep | (256,128,64,32) | 0.001 | 38.96 | 0.263 | 0.517 | 147     | 12.9%     |

For reference — best XGBoost (pw-3x-d365-t60): MAE 12.69, Slope 0.707, MaxPred ~135

What Failed and Why

The MLP is roughly 3x worse than XGBoost on MAE and loses on every other metric except MaxPred. Each run falls into one of two failure modes:

  1. Overfits wildly (mlp-base): Range ratio 1.079 means predictions oscillate MORE than actual prices. MaxPred 207 confirms NNs CAN output extreme values, but the predictions are poorly calibrated — high variance, low accuracy.

  2. Collapses to near-mean (mlp-deep, mlp-lr01): Slope 0.26-0.34 means the model learned to predict ~50-60 EUR for everything. Bias of -34 to -37 confirms near-constant output.
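The calibration metrics used above can be sketched as a small diagnostic. The exact definitions used in the campaign are assumed here: slope from a linear fit of predictions against actuals (1.0 = perfectly calibrated, below 1 = compressed toward the mean), bias as the difference of means, and range ratio as predicted vs. actual price spread.

```python
import numpy as np

def calibration_diagnostics(y_true, y_pred):
    """Assumed definitions of the campaign's slope / bias / range metrics."""
    # Slope of y_pred regressed on y_true; <1 indicates compression,
    # >1 indicates over-amplified predictions (the mlp-base case).
    slope, _ = np.polyfit(y_true, y_pred, 1)
    # Negative bias means predictions sit below the actual mean
    # (the near-constant-output collapse shows up here).
    bias = float(np.mean(y_pred) - np.mean(y_true))
    # Ratio of predicted to actual price range.
    range_ratio = (y_pred.max() - y_pred.min()) / (y_true.max() - y_true.min())
    return float(slope), bias, float(range_ratio)
```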

Root causes:

  • sklearn MLPRegressor is CPU-bound with limited optimization options. The lbfgs solver processes the entire 360K-sample × 100+ feature dataset every iteration; the adam and sgd solvers do support mini-batches, but with no GPU path and few knobs, training stayed extremely slow (2-4 hours per experiment) and prone to poor convergence.
  • No batch normalization or dropout. Without these regularization techniques, the network either memorizes noise (overfitting) or fails to learn meaningful patterns (underfitting).
  • 100+ features with different information densities. Price lags are highly informative while commodity ratios are noisy. The MLP treats all scaled features equally, diluting signal with noise.
  • Single-threaded CPU training. Each experiment took 2-4 hours vs 30 min for XGBoost, making iterative tuning impractical.

What We Learned

The compression ceiling IS model-specific, not data-specific

The mlp-base experiment proved this: MaxPred 207 (vs XGBoost’s 135) confirms that the neural network CAN output extreme values. The features contain enough signal for 200+ EUR predictions. Trees can’t make these predictions because of leaf averaging; NNs can, but sklearn’s optimizer can’t find the right weights.

sklearn MLPRegressor is the wrong tool

The failure is implementation-specific, not architecture-specific:

| Capability               | sklearn MLP                                   | PyTorch MLP                |
|--------------------------|-----------------------------------------------|----------------------------|
| Mini-batch training      | Partial (adam/sgd only; lbfgs is full-batch)  | Yes                        |
| Batch normalization      | No                                            | Yes                        |
| Dropout regularization   | No                                            | Yes                        |
| Learning rate scheduling | Limited (sgd only: invscaling, adaptive)      | Yes (cosine, warmup, etc.) |
| GPU acceleration         | No                                            | Yes                        |
| Custom loss functions    | No                                            | Yes                        |
| Training speed (360K rows) | 2-4 hours                                   | ~15-30 min (estimated)     |

A PyTorch implementation could address every failure mode we observed.
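A minimal sketch of what that infrastructure could look like, covering the missing capabilities from the table (mini-batches, batch norm, dropout, cosine LR scheduling). Architecture and hyperparameters here are illustrative assumptions, not measured configurations.

```python
import torch
import torch.nn as nn

class PriceMLP(nn.Module):
    """Illustrative MLP with the regularization sklearn lacks."""
    def __init__(self, n_features, hidden=(256, 128, 64), dropout=0.2):
        super().__init__()
        layers, prev = [], n_features
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width),
                       nn.ReLU(), nn.Dropout(dropout)]
            prev = width
        layers.append(nn.Linear(prev, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)

def train(model, X, y, epochs=10, batch_size=1024, lr=1e-3):
    # Mini-batch training: each optimizer step sees one batch,
    # not the full 360K-row dataset.
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(X, y),
        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    loss_fn = nn.L1Loss()  # optimize MAE directly, matching the reported metric
    for _ in range(epochs):
        model.train()
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()  # cosine decay of the learning rate per epoch
    return model
```

A custom loss (e.g. one that up-weights spike hours) would drop in as a replacement for `nn.L1Loss`, which is the lever the sklearn wrapper never exposes.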

Decision

Rejected — sklearn MLPRegressor is not viable for electricity price forecasting at this dataset scale. The architecture concept (neural networks for decompression) remains valid but requires PyTorch infrastructure.

Next steps:

  • Option A: Deploy best XGBoost config (pw-3x-d365-t60, MAE 12.69) as production v7.1
  • Option B: Build PyTorch MLP infrastructure (~200 lines) for proper NN experiments
  • Option C: Pursue hybrid XGB+NN approach once PyTorch is available