# LSTM Price Encoder
## Overview
The LSTM Price Encoder is a pre-trained neural network that converts recent price history into a fixed-size vector of temporal features. These embeddings are appended to XGBoost’s tabular features, giving the model temporal context that flattened lag columns cannot represent.
Why this matters: Tree-based models on lag features can see values at past times, but cannot recover shape — trend direction, volatility regime, whether prices are rising or falling over the past week. The LSTM encoder captures this structure and makes it available to XGBoost as 64 additional features.
This was the key architectural change in v10.0 that broke the structural MAE ceiling that had held since v4.3.
## Architecture
```
Price sequence (last 168 hours = 7 days)
          ↓
Normalize (z-score using training μ/σ)
          ↓
LSTM encoder: 2 layers × 64 hidden units
          ↓
Last hidden state → 64-dim embedding
          ↓
XGBoost input: 90 tabular + 64 LSTM = 154 features
          ↓
Quantile price forecast (q=0.55)
```

### PriceEncoder (PyTorch)
```python
class PriceEncoder(nn.Module):
    def __init__(self, hidden_dim=64, n_layers=2, dropout=0.1,
                 input_size=1, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_dim, output_size)

    def encode(self, x):
        # x: (batch, seq_len, input_size)
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]  # last layer hidden state: (batch, 64)
```

The encoder outputs the last hidden state of the top LSTM layer — a 64-dimensional vector that summarises the input sequence’s temporal dynamics.
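As a quick sanity check, the encoder can be exercised on a random batch. This sketch redefines a minimal standalone copy of the class so it runs on its own; the batch size and v10.1 configuration values (`input_size=3`, `output_size=24`) follow the tables below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class PriceEncoder(nn.Module):
    # Minimal standalone copy of the class above, for the shape check below.
    def __init__(self, hidden_dim=64, n_layers=2, dropout=0.1,
                 input_size=1, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_dim, output_size)

    def encode(self, x):
        _, (h_n, _) = self.lstm(x)
        return h_n[-1]

enc = PriceEncoder(input_size=3, output_size=24)  # v10.1 configuration
x = torch.randn(8, 168, 3)   # (batch, 168 h window, price + demand + temperature)
emb = enc.encode(x)          # (8, 64): one 64-dim embedding per sample
```

Regardless of `output_size`, `encode` always returns the 64-dim hidden state — the prediction head is only used during pre-training.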
## v10.0 vs v10.1: Task-Aligned Training
Two encoder variants were explored:
| | v10.0 | v10.1 |
|---|---|---|
| Training objective | Next-hour prediction (generic) | Next-24h joint prediction (task-aligned) |
| `output_size` | 1 | 24 |
| `input_size` | 1 (price only) | 3 (price + demand + temperature) |
| Performance | −8.5% DA MAE vs v4.3 | −12.3% DA MAE vs v4.3 |
**Task-aligned training (v10.1):** The LSTM head is trained to predict the next 24 hours jointly, not just the next hour. This forces the encoder to produce embeddings specifically useful for day-ahead price forecasting rather than generic sequence compression. In ablation tests, task-aligned encoders consistently outperformed generic ones.

**Exogenous inputs (v10.1):** Adding demand and temperature to the input sequence (`input_size=3`) provides the LSTM with cross-series context during pre-training, improving embeddings for weather-sensitive hours.
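A sketch of the task-aligned pre-training objective under these assumptions (synthetic data stands in for the real series; the actual pipeline lives in `src/models/lstm_embedder.py`): the head maps the last hidden state to all 24 next-day hours, and the MSE over the joint 24-vector is minimised.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins: (batch, 168, 3) windows of price + demand + temperature,
# and the next 24 hourly prices as the joint target.
x = torch.randn(32, 168, 3)
y = torch.randn(32, 24)

lstm = nn.LSTM(input_size=3, hidden_size=64, num_layers=2,
               batch_first=True, dropout=0.1)
head = nn.Linear(64, 24)  # joint next-24h prediction (task-aligned)

opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(5):  # a few optimisation steps on the toy batch
    _, (h_n, _) = lstm(x)
    pred = head(h_n[-1])     # (32, 24): all 24 hours predicted at once
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the loss covers the full 24-hour horizon, the hidden state must encode day-scale shape (trend, daily profile), not just the next tick — which is what makes the embedding useful downstream.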
## Residual-from-Baseline Target
The LSTM architecture works in combination with a residual targeting strategy: instead of predicting raw EUR/MWh, XGBoost predicts the deviation from the weekly median price. The final forecast is:
```
forecast = weekly_median_baseline + XGBoost_residual_prediction
```

This isolates the temporal signal the LSTM needs to encode: regime changes, sustained trend shifts, and volatility bursts. The weekly median baseline absorbs the mean-reverting component, leaving a cleaner signal for both the LSTM and XGBoost.
**Why 1-week baseline over 4-week?** Tested in validation (v10.0 experiments). A 4-week baseline introduces “regime memory” — when prices shift to a new level, the baseline lags for weeks, creating systematic residual bias. The 1-week window adapts quickly enough to avoid this without adding noise.
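A minimal sketch of the residual target construction, assuming an hourly price series in pandas (the synthetic series and variable names are illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2025-01-01", periods=24 * 28, freq="h")
price = pd.Series(80 + 20 * rng.standard_normal(len(idx)), index=idx)

# Rolling 1-week median, shifted so the baseline at time t uses only
# history strictly before t — no target leakage.
baseline = price.shift(1).rolling(window=24 * 7, min_periods=24).median()

# XGBoost learns the residual; the final forecast re-adds the baseline.
residual_target = price - baseline
# forecast = baseline + xgboost_residual_prediction
```

The 168-sample window mirrors the 1-week choice above: a 4-week window (`24 * 28`) would carry the "regime memory" bias described in the preceding paragraph.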
## Training Procedure
The encoder is pre-trained separately before gradient boosting training:
- Pre-train the LSTM on the full historical price series with the task-aligned objective (24h joint prediction)
- Freeze encoder weights — the LSTM does not train during XGBoost fitting
- Generate embeddings for all training samples (batch GPU inference)
- Append to features — 64 `lstm_emb_0` … `lstm_emb_63` columns added to the feature matrix
- Train XGBoost on the combined 154-feature input
Pre-training and inference are handled by `src/models/lstm_embedder.py`.
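The freeze, embed, and append steps above can be sketched as follows; the tabular feature matrix and column names are illustrative stand-ins for the real pipeline:

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=3, hidden_size=64, num_layers=2, batch_first=True)
lstm.eval()
for p in lstm.parameters():
    p.requires_grad_(False)  # frozen: the encoder does not train with XGBoost

windows = torch.randn(100, 168, 3)  # one 168 h window per training sample
with torch.no_grad():
    _, (h_n, _) = lstm(windows)
    emb = h_n[-1].numpy()  # (100, 64) embedding matrix

tabular = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 90)),
                       columns=[f"feat_{i}" for i in range(90)])
emb_df = pd.DataFrame(emb, columns=[f"lstm_emb_{i}" for i in range(64)])
X = pd.concat([tabular, emb_df], axis=1)  # 90 + 64 = 154 features
# X then feeds XGBoost training against the residual target.
```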
## Inference
At prediction time, `LSTMEmbedder` loads the saved checkpoint and computes the embedding for the current origin:
```python
from src.models.lstm_embedder import LSTMEmbedder

embedder = LSTMEmbedder(model_path="data/models/lstm_encoder.pt")
embedding = embedder.compute_embedding(price_series, origin_idx)
# Returns: {"lstm_emb_0": float, ..., "lstm_emb_63": float}
```

The embedder handles normalization, padding for short histories (< 168h), and GPU/CPU routing automatically.
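How the padding and z-score steps might look internally is sketched below. This is an illustrative reconstruction, not the actual `LSTMEmbedder` code: the left-pad-with-earliest-value strategy is an assumption, and `mu`/`sigma` stand for the stored training statistics.

```python
import numpy as np

WINDOW = 168  # hours (7 days)

def prepare_window(history: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Z-score the last WINDOW hours; left-pad short histories with the
    earliest observed value so the sequence length is always 168.
    (Padding strategy is assumed, not taken from the real module.)"""
    window = history[-WINDOW:]
    if len(window) < WINDOW:
        pad = np.full(WINDOW - len(window), window[0])
        window = np.concatenate([pad, window])
    return (window - mu) / sigma

# 100 h of history: shorter than the 168 h window, so it gets padded.
w = prepare_window(np.arange(100, dtype=float), mu=50.0, sigma=10.0)
```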
## Key Validation Findings (18 experiments, v10.0–v10.1)
| Finding | Result |
|---|---|
| Generic encoder (next-1h target) | −8.5% MAE vs v4.3 |
| Task-aligned encoder (next-24h target) | −12.3% MAE vs v4.3 |
| LSTM + price weighting | Worse — embedding signal destabilised |
| LSTM + 4-week baseline | Worse — regime memory bias |
| LSTM + 1-week baseline | Best configuration |
| Adding demand + temperature to encoder input | Additional gain, adopted in v10.1 |
**Confirmed incompatibilities:** Price weighting (upweighting recent or high-price samples) was tested three times and consistently degraded performance when combined with the LSTM. The embedding signal and the reweighting gradient interfere.
## Production Configuration (v10.1)
| Parameter | Value |
|---|---|
| `hidden_dim` | 64 |
| `n_layers` | 2 |
| `dropout` | 0.1 |
| `input_size` | 3 (price, demand, temperature) |
| `output_size` | 24 (task-aligned) |
| `window` | 168 hours (7 days) |
| Normalization | Z-score (μ/σ from training data) |
| Device | CUDA if available, else CPU |
| Batch size (inference) | 2048 |
## Effect on Forecast Range
The structural ceiling in tree-based models comes from leaf averaging: with enough data diversity, leaves converge to near-mean values, compressing the prediction range. The LSTM encoder breaks this by providing regime signals that create distinct leaf populations for high-spike vs normal conditions.
| | v4.3 (no LSTM) | v10.1 (LSTM) |
|---|---|---|
| MaxPred | ~127 EUR/MWh | 209 EUR/MWh |
| Bias | ~−12 EUR/MWh | −0.65 EUR/MWh |
| Spike Recall | ~16% | 24.1% |
The LSTM encoder was stress-tested in the March 2026 Iran crisis (prices 170–247 EUR/MWh). v10.1 crisis MAE was 27.16 EUR/MWh — meaningfully better than the structural limit v4.3 would have hit during the same period.
See `src/models/lstm_embedder.py` for the full implementation.