Why Cyclical Encoding
The Decision
All periodic time features in the EPF pipeline (hour of day, day of week, month of year) are encoded using sin/cos transformations rather than ordinal integers or one-hot vectors.
```
hour_sin = sin(2π × hour / 24)
hour_cos = cos(2π × hour / 24)
```
The Problem with Ordinal Encoding
If hours are encoded as integers (0, 1, 2, …, 23), the model sees hour 23 and hour 0 as maximally distant — 23 units apart. In reality, they’re adjacent: 23:00 is one hour before 00:00. The same problem affects:
- Day of week: Sunday (6) appears far from Monday (0)
- Month: December (12) appears far from January (1)
This creates artificial discontinuities at period boundaries that tree-based models must learn to split around, wasting model capacity on an encoding artifact.
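To make the boundary artifact concrete, the sketch below compares the ordinal gap between two hours with the Euclidean distance between their sin/cos encodings. It is plain Python; `encode` and `encoded_distance` are illustrative helpers, not functions from the EPF pipeline.

```python
import math


def encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two values after sin/cos encoding."""
    return math.dist(encode(a, period), encode(b, period))


# Ordinal encoding: 23:00 and 00:00 look 23 units apart.
print(abs(23 - 0))                            # 23
# Cyclical encoding: 23:00 -> 00:00 is as short as 00:00 -> 01:00 ...
print(round(encoded_distance(23, 0, 24), 4))  # 0.2611
print(round(encoded_distance(0, 1, 24), 4))   # 0.2611
# ... and 12 hours apart is the maximum (the circle's diameter).
print(round(encoded_distance(0, 12, 24), 4))  # 2.0
```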
The Problem with One-Hot Encoding
One-hot encoding (24 binary columns for hours) avoids the distance problem but creates its own issues:
- High dimensionality: 24 + 7 + 12 = 43 binary features just for time
- No proximity signal: Hour 13 has no relationship to hour 14 in one-hot space
- Sparse splits: Each tree split can only ask “is it hour X?” rather than “is it morning?”
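The "no proximity signal" point can be checked numerically: in one-hot space, every pair of distinct hours sits at the same distance. A minimal sketch, with `one_hot` as an illustrative helper:

```python
import math


def one_hot(hour: int, n_hours: int = 24) -> list[int]:
    """Indicator vector with a single 1 at the given hour."""
    vec = [0] * n_hours
    vec[hour] = 1
    return vec


# Adjacent hours and distant hours are exactly as far apart:
# one-hot space carries no notion of "close in time".
near = math.dist(one_hot(13), one_hot(14))
far = math.dist(one_hot(13), one_hot(1))
print(round(near, 4), round(far, 4))  # 1.4142 1.4142

# Column count for the hour, day-of-week and month one-hot blocks.
print(24 + 7 + 12)  # 43
```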
Sin/Cos: The Best of Both
A sin/cos pair maps each periodic value onto the unit circle, preserving both continuity and proximity. The positions below assume sin on the horizontal axis and cos on the vertical axis:
| Hour | sin | cos | Position on circle |
|---|---|---|---|
| 0 (midnight) | 0.00 | 1.00 | Top |
| 6 (morning) | 1.00 | 0.00 | Right |
| 12 (noon) | 0.00 | -1.00 | Bottom |
| 18 (evening) | -1.00 | 0.00 | Left |
| 23 | -0.26 | 0.97 | Near top (close to hour 0) |
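The table values can be reproduced with a few lines of Python. This sketch assumes integer hours 0–23 and rounds to two decimals as the table does; `encode_hour` is an illustrative name, not an EPF function.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Return (hour_sin, hour_cos) for an hour of day in 0-23."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


for hour in (0, 6, 12, 18, 23):
    s, c = encode_hour(hour)
    # Adding 0.0 turns a rounded -0.0 (tiny negative float error) into 0.0 for display.
    print(f"{hour:>2}  sin={round(s, 2) + 0.0:5.2f}  cos={round(c, 2) + 0.0:5.2f}")
```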
Key properties:
- Continuity: Adjacent hours have similar sin/cos values
- Cyclical closure: Hour 23 is close to hour 0 on the circle
- Compact: Only 2 features per cycle (vs 24 one-hot columns for hour of day)
- Proximity preserved: Morning hours cluster together, as do evening hours
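The continuity and closure properties follow from a short identity. For a value $x$ with period $P$, write $\theta_x = 2\pi x / P$ and $e(x) = (\sin\theta_x, \cos\theta_x)$. The distance between two encodings then depends only on the circular gap between the values:

$$
\lVert e(a) - e(b) \rVert^2 = (\sin\theta_a - \sin\theta_b)^2 + (\cos\theta_a - \cos\theta_b)^2 = 2 - 2\cos(\theta_a - \theta_b) = 4\sin^2\!\left(\frac{\theta_a - \theta_b}{2}\right)
$$

$$
\lVert e(a) - e(b) \rVert = 2\left|\sin\!\left(\frac{\pi (a - b)}{P}\right)\right|
$$

With $P = 24$, hours 23 and 0 give $2\,|\sin(23\pi/24)| = 2\sin(\pi/24) \approx 0.261$, the same as hours 0 and 1, while hours 0 and 12 reach the maximum distance of 2.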
All Cyclical Features in EPF
| Feature | Period | Encoding |
|---|---|---|
| Hour of day | 24 | hour_sin, hour_cos |
| Day of week | 7 | dow_sin, dow_cos |
| Month of year | 12 | month_sin, month_cos |
| Week of year | 52 | week_sin, week_cos |
| Quarter of day (15-min) | 96 | quarter_sin, quarter_cos |
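One way to compute every feature in this table from a single timestamp is sketched below in plain Python. The names `cyclical_features`, `HOURLY_CYCLES` and `QUARTER_CYCLE` are hypothetical, and the raw-value conventions (Monday=0 day numbering, ISO week numbers, quarter of day only for 15-minute models) are assumptions rather than confirmed EPF behavior.

```python
import math
from datetime import datetime

# (feature prefix, period, raw-value extractor), mirroring the table above.
# Assumptions: weekday() numbering (Monday=0, Sunday=6), months 1-12, and ISO
# week numbers (1-53; with period 52, week 53 lands essentially on week 1).
HOURLY_CYCLES = [
    ("hour", 24, lambda ts: ts.hour),
    ("dow", 7, lambda ts: ts.weekday()),
    ("month", 12, lambda ts: ts.month),
    ("week", 52, lambda ts: ts.isocalendar()[1]),
]
# Quarter of day (0-95), assumed to be used only by the 15-minute models.
QUARTER_CYCLE = ("quarter", 96, lambda ts: ts.hour * 4 + ts.minute // 15)


def cyclical_features(ts: datetime, fifteen_minute: bool = False) -> dict[str, float]:
    """Return {prefix}_sin / {prefix}_cos for every periodic calendar feature of ts."""
    cycles = HOURLY_CYCLES + ([QUARTER_CYCLE] if fifteen_minute else [])
    features = {}
    for name, period, extract in cycles:
        angle = 2 * math.pi * extract(ts) / period
        features[f"{name}_sin"] = math.sin(angle)
        features[f"{name}_cos"] = math.cos(angle)
    return features


ts = datetime(2024, 1, 15, 23, 45)  # a Monday, last quarter-hour of the day
print(len(cyclical_features(ts)))                       # 8
print(len(cyclical_features(ts, fifteen_minute=True)))  # 10
```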
The 15-minute models add a 96-period quarter-of-day encoding:
```
quarter_of_day = hour × 4 + ⌊minute / 15⌋   # 0–95
quarter_sin = sin(2π × quarter_of_day / 96)
quarter_cos = cos(2π × quarter_of_day / 96)
```
Direct Model Encoding
In the direct prediction framework, both the origin and target time are encoded:
```
# Origin time (when the forecast is made)
origin_hour_sin, origin_hour_cos
origin_dow_sin, origin_dow_cos
origin_month_sin, origin_month_cos

# Target time (what's being predicted)
target_hour_sin, target_hour_cos
target_dow_sin, target_dow_cos
```
This allows the model to learn interactions between origin and target timing. For example, “a forecast made on Friday afternoon for Monday morning” has different characteristics than “a forecast made on Tuesday morning for Wednesday evening.”
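A minimal sketch of building the origin and target features for one training row, using the Friday-afternoon-to-Monday-morning example. `direct_time_features` is a hypothetical helper; it assumes Monday=0 day numbering and encodes only hour and day of week on the target side, matching the feature list for the direct model.

```python
import math
from datetime import datetime, timedelta


def _raw_cycles(ts: datetime) -> dict[str, tuple[int, int]]:
    """Raw value and period per cycle (assumes weekday() numbering, Monday=0)."""
    return {"hour": (ts.hour, 24), "dow": (ts.weekday(), 7), "month": (ts.month, 12)}


def direct_time_features(origin: datetime, target: datetime) -> dict[str, float]:
    """Hypothetical helper: sin/cos features for both forecast origin and target."""
    layout = {
        "origin": (origin, ("hour", "dow", "month")),
        "target": (target, ("hour", "dow")),
    }
    features = {}
    for prefix, (ts, names) in layout.items():
        raw = _raw_cycles(ts)
        for name in names:
            value, period = raw[name]
            angle = 2 * math.pi * value / period
            features[f"{prefix}_{name}_sin"] = math.sin(angle)
            features[f"{prefix}_{name}_cos"] = math.cos(angle)
    return features


# Friday-afternoon forecast for Monday morning.
origin = datetime(2024, 1, 12, 15, 0)          # Friday 15:00
target = origin + timedelta(days=2, hours=17)  # Monday 08:00
features = direct_time_features(origin, target)
print(len(features))  # 10
```

Because origin and target are separate columns, a tree can split on `origin_dow_*` and then on `target_dow_*` to carve out specific origin/target combinations such as the weekend gap.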
Why Not Both?
Some implementations use cyclical encoding alongside integer features, letting the model choose. The EPF system uses only sin/cos to keep the feature set compact and avoid redundancy. Because each hour maps to a unique (sin, cos) point, tree-based models can still isolate any hour-specific pattern with a few splits on the pair, so no expressiveness is lost.
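The expressiveness argument can be checked directly. This sketch (plain Python, with an illustrative `encode_hour` helper) confirms that the 24 hours map to 24 distinct points, and shows a case where sin/cos is even more convenient for a tree: a wrap-around night window needs two splits on an ordinal hour but only one threshold on `hour_cos`. The 0.6 threshold is an arbitrary value chosen for illustration.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """Return (hour_sin, hour_cos) for an hour of day in 0-23."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


encoded = {hour: encode_hour(hour) for hour in range(24)}

# Every hour lands on a distinct (sin, cos) point, so a tree can isolate
# any single hour with a few axis-aligned splits.
print(len(set(encoded.values())))  # 24

# A wrap-around "night" window needs two splits on an ordinal hour
# (hour >= 21 or hour <= 3) but a single threshold on hour_cos.
night = sorted(h for h, (_, c) in encoded.items() if c > 0.6)
print(night)  # [0, 1, 2, 3, 21, 22, 23]
```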
Impact on Model Performance
The encoding choice primarily affects training efficiency rather than ultimate accuracy. Models using ordinal encoding eventually learn the boundary discontinuities, but they require more training data and deeper trees to do so. Cyclical encoding provides the correct inductive bias from the start, leading to:
- Faster convergence during training
- Fewer splits wasted on boundary effects
- Slightly better generalization on short training windows