
Why Cyclical Encoding

The Decision

All periodic time features in the EPF pipeline (hour of day, day of week, month of year) are encoded using sin/cos transformations rather than ordinal integers or one-hot vectors.

hour_sin = sin(2π × hour / 24)
hour_cos = cos(2π × hour / 24)
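As a minimal sketch of the transformation above, the helper below maps any periodic value onto the unit circle. The function name `encode_cyclical` is hypothetical and is not taken from the EPF codebase.

```python
import math


def encode_cyclical(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Example: 18:00 on a 24-hour cycle
hour_sin, hour_cos = encode_cyclical(18, 24)
print(round(hour_sin, 2), round(hour_cos, 2))
```

The same helper works for any period (7 for day of week, 12 for month), since only the ratio `value / period` matters.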

The Problem with Ordinal Encoding

If hours are encoded as integers (0, 1, 2, …, 23), the model sees hour 23 and hour 0 as maximally distant — 23 units apart. In reality, they’re adjacent: 23:00 is one hour before 00:00. The same problem affects:

  • Day of week: Sunday (6) appears far from Monday (0)
  • Month: December (12) appears far from January (1)

This creates artificial discontinuities at period boundaries that tree-based models must learn to split around, wasting model capacity on an encoding artifact.
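The gap between what an ordinal encoding reports and the true distance around the cycle can be checked with a few lines. This is an illustrative sketch; `ordinal_gap` and `circular_gap` are hypothetical helper names.

```python
def ordinal_gap(a: int, b: int) -> int:
    """Distance as a model sees it when values are plain integers."""
    return abs(a - b)


def circular_gap(a: int, b: int, period: int) -> int:
    """Shortest number of steps between a and b, going either way around the cycle."""
    d = abs(a - b) % period
    return min(d, period - d)


# (label, a, b, period)
cases = [
    ("hour 23 vs hour 0", 23, 0, 24),
    ("Sunday (6) vs Monday (0)", 6, 0, 7),
    ("December (12) vs January (1)", 12, 1, 12),
]
for label, a, b, period in cases:
    print(f"{label}: ordinal={ordinal_gap(a, b)}, circular={circular_gap(a, b, period)}")
```

In all three cases the circular gap is 1, while the ordinal gap is 23, 6, and 11 respectively.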

The Problem with One-Hot Encoding

One-hot encoding (24 binary columns for hours) avoids the distance problem but creates its own issues:

  • High dimensionality: 24 + 7 + 12 = 43 binary features just for time
  • No proximity signal: Hour 13 has no relationship to hour 14 in one-hot space
  • Sparse splits: Each tree split can only ask “is it hour X?” rather than “is it morning?”
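The missing proximity signal can be made concrete: in one-hot space, every pair of distinct hours is the same distance apart. The sketch below uses plain Python lists rather than any particular encoder library.

```python
import math


def one_hot(hour: int, size: int = 24) -> list[int]:
    """Return a one-hot vector with a single 1 at position `hour`."""
    return [1 if i == hour else 0 for i in range(size)]


def euclidean(u: list[int], v: list[int]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_adjacent = euclidean(one_hot(13), one_hot(14))  # neighbouring hours
d_far = euclidean(one_hot(13), one_hot(1))        # twelve hours apart
print(d_adjacent, d_far)  # both equal sqrt(2)
```

Because the distance is constant, nothing in the representation tells a model that 13:00 resembles 14:00 more than it resembles 01:00.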

Sin/Cos: The Best of Both

A sin/cos pair maps periodic features onto a unit circle, preserving both continuity and proximity:

| Hour | sin | cos | Position on circle |
|---|---|---|---|
| 0 (midnight) | 0.00 | 1.00 | Top |
| 6 (morning) | 1.00 | 0.00 | Right |
| 12 (noon) | 0.00 | -1.00 | Bottom |
| 18 (evening) | -1.00 | 0.00 | Left |
| 23 | -0.26 | 0.97 | Near top (close to hour 0) |

Key properties:

  • Continuity: Adjacent hours have similar sin/cos values
  • Cyclical closure: Hour 23 is close to hour 0 on the circle
  • Compact: Only 2 features per cycle (vs. 24 one-hot columns for hours alone)
  • Proximity preserved: Morning hours cluster together, as do evening hours
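These properties follow from the geometry of the circle: the Euclidean distance between two encoded hours `a` and `b` is the chord length `2 × |sin(π × (a − b) / 24)|`, which depends only on how far apart they are around the cycle. A small sketch (helper names are hypothetical) confirms this numerically:

```python
import math


def encode(value: float, period: float) -> tuple[float, float]:
    """(sin, cos) position of `value` on a cycle of length `period`."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a: float, b: float, period: float = 24) -> float:
    """Euclidean distance between the (sin, cos) encodings of a and b."""
    sa, ca = encode(a, period)
    sb, cb = encode(b, period)
    return math.hypot(sa - sb, ca - cb)


for a, b in [(23, 0), (13, 14), (6, 18), (0, 12)]:
    print(f"{a:>2} vs {b:>2}: {encoded_distance(a, b):.3f}")
```

Hours 23 and 0 end up exactly as close as hours 13 and 14 (about 0.261), while hours twelve apart sit at the maximum distance of 2.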

All Cyclical Features in EPF

| Feature | Period | Encoding |
|---|---|---|
| Hour of day | 24 | hour_sin, hour_cos |
| Day of week | 7 | dow_sin, dow_cos |
| Month of year | 12 | month_sin, month_cos |
| Week of year | 52 | week_sin, week_cos |
| Quarter of day (15-min) | 96 | quarter_sin, quarter_cos |

The 15-minute models add a 96-period quarter-of-day encoding:

quarter_of_day = hour × 4 + floor(minute ÷ 15)  # 0–95
quarter_sin = sin(2π × quarter_of_day / 96)
quarter_cos = cos(2π × quarter_of_day / 96)
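Putting the five cycles together, a feature builder might look like the sketch below. It uses only the standard library; the function names are hypothetical, and two conventions are assumptions not confirmed by the source: Monday = 0 for day of week (matching the numbering used earlier on this page) and ISO week numbers for week of year.

```python
import math
from datetime import datetime


def cyc(value: float, period: float) -> tuple[float, float]:
    """(sin, cos) position of `value` on a cycle of length `period`."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> dict[str, float]:
    """Build all sin/cos time features for one timestamp."""
    quarter_of_day = ts.hour * 4 + ts.minute // 15  # 0-95
    raw = {
        "hour": (ts.hour, 24),
        "dow": (ts.weekday(), 7),            # assumption: Monday=0 ... Sunday=6
        "month": (ts.month, 12),             # 1-12; 12 lands on the same angle as 0
        "week": (ts.isocalendar()[1], 52),   # assumption: ISO week; week 53 wraps onto week 1
        "quarter": (quarter_of_day, 96),
    }
    features: dict[str, float] = {}
    for name, (value, period) in raw.items():
        features[f"{name}_sin"], features[f"{name}_cos"] = cyc(value, period)
    return features


feats = time_features(datetime(2024, 3, 15, 18, 45))
for key, val in feats.items():
    print(f"{key}: {val:+.3f}")
```

With a period of 52, an ISO week 53 shares its encoding with week 1; whether that matters depends on how the pipeline defines week of year.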

Direct Model Encoding

In the direct prediction framework, both the origin and target time are encoded:

# Origin time (when the forecast is made)
origin_hour_sin, origin_hour_cos
origin_dow_sin, origin_dow_cos
origin_month_sin, origin_month_cos
# Target time (what's being predicted)
target_hour_sin, target_hour_cos
target_dow_sin, target_dow_cos

This allows the model to learn interactions between origin and target timing — for example, “a forecast made on Friday afternoon for Monday morning” has different characteristics than “a forecast made on Tuesday morning for Wednesday evening.”
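A sketch of how one origin/target feature row could be assembled is shown below. The function name `direct_features` and the `horizon_hours` parameter are assumptions for illustration; the source only specifies which encoded columns exist.

```python
import math
from datetime import datetime, timedelta


def cyc(value: float, period: float) -> tuple[float, float]:
    """(sin, cos) position of `value` on a cycle of length `period`."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def direct_features(origin: datetime, horizon_hours: int) -> dict[str, float]:
    """Encode both the forecast origin and the target time for one row."""
    target = origin + timedelta(hours=horizon_hours)
    raw = {
        "origin_hour": (origin.hour, 24),
        "origin_dow": (origin.weekday(), 7),   # Monday=0
        "origin_month": (origin.month, 12),
        "target_hour": (target.hour, 24),
        "target_dow": (target.weekday(), 7),
    }
    row: dict[str, float] = {}
    for name, (value, period) in raw.items():
        row[f"{name}_sin"], row[f"{name}_cos"] = cyc(value, period)
    return row


# Forecast made Friday 15:00 for Monday 09:00 (66 hours ahead)
row = direct_features(datetime(2024, 3, 15, 15, 0), horizon_hours=66)
print(sorted(row))
```

Each row carries ten columns, so a tree can split on an origin feature and then on a target feature to capture combinations such as "made on Friday, delivered on Monday".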

Why Not Both?

Some implementations use cyclical encoding alongside integer features, letting the model choose. The EPF system uses only sin/cos to keep the feature set compact and avoid redundancy. Each hour maps to a unique (sin, cos) pair, so tree-based models can still isolate any hour-specific pattern with a few splits on the two columns, and no expressiveness is lost.
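The claim that nothing is lost can be checked directly: the integer hour is recoverable from its sin/cos pair. The sketch below uses `atan2` for the round trip. A tree would not compute an angle, but it can reach the same regions with threshold splits, as the last lines illustrate. Function names are hypothetical.

```python
import math


def encode_hour(hour: int) -> tuple[float, float]:
    """(sin, cos) encoding of an hour on a 24-hour cycle."""
    angle = 2.0 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def decode_hour(hour_sin: float, hour_cos: float) -> int:
    """Recover the integer hour from its (sin, cos) encoding."""
    angle = math.atan2(hour_sin, hour_cos)  # angle in [-pi, pi]
    return round(angle / (2.0 * math.pi) * 24) % 24


# Round trip: every hour is recovered exactly
recovered = [decode_hour(*encode_hour(h)) for h in range(24)]
print(recovered == list(range(24)))

# A single threshold on hour_cos selects the hours around midnight
around_midnight = [h for h in range(24) if encode_hour(h)[1] > 0.9]
print(around_midnight)
```

The single split `hour_cos > 0.9` groups hours 23, 0, and 1, a neighbourhood that an ordinal feature could only capture with splits on both ends of the range.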

Impact on Model Performance

The encoding choice primarily affects training efficiency rather than ultimate accuracy. Models using ordinal encoding eventually learn the boundary discontinuities, but they require more training data and deeper trees to do so. Cyclical encoding provides the correct inductive bias from the start, leading to:

  • Faster convergence during training
  • Fewer splits wasted on boundary effects
  • Slightly better generalization on short training windows