Pipeline Monitoring — Zero-Rows Invariant + Intraday Fail-Loud

Date: April 24-25, 2026 | Status: ✅ Production

Why this version exists

The intraday OMIE collector cron had run 14 times since 2026-04-17. Every run was logged as status='success' in pipeline_runs, and every run wrote zero rows. The tracker was hiding the failure: the wrapper recorded “success” because no exception was raised, even though the underlying fetch_session() call had been returning None and the collector silently did nothing.

A consolidated audit (docs/analysis/INTRADAY_RESEARCH.md, merged from three separate audit files) caught the pattern and forced the question: should “no exception, no rows” be classified as success or failure?

The answer is failure by default, with an explicit opt-out for pipelines where zero rows is a legitimate quiet hour.

What changed

PipelineRunTracker zero-rows invariant

PipelineRunTracker.__init__ now takes allow_zero_rows: bool = False. On normal exit (no exception):

  • If rows_written is None or 0 and allow_zero_rows=False → record status='failed' with a synthesized error message
  • If rows_written is None or 0 and allow_zero_rows=True → record status='success' (legitimate quiet hour)
  • If rows_written > 0 → record status='success' regardless

The exception path is unchanged — exceptions still produce status='failed' immediately.
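The exit rules above can be sketched as follows. This is a minimal illustration, not the production class: it assumes the tracker is used as a context manager, that callers set a rows_written attribute before exit, and that the run is kept in an in-memory RunRecord (a hypothetical stand-in for a pipeline_runs row). The wording of the synthesized error message is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for one row in pipeline_runs."""
    pipeline: str
    status: str = "running"
    rows_written: Optional[int] = None
    error: Optional[str] = None


class PipelineRunTracker:
    """Sketch only: assumes context-manager usage and in-memory storage."""

    def __init__(self, pipeline: str, allow_zero_rows: bool = False):
        self.allow_zero_rows = allow_zero_rows
        self.rows_written: Optional[int] = None  # set by the caller
        self.record = RunRecord(pipeline=pipeline)

    def __enter__(self) -> "PipelineRunTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.record.rows_written = self.rows_written
        if exc_type is not None:
            # Exception path: always failed, and the exception is re-raised.
            self.record.status = "failed"
            self.record.error = f"{exc_type.__name__}: {exc}"
            return False
        if not self.rows_written and not self.allow_zero_rows:
            # None or 0 rows without the opt-out counts as a failure.
            self.record.status = "failed"
            self.record.error = "zero rows written and allow_zero_rows=False"
            return False
        # Either rows were written, or zero rows is explicitly allowed.
        self.record.status = "success"
        return False
```

Under these assumptions, a caller that exits the with-block without setting rows_written is recorded as failed unless it passed allow_zero_rows=True.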

Five callers opted out (with TODOs)

Five pipelines that legitimately can write zero rows in normal operation got allow_zero_rows=True plus a TODO to wire rows_written properly:

  • data_backfill — re-runs against an already-complete window
  • data_update — incremental updates that may find no new data
  • data_hourly — quiet hour with no upstream changes
  • news_pipeline — quarter-hour with no new articles
  • entsoe_update — incremental ENTSO-E pull with no new publications

Three pipelines deliberately did not get the opt-out and now fail loud on zero rows: intraday, predictions, snapshot.

Intraday fail-loud at the source

The tracker-level invariant isn’t enough on its own: the upstream call needs to actually raise. OMIEIntradayCollector.run_session_update was changed to raise RuntimeError when fetch_session() returns None, so the failure propagates to the tracker rather than getting swallowed.

The pre-fix behavior (cron logs “success”, zero rows) is gone. Post-fix, the cron logs status='failed' and the tracker captures the underlying error message.
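A minimal sketch of the fail-loud change, assuming a simplified collector. The constructor, the injected fetch_session callable, the session argument, and the stored list are all hypothetical, chosen so the example runs without network access. Only the method name and the RuntimeError on None come from the description above.

```python
from typing import Callable, Optional


class OMIEIntradayCollector:
    """Sketch only: fetch is injected so the example runs offline."""

    def __init__(self, fetch_session: Callable[[int], Optional[list]]):
        self.fetch_session = fetch_session
        self.stored: list = []  # hypothetical stand-in for the database

    def run_session_update(self, session: int) -> int:
        rows = self.fetch_session(session)
        if rows is None:
            # Previously this case was a silent no-op; now it raises so
            # the tracker records the real cause of the failure.
            raise RuntimeError(f"fetch_session({session}) returned None")
        self.stored.extend(rows)
        return len(rows)
```

Returning the row count lets the caller pass it on to the tracker as rows_written, so the zero-rows invariant still covers the case where the fetch succeeds but returns an empty list.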

ntfy alerts on intraday failures

A push-notification rule was added on top of pipeline_runs.status='failed' for intraday_* runs. Until the intraday rewrite ships, it fires up to 6 times a day; the alert filter from the performance sprint mitigates this.
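One way such a rule could look is sketched below. It uses ntfy's plain HTTP publish interface (POST the message body to a topic URL, with Title, Priority and Tags headers). The topic URL, the function name build_alert, and the message wording are assumptions for illustration, and the sketch does not model the alert filter from the performance sprint.

```python
import urllib.request
from typing import Optional

# Hypothetical topic; the real topic name is not given in this document.
NTFY_URL = "https://ntfy.sh/example-pipeline-alerts"


def build_alert(
    pipeline: str, status: str, error: Optional[str]
) -> Optional[urllib.request.Request]:
    """Return a ready-to-send ntfy request, or None if no alert applies."""
    if status != "failed" or not pipeline.startswith("intraday_"):
        return None
    body = f"{pipeline} failed: {error or 'no error message recorded'}"
    return urllib.request.Request(
        NTFY_URL,
        data=body.encode("utf-8"),
        headers={
            "Title": "Intraday pipeline failed",
            "Priority": "high",
            "Tags": "warning",
        },
        method="POST",
    )
```

Building the request separately from sending it keeps the filter logic testable offline; sending is then a single urllib.request.urlopen(req, timeout=10) call wherever the rule runs.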

Key files