Pipeline Monitoring — Zero-Rows Invariant + Intraday Fail-Loud
Date: April 24-25, 2026 | Status: ✅ Production
Why this version exists
The intraday OMIE collector cron had run 14 times since 2026-04-17. Every run was logged as status='success' in pipeline_runs, and every run wrote zero rows. The tracker was hiding the failure: the wrapper recorded “success” because no exception was raised, even though the underlying fetch_session() call had been returning None and the collector silently did nothing.
A consolidated audit (docs/analysis/INTRADAY_RESEARCH.md, merged from three separate audit files) caught the pattern and forced the question: should “no exception, no rows” be classified as success or failure?
The answer is failure by default, with an explicit opt-out for pipelines where zero rows is a legitimate quiet hour.
What changed
PipelineRunTracker zero-rows invariant
PipelineRunTracker.__init__ now takes allow_zero_rows: bool = False. On normal exit (no exception):
- If rows_written is None or 0 and allow_zero_rows=False → record status='failed' with a synthesized error message
- If rows_written is None or 0 and allow_zero_rows=True → record status='success' (legitimate quiet hour)
- If rows_written > 0 → record status='success' regardless
The exception path is unchanged — exceptions still produce status='failed' immediately.
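The rules above can be sketched as a small context manager. This is a minimal illustration, not the code in src/data/pipeline_tracker.py: only the class name, the allow_zero_rows parameter, rows_written, and the status values come from this page. The pipeline argument, the error attribute, the wording of the synthesized message, and the _record placeholder are assumptions.

```python
from typing import Optional


class PipelineRunTracker:
    """Sketch of a run tracker that enforces the zero-rows invariant."""

    def __init__(self, pipeline: str, allow_zero_rows: bool = False):
        self.pipeline = pipeline              # hypothetical: name of the run
        self.allow_zero_rows = allow_zero_rows
        self.rows_written: Optional[int] = None
        self.status: Optional[str] = None
        self.error: Optional[str] = None      # hypothetical attribute

    def __enter__(self) -> "PipelineRunTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            # Exception path: always recorded as failed.
            self.status = "failed"
            self.error = f"{exc_type.__name__}: {exc}"
        elif not self.rows_written and not self.allow_zero_rows:
            # Normal exit with None or 0 rows and no opt-out: fail loud.
            self.status = "failed"
            self.error = (
                f"{self.pipeline}: zero rows written "
                f"(rows_written={self.rows_written!r})"
            )
        else:
            # Rows were written, or the caller opted out of the invariant.
            self.status = "success"
        self._record()
        return False  # never swallow the caller's exception

    def _record(self) -> None:
        # Placeholder: the real tracker would persist to pipeline_runs here.
        pass


if __name__ == "__main__":
    with PipelineRunTracker("intraday") as run:
        run.rows_written = 0
    print(run.status, "|", run.error)
```

A caller that wants the opt-out would construct the tracker with allow_zero_rows=True; the same zero-row run is then recorded as status='success'.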
Five callers opted out (with TODOs)
Five pipelines that legitimately can write zero rows in normal operation got allow_zero_rows=True plus a TODO to wire rows_written properly:
- data_backfill: re-runs against an already-complete window
- data_update: incremental updates that may find no new data
- data_hourly: quiet hour with no upstream changes
- news_pipeline: quarter-hour with no new articles
- entsoe_update: incremental ENTSO-E pull with no new publications
Three pipelines deliberately did not get the opt-out and now fail loud on zero rows: intraday, predictions, snapshot.
Intraday fail-loud at the source
The tracker-level invariant isn’t enough on its own: the upstream call needs to actually raise. OMIEIntradayCollector.run_session_update was changed to raise RuntimeError when fetch_session() returns None, so the failure propagates to the tracker rather than getting swallowed.
The pre-fix behavior (cron logs “success”, zero rows) is gone. Post-fix, the cron logs failed and the tracker captures the underlying error message.
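A minimal sketch of the fail-loud pattern, under stated assumptions: the class and method names and the RuntimeError on a None result come from this page, while the injected fetch_session callable, the session argument, the error text, and the _store helper are hypothetical stand-ins for whatever src/data/omie_intraday.py actually does.

```python
from typing import Callable, Optional, Sequence


class OMIEIntradayCollector:
    """Sketch: raise instead of silently doing nothing on an empty fetch."""

    def __init__(self, fetch_session: Callable[[int], Optional[Sequence[dict]]]):
        # Hypothetical: the fetcher is injected so the sketch runs offline.
        self._fetch_session = fetch_session

    def run_session_update(self, session: int) -> int:
        rows = self._fetch_session(session)
        if rows is None:
            # Previously this case fell through and the run looked successful.
            raise RuntimeError(
                f"fetch_session({session}) returned None; no intraday data fetched"
            )
        return self._store(rows)

    def _store(self, rows: Sequence[dict]) -> int:
        # Placeholder for the real write; returns the number of rows written.
        return len(rows)
```

Raising at the source means the tracker records the underlying cause on its exception path, instead of only the generic zero-rows message.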
ntfy alerts on intraday failures
A push-notification rule was added on top of pipeline_runs.status='failed' for intraday_* runs. Until the intraday rewrite ships, it can fire up to 6 times a day; the alert filter from the performance sprint mitigates this.
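As a rough illustration of such a rule, the sketch below selects failed intraday_* runs and builds (but does not send) an ntfy publish request. ntfy accepts a plain HTTP POST to a topic URL with optional Title, Priority, and Tags headers. The row shape (pipeline, status, error keys), the topic URL, and both function names are assumptions, and the alert filter mentioned above is not modeled.

```python
import urllib.request
from typing import Iterable, List


def failed_intraday_runs(runs: Iterable[dict]) -> List[dict]:
    """Keep only failed runs whose pipeline name starts with 'intraday_'."""
    return [
        r for r in runs
        if r["status"] == "failed" and r["pipeline"].startswith("intraday_")
    ]


def build_ntfy_request(run: dict, topic_url: str) -> urllib.request.Request:
    """Build a POST request for an ntfy topic; the caller decides when to send."""
    body = f"{run['pipeline']} failed: {run.get('error') or 'no error message'}"
    return urllib.request.Request(
        topic_url,
        data=body.encode("utf-8"),
        headers={
            "Title": "Intraday pipeline failed",
            "Priority": "high",
            "Tags": "warning",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical rows standing in for a query against pipeline_runs.
    rows = [
        {"pipeline": "intraday_session", "status": "failed", "error": "RuntimeError"},
        {"pipeline": "news_pipeline", "status": "failed", "error": "timeout"},
    ]
    for run in failed_intraday_runs(rows):
        req = build_ntfy_request(run, "https://ntfy.sh/example-topic")
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Sending is a single call such as urllib.request.urlopen(req, timeout=10); it is left out here so the sketch has no network side effects.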
Key files
- src/data/pipeline_tracker.py: the PipelineRunTracker context manager
- src/data/omie_intraday.py: OMIEIntradayCollector.run_session_update fail-loud
- docs/analysis/INTRADAY_RESEARCH.md: consolidated audit
Related
- System Page Redesign — surfaces failed runs in the country health cards
- Evaluation Data Integrity Fix — companion invariant work on the eval side