Morning Ops: how other people actually build it
Research synthesis, 2026-07-01. Framed as: our setup is a flawed baseline, not ahead. Where the field is simpler, cheaper, or more reliable than us, this says so.
28 agents run
51 unique sources found
20 deep-read + maturity-checked
16 shipped real, 4 demo/wip
Read this first: what to distrust in this report
A second pass flagged real holes. Do not treat the below as gospel:
- Section 4 ("where we are worse") was NOT verified against our actual code. The workflow never read our ops skill. Claims that we are "stateless per run," "unbounded fan-out," "reliability in prompts not hooks" are informed guesses. A short read of our own files would confirm or kill each one.
- The product sample is skewed. Only Cora and Serif (both small) got read. Superhuman AI, Shortwave, Fyxer, Notion AI, Motion, Reclaim, Zapier-style briefs never appeared, so any "how common" count and "where SaaS wins" line is drawn from a thin slice.
- No self-hosted / local-LLM build was found. All 20 sources run on cloud Claude or a hosted service. For a privacy-conscious operator, that whole branch is missing.
- The load-bearing numbers are single-sourced and self-reported. "$3/day," "~80% cost cut," "~30 threads per subagent," "coordination breaks past 8 subagents" each trace to one blog, no methodology, no counter-example.
- Zero evidence on triage accuracy. Every source describes architecture; none measures how often the "needs a human reply" call is wrong, which is the highest-stakes failure (a missed must-respond).
1. The component set that keeps showing up
Building blocks recurring across the serious builds. "How ours compares" is the honest read.
- Multiple read-only ingestion sources (email + calendar + tasks + revenue). Near-universal. mattpaige68 pulls 8 connectors. Us: narrower, email-first, no calendar or money-flow source wired in.
- A triage classifier with a small fixed set of lanes (3 to 4, almost always with an explicit "needs a human reply" and an "ignore / FYI" bucket). EAIA, claude-chief-of-staff. Us: strong "needs response" lane, but no cheap ignore/FYI pre-filter.
- Prioritization with a hard cap (Top-3 or P0/P1/P2). Aeon drops any item without a 12-word "why now." Us: money-first Top-3 already, good.
- Auto-draft grounded in your Sent voice, draft-only. "Never send" stated as a hard rule (Cora, Serif). Us: Sent-grounded, matches the best.
- A delivery surface with a push nudge (rich artifact + a short companion ping so the phone actually buzzes). Us: phone email + terminal.
- Scheduling via cron/launchd or a managed Routine. doneyli runs 7 launchd jobs 24/7 with watchdogs. Us: cron/launchd.
- Persistent memory / state that carries across runs. doneyli: observe, nightly reflect, retrieve with decay. Us: living-docs exist but are not wired into the daily loop as a read-then-update ledger.
- Freshness / dedup / idempotency state (a per-message-id log so re-runs skip handled items). ArtJack (SQLite), htlin (actions.jsonl). Us: fail-loud freshness, but no per-thread idempotency.
- Policy / config as editable files (VIPs, rules, goals, schedules), not buried in the prompt. Blattman, mimurchison. Us: rules live in prose, not checked-in config.
- Graduated autonomy ladder (draft, then supervised, then auto for safe classes) with a hardcoded never-send list. doneyli. Us: draft-only, no ladder.
2. The four architecture patterns
A. Single scheduled prompt (one cron, one agent pass)
mattpaige68, MindStudio, marcos-narvaez. Simplest, cheapest, easy to make idempotent. But no coverage between runs, and depth is capped at one context window.
B. Agent swarm / orchestrator fan-out
the-ai-corner (6 subagents), aimaker, Channel. This is our shape. Deepest reading, most parallel throughput. But expensive, coordination overhead grows past ~8 subagents, and rules must be repeated per subagent or they drop.
C. Event-driven / continuous
EAIA (every 10 min), doneyli (30-min zero-LLM scan), claude-ops (2-min pre-warm), kklaw (watcher). Near-real-time urgents, instant briefs from warm cache. But more moving parts and more failure surfaces.
D. Product SaaS (managed inbox)
Cora, Serif. Zero maintenance, polished, graduated autonomy built in. But no money-first framing, no grounding in your own systems, data leaves the box, monthly cost.
The strongest shipped builds are hybrids: a cheap event-driven tick for urgents plus a scheduled heavy pass for depth (doneyli, claude-ops). We are pure B with no cheap tick.
3. Patterns worth stealing (things we do NOT do), ranked
- A cheap deterministic between-sweeps urgent scan. Zero-LLM or Haiku rules every 20-30 min against a VIP/keyword list catches "on fire" items in seconds at near-zero cost. doneyli reports ~80% cost cut and 3-5 urgents/day caught this way.
- Per-thread idempotency + dedup state (log-as-state). Append-only record keyed on thread id so a re-run never re-proposes a handled thread, and a newer reply supersedes a stale decision. htlin, EAIA, ArtJack. Directly fixes "the sweep re-surfaces the same thread every morning."
- A corrections to few-shot feedback loop. Every time you edit or override a proposed decision, capture it as an example that tunes the next run. EAIA. We capture nothing, so the classifier never learns from being wrong.
- Hourly catch-up with an exactly-once daily marker. A closed laptop at the scheduled minute still delivers exactly once when it wakes. ArtJack. We currently just miss the run.
- An explicit "uncertain" third verdict. Forcing a confident decision on ambiguous mail produces false confidence. htlin (actionable / not / uncertain), EAIA ("if unsure, notify").
- Dual-model cost split. Cheap model decides which threads deserve the expensive model. doneyli runs two tenants under $3/day; ArtJack has per-route model knobs.
- Enforce post-action steps with hooks, not prompts. Move "log the decision, mark must-respond, commit state" into a hook so the model cannot skip it. affaan-m, the-nuanced-perspective.
- A "Handled" / "Changed since yesterday" delta in the brief. Proves the work and lets you scan only what moved. doneyli, mattpaige68.
- Learn-once escalation to shrink the queue. On a novel thread, ask once, capture the answer as a rule, auto-handle that class after. Serif. Turns a static queue into a shrinking one.
- Config-as-files. VIP + never-send + CC-preservation rules, goals, schedules, all checked in and editable. doneyli, mimurchison, Blattman. Maps directly to encoding the THM/family reply-all rule in code, not prose.
- Idempotent queue writes. WHERE-NOT-EXISTS guards so a re-run never duplicates a queue item. anothercodingblog.
- Cost-gated wakeup. Cheap fetch in cron; only spin the expensive swarm if there is new material. kklaw.
4. Where our system is likely worse or over-built
Inferential (see the caveat banner), but the pattern is consistent enough to act on after a quick self-check.
- Over-built the read path, under-built the state path. Simpler systems treat dedup/idempotency as first-class (a SQLite or JSONL keyed on message id). A stateless deep sweep is the expensive way to get a worse result than a stateful shallow one.
- No cheap tier means we are both slow to urgents and expensive. Running a full parallel-subagent read as the only tier is our single most over-built choice; a rules pass gives most of the coverage nearly free.
- Fan-out is probably unbounded per thread. Past ~8 subagents you spend more on coordination than work, and lossy hand-off drops load-bearing rules. Batching ~30 threads per subagent is cheaper and more reliable.
- Our freshness may be a rendering check, not a state machine. Better builds attach a failure taxonomy (api_error / stale_data / rate_limited) and auto-retry; "fail-loud" as a badge is weaker than fail-loud-by-omission plus catch-up.
- No missed-run recovery. Asleep at the cron minute means no brief. Hourly catch-up + per-day marker is strictly better.
- No feedback loop, so misfires are permanent. We rely on a human hand-editing memory, which is the tell that it should be a file the system updates.
- Reliability likely lives in prompts, not hooks. Steps buried in a prompt are silently skippable.
- Delivery is one-way and un-archived. No diffable "what did the brief say last Tuesday," no codified degrade-to-plain-text path.
- We may lack the whole non-email half. Nearly every serious build wires calendar; several add drive-time and a money/payments block. Email-only is narrow for a hotel operator.
5. Traps others hit that we should avoid
- Too many triage buckets lowers accuracy. Blattman: "Fewer buckets, higher accuracy. You can split later."
- Wrong hook granularity for logging. Use a Stop hook, not PostToolUse, to log a finished brief (PostToolUse fires per tool call and logs fragments).
- Cloud Routines gotchas if we ever move off local: MCP-only, only github.com over Bash, deferred connectors need a Phase-0 tool-load gate or the run hangs, 15-runs/day cap. anothercodingblog.
- Brief bloat / news-dump. Aeon: "a priming document, not a news dump. Every line must answer so-what."
- Alert spam on green. Keep clean runs silent; file only on failure. Our infra alerts should go quiet on green.
- Retrying a rich delivery surface instead of degrading to plain text. mattpaige68.
- Re-fire loops on trigger files. Delete the trigger before executing so a crash consumes the job once. kklaw.
- Fabricating to fill a brief. Hard-code "only real fetched data," drop empty sections. marcos-narvaez.
- Auto-sending too early. Consensus is draft-only, with a numeric graduation ladder and a hardcoded never-send list, not vibes.
6. If we rebuilt the morning process from scratch
- Two-tier ingestion: cheap deterministic/Haiku scan every 20-30 min for urgents + the heavy scheduled deep pass for depth. Keep Sent-grounded reading in the heavy tier only.
- Stateful dedup layer: one SQLite or JSONL keyed on thread id (last-seen, exactly-once, newer-reply-supersedes). Highest-leverage missing piece.
- Batched, bounded fan-out: ~30 threads per subagent, under 8 subagents, load-bearing rules repeated in each subagent prompt.
- Few lanes incl. an explicit uncertain bucket + an ignore pre-filter that auto-archives noreply/receipts and reports "skipped N."
- Money-first Top-3 with a why-now gate, a "Changed since yesterday" delta, and a "Handled" section, componentized, empty sections dropped.
- Config-as-files: goals, VIP + never-send + CC-preservation, schedules, all checked in.
- Reliability via hooks: a Stop hook that logs the finished brief and commits state; idempotent queue writes.
- Freshness as a state machine: per-source stale/api-error/rate-limit flags, self-heal before briefing, fail-loud-by-omission.
- Missed-run recovery: scheduled time + hourly catch-up + per-day exactly-once marker.
- Delivery: phone push + one-line companion nudge, no-retry degrade to plain text, immutable daily archive for QA.
- Dual-model cost control: Haiku classifies and picks what earns the deep read; big model only for drafting/synthesis.
- Corrections to few-shot loop + learn-once escalation so the queue shrinks and triage improves.
Keep from today: money-first framing, Sent-folder grounding, the "a named human is blocked on you" must-respond concept (no source had an exact analog, it is our one genuine differentiator), and draft-only.
Sources (20 deep-read of 51 found)
| Source | Type | Maturity |
| doneyli, AI Chief of Staff (7 launchd jobs, two-tier) | blog | shipped |
| langchain-ai / executive-ai-assistant (EAIA) | github | shipped |
| claude-ops (Business OS for Claude Code) | github | shipped |
| Cora by Every (brief twice a day) | blog | shipped |
| htlin, Claude Code mail triage (Haiku + tmux) | github | shipped |
| Claude Blattman, AI Executive Assistant | blog | shipped |
| mattpaige68, Claude Code as Chief of Staff | blog | shipped |
| aaronjmars / aeon (self-healing morning loop) | github | shipped |
| mimurchison / claude-chief-of-staff (config-as-files) | github | shipped |
| anothercodingblog, Daily Brief on Routines (6 lessons) | blog | shipped |
| Channel, subagents + orchestrator pattern | blog | shipped |
| marcos-narvaez / daily-briefing-agent (one prompt) | github | shipped |
| ArtJack / email-agent (SQLite dedup, catch-up) | github | shipped |
| aimaker, Claude cowork scheduled tasks | blog | shipped |
| the-nuanced-perspective, Chief of Staff in 45 min | blog | shipped |
| Serif (learns from Sent, graduated autonomy) | product | shipped |
| kkovacs / kklaw (inject-dir watcher) | github | shipped |
| the-ai-corner, Claude Code Chief of Staff | blog | demo/wip |
| affaan-m, chief-of-staff subagent (hooks over prompts) | github | demo/wip |
| MindStudio, EA with Claude Code + Workspace | blog | demo/wip |