In Progress Created 2026-06-17 0/33 tasks

Telegram Digest Loop-Closer + Tier 1 Gate Relax

Priority: P2 Category: PREP

Executive Summary

The AI intelligence agent has a working preview stage (Mon/Thu digests), but the implement and eval loops are dormant: tier1_apply.py has applied 0 finds in 4 days, and feedback.jsonl has 1 entry total. This plan makes the Telegram bot the loop-closer. Reaction buttons on each digest finding write verdicts to feedback.jsonl, which closes the eval loop. The Tier 1 source gate is widened so Cole-curated X threads can also auto-apply, which closes the implement loop. Together these convert the intel agent from "fancy reading material" into a learning system that produces measurable doc/skill changes each week.

Research Phase

Current State

Pipeline (~/ai-projects-local/mission-control/scripts/ai_intelligence/):

scrape_*.py runs 2×/day via LaunchAgent → drops candidates in data/ai-intelligence/incoming/
score.py → canonicalizes, scores (Haiku), merges to finds.jsonl (currently 110 finds)
tier1_apply.py runs after each scrape, hard-gated on is_official(source) AND score >= 80
digest.py runs Mon/Thu → assembles digest, includes any X threads from browser-agent/threads/pending/, sends via telegram.py
feedback.py accepts verdicts and can dispatch a find to MC daemon queue

Telegram bot (~/ai-projects-local/mission-control/scripts/telegram_watcher.py):

Long-polling worker, persistent state in data/telegram_watcher_state.json
Permission tiers via daemon/users.py (admin/operator)
Already saves X-thread URLs to browser-agent/threads/pending/ via Playwright
Has zero callback_query / inline keyboard support — entirely text-message based today

Loop status:

✅ Preview firing (last digest 2026-06-15, scrape this morning 06:44)
❌ Tier 1 auto-apply: last run logged "tier1: no eligible finds"; tier1_log.jsonl does not exist
❌ Eval: feedback.jsonl has 1 row (2026-06-12) across 4 days of operation
⚠️ Tier 3 dispatch: works in code, but the MC approval queue is 49 deep and the daemon is idle

Historical Context

Old claude_intelligence.sh (daily 8am) did Tier 1 auto-apply directly; the new flow split this out into tier1_apply.py with a stricter gate.
The Telegram bot was originally built to chat with Claude; the X-thread queue was bolted on in May 2026 (per feedback_telegram_xthread_routing.md).
Cole's stated frustration (this conversation): "are we actually implementing these things, or are we just reading through the thread and calling it good?"

Constraints

The bot is a single-process long-poll loop. Adding callback handlers must not break message flow, and the ~30s typing-indicator loop must keep working.
Permission tiers. Only admin (Cole) should be able to dispatch or approve via Telegram. Operator (Caleb) can react, but his verdicts are logged separately.
No new MCP servers — urllib HTTP calls to Telegram API match existing style.
feedback.jsonl schema is fixed by feedback.py (ts, find_id, verdict, title, score, lens). Keep compatibility.
The Tier 1 gate change must be reversible, so it needs a kill switch / rollback to the old gate.
Telegram limits: message text is capped at 4096 characters, and each inline button's callback_data is capped at 64 bytes.

Options Considered

Option 1: Polling-only digest cards (chosen for #1)

Approach: digest.py sends one Telegram message per top-K find (K=5-10) with inline keyboard buttons (Done / Dispatch / Save / Dismiss). telegram_watcher.py adds a callback_query branch in its long-poll loop that writes verdicts to feedback.jsonl and answers the callback with a confirmation toast.

Pros:

Reuses existing bot process (no new daemon)
Inline keyboards are a first-class Telegram primitive. No webhook is required, because callback queries arrive through the same getUpdates long-poll.
Each finding becomes a discrete decision the user can knock out from phone
Mirrors how the MC dashboard already shows finds

Cons:

Adds callback_query handling to a code path that has only ever handled text messages
5-10 messages per digest is moderate notification noise (mitigation: send as one digest message with a "Review →" button that lands on the card sequence)

Estimated Effort: Medium (3-4 hours)

Option 2: Single threaded digest message + reply-with-command verdicts

Approach: Digest sends one summary message. Cole replies done F-0617-3, dispatch F-0617-5, etc. Bot parses replies as text.

Pros: No callback_query plumbing; pure text. Fastest to build.

Cons: Friction. Typing find IDs on a phone is exactly the kind of micro-toil that kills loop closure. The whole point is one-tap verdicts, so this defeats the goal.

Estimated Effort: Low (1-2 hours)

Option 3: Web dashboard mobile review only

Approach: Don't touch Telegram. Add a /digest/review page to MC dashboard, send a Telegram link to it on Mon/Thu.

Pros: Richer UI, more buttons possible.

Cons: Requires opening a browser, plus SSO/VPN access to the dashboard from a phone (Cole has noted login friction). This defeats the "while watching TV" goal. It is the same problem that keeps the MC queue at 49.

Estimated Effort: Medium (4-5 hours, more if auth needed)

Tier 1 gate options (#2)

Option A: Add `cole_curated` source class (chosen)

Approach: Anything in browser-agent/threads/pending/ (i.e. Cole manually queued via Telegram) gets tagged source_class: "cole_curated" in score.py. tier1_apply.py gate becomes (is_official(source) OR source_class == "cole_curated") AND score >= 75.

Pros:

Adds a provenance tier that reflects reality (Cole already vetted these by queueing)
Lower threshold (75 vs 80) reflects that human curation reduces false positives
Easy to revert (one boolean flip in gate)

Cons:

Expands the set of finds that can auto-edit CLAUDE.md/howto.md (slightly larger blast radius)
Requires schema bump on finds (source_class field)

Estimated Effort: Low (1 hour)

Option B: Drop threshold to 70, keep official-only

Approach: Just lower TIER1_THRESHOLD from 80 → 70.

Pros: One-line change.

Cons: Doesn't actually solve the X-thread problem, because X is still not an official domain. It just lets borderline official-source finds through. That likely raises false positives without surfacing the highest-signal source.

Estimated Effort: Trivial

Option C: Manual `/promote F-XXXX` Telegram command

Approach: Bot command that promotes a single find to Tier 1 eligibility regardless of source.

Pros: Explicit, surgical.

Cons: Adds another manual step every Mon/Thu, which defeats the "auto" in auto-apply. Better used as an escape hatch for Option A's misses.

Estimated Effort: Low

Chosen Approach

Decision: Option 1 + Option A. Optionally layer Option C later as escape hatch.

Rationale:

Friction is the actual enemy. The reason feedback.jsonl has 1 row is that there's no zero-friction input surface. Inline keyboard buttons in Telegram are the only mobile UI that survives the "watching TV on couch" test.
Cole-curated source class encodes a real signal. When Cole queues an X thread, he's already done provenance + relevance filtering by hand. The gate should reflect that. Treating his curated picks identically to a random X scrape is what's blocking Tier 1 from firing.
Both changes are reversible. Keyboard buttons are additive; they don't change the text-message flow. The gate change is one conditional, kill-switched via a tier1_apply.py --strict flag.
Loop completeness > feature breadth. Cole's question is "is this robust?" Building #1+#2 closes implement and eval loops simultaneously. Other proposed upgrades (#3-#5 from chat: tier1 log, /approvals, /status) all assume these two work first.

Trade-offs Accepted:

Slightly larger blast radius for Tier 1 auto-edits (mitigated by git commit per run + tier1_log.jsonl review trail).
Bot now does both text-chat AND callback handling — modest complexity increase in the long-poll loop.
Caleb's verdicts are logged but not acted on (separate verdict file). This adds future work if/when his judgment should count.

Implementation Plan

Phase 1: Tier 1 Gate Relax (the unblock)

File: ~/ai-projects-local/mission-control/scripts/ai_intelligence/score.py

☐ Add source_class field when canonicalizing: "cole_curated" if URL appears in any browser-agent/threads/pending/ or done/ .md file, else "official" if is_official(source), else "community".
☐ Persist source_class in finds.jsonl rows.
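Below is a minimal sketch of the score.py classification rule. It makes three assumptions:

- Thread notes are plain .md files under browser-agent/threads/{pending,done}/.
- URLs are compared after the same canonicalization score.py already applies. A hypothetical `canon` callable stands in for it here.
- `is_official` is passed in, because its real signature isn't shown in this plan.

`curated_urls` and `classify_source_class` are illustrative names, not existing functions.

```python
import re
from pathlib import Path
from typing import Callable, Set

# Bare URLs or markdown-link targets; stops at whitespace, ')', '>' or ']'.
URL_RE = re.compile(r"https?://[^\s)>\]]+")


def curated_urls(threads_root: Path, canon: Callable[[str], str] = lambda u: u) -> Set[str]:
    """Collect every URL mentioned in pending/ and done/ thread notes."""
    urls: Set[str] = set()
    for sub in ("pending", "done"):
        folder = threads_root / sub
        if not folder.is_dir():
            continue
        for md in folder.glob("*.md"):
            for raw in URL_RE.findall(md.read_text(encoding="utf-8")):
                # Drop sentence punctuation the regex swallows at the end of a URL.
                urls.add(canon(raw.rstrip(".,;:")))
    return urls


def classify_source_class(
    url: str,
    source: str,
    curated: Set[str],
    is_official: Callable[[str], bool],
    canon: Callable[[str], str] = lambda u: u,
) -> str:
    """Provenance tier: cole_curated beats official beats community."""
    if canon(url) in curated:
        return "cole_curated"
    if is_official(source):
        return "official"
    return "community"
```

Building `curated` once per scoring run keeps the per-find check a set lookup. The .md files are not re-read for every candidate.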

File: ~/ai-projects-local/mission-control/scripts/ai_intelligence/tier1_apply.py

☐ Add CLI flag --strict to fall back to old gate (official + 80) for rollback.
☐ Update gate: (source_class in ("official", "cole_curated")) AND score >= 75.
☐ Add a --dry-run mode that prints which finds pass the new gate vs the old one (a diff for sanity checking) without applying anything.
☐ Bump TIER1_LOG schema to include source_class per applied entry.
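Below is a sketch of the relaxed gate with its --strict rollback and the --dry-run diff. It assumes each finds.jsonl row carries `find_id`, `source`, `source_class` and `score` keys. `is_official` is passed in as a callable, and the helper names are hypothetical.

The relaxed condition is written in the Option A form, (is_official OR cole_curated) AND score >= 75. Under the score.py classification rule this admits the same finds as `source_class in ("official", "cole_curated")`.

```python
import argparse
from typing import Callable, Dict, Iterable, List

STRICT_THRESHOLD = 80   # old gate, restored by --strict
RELAXED_THRESHOLD = 75  # Option A gate


def passes_gate(find: dict, is_official: Callable[[str], bool], strict: bool = False) -> bool:
    score = find.get("score", 0)
    official = is_official(find.get("source", ""))
    if strict:
        # Old behaviour: official sources only, score >= 80.
        return official and score >= STRICT_THRESHOLD
    # Option A: official OR Cole-curated, score >= 75.
    curated = find.get("source_class") == "cole_curated"
    return (official or curated) and score >= RELAXED_THRESHOLD


def gate_diff(finds: Iterable[dict], is_official: Callable[[str], bool]) -> Dict[str, List[str]]:
    """--dry-run report: which finds each gate admits. Applies nothing."""
    rows = list(finds)
    relaxed = {f["find_id"] for f in rows if passes_gate(f, is_official)}
    strict = {f["find_id"] for f in rows if passes_gate(f, is_official, strict=True)}
    return {
        "newly_eligible": sorted(relaxed - strict),
        "still_eligible": sorted(relaxed & strict),
        # The relaxed gate is a superset of the strict one, so this should stay empty.
        "dropped": sorted(strict - relaxed),
    }


def parse_args(argv=None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Tier 1 auto-apply")
    p.add_argument("--strict", action="store_true", help="old gate: official source and score >= 80")
    p.add_argument("--dry-run", action="store_true", help="print the gate diff and apply nothing")
    return p.parse_args(argv)
```

The verification step "--dry-run shows ≥1 cole_curated find newly eligible" corresponds to a non-empty `newly_eligible` list. A non-empty `dropped` list would indicate a bug in the new gate.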

Verification before merge:

☐ Run tier1_apply.py --dry-run against current finds.jsonl — should show ≥1 cole_curated find newly eligible.
☐ Run with --strict — should match current "no eligible finds" behavior.

Phase 2: Telegram Reaction Loop

File: ~/ai-projects-local/mission-control/scripts/ai_intelligence/digest.py

☐ After digest send, call new function send_digest_cards(top_finds) that sends N (default 5, configurable) inline-keyboard messages — one per find — with buttons: ✅ Done, 🚀 Dispatch, 📌 Later, 🗑 Dismiss.
☐ Each button's callback_data encodes verdict:find_id (under 64 bytes — find_ids are F-0617-3-style, fits easily).
☐ Card text includes: title, source, score, 1-line summary, link.
☐ Wrap in feature flag env var TELEGRAM_DIGEST_CARDS=1 so we can disable without code revert.
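Below is a sketch of how digest.py might build and send the cards over plain urllib, matching the existing no-MCP style. Several pieces are assumptions:

- The `title`, `source`, `score`, `summary`, `url` and `find_id` keys on each find.
- The `token` and `chat_id` parameters, which are hypothetical additions to the planned `send_digest_cards(top_finds)` signature.
- The helper names `build_card` and `tg_call`.

The Bot API accepts a JSON request body, so `reply_markup` can be sent as a nested object.

```python
import json
import os
import urllib.request
from typing import List

MAX_TEXT_CHARS = 4096     # Telegram message text limit (len() here is an approximation)
MAX_CALLBACK_BYTES = 64   # Telegram callback_data limit
BUTTONS = [("✅ Done", "done"), ("🚀 Dispatch", "dispatch"), ("📌 Later", "later"), ("🗑 Dismiss", "dismiss")]


def cards_enabled() -> bool:
    """Feature flag: cards are sent only when TELEGRAM_DIGEST_CARDS=1."""
    return os.environ.get("TELEGRAM_DIGEST_CARDS") == "1"


def build_card(chat_id: int, find: dict) -> dict:
    """sendMessage payload for one find, with a single row of verdict buttons."""
    text = f"{find['title']}\n{find['source']} | score {find['score']}\n{find['summary']}\n{find['url']}"
    if len(text) > MAX_TEXT_CHARS:
        text = text[: MAX_TEXT_CHARS - 1] + "…"
    row = []
    for label, verdict in BUTTONS:
        data = f"{verdict}:{find['find_id']}"
        if len(data.encode("utf-8")) > MAX_CALLBACK_BYTES:
            raise ValueError(f"callback_data over 64 bytes: {data!r}")
        row.append({"text": label, "callback_data": data})
    return {"chat_id": chat_id, "text": text, "reply_markup": {"inline_keyboard": [row]}}


def tg_call(token: str, method: str, payload: dict) -> dict:
    """POST a JSON body to the Bot API with urllib (no extra dependencies)."""
    req = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/{method}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def send_digest_cards(token: str, chat_id: int, top_finds: List[dict], n: int = 5) -> int:
    """Send up to n cards; returns how many were sent (0 when the flag is off)."""
    if not cards_enabled():
        return 0
    sent = 0
    for find in top_finds[:n]:
        tg_call(token, "sendMessage", build_card(chat_id, find))
        sent += 1
    return sent
```

Checking the callback_data length at build time makes a too-long find ID fail loudly in digest.py. Otherwise the error would surface only as a rejected sendMessage.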

File: ~/ai-projects-local/mission-control/scripts/telegram_watcher.py

☐ In getUpdates loop, branch on callback_query in addition to message.
☐ Add handle_callback_query(update):
Parse verdict:find_id from callback_data.
Permission check — only admin tier can dispatch; operator verdicts logged separately to feedback_operator.jsonl.
For verdict = dispatch: shell out to python3 ai_intelligence/feedback.py --dispatch F-XXXX-N.
For verdict in (done, later, dismiss): append row to feedback.jsonl via shared helper.
Answer the callback with answerCallbackQuery (toast: "Logged: ✅ done").
Edit the original message to remove the inline keyboard (editMessageReplyMarkup with an empty reply_markup) so cards can't be re-clicked.
☐ Save callback_query.message.message_id keyed by find_id in state file so we can edit later if needed.
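Below is a sketch of the callback branch, kept free of network calls so it can be tested offline. It assumes a hypothetical `role_of(user_id)` wrapper over daemon/users.py that returns "admin", "operator" or None. `plan_callback` and `api_calls` are illustrative names.

The function decides what a tap means. The caller then writes the verdict, runs the dispatch subprocess, and sends the listed API calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

VERDICT_LABELS = {"done": "✅ done", "dispatch": "🚀 dispatch", "later": "📌 later", "dismiss": "🗑 dismiss"}

# If the bot passes allowed_updates to getUpdates, it must include callback_query
# or button taps never arrive.
POLL_ALLOWED_UPDATES = ["message", "callback_query"]


@dataclass
class CallbackPlan:
    callback_id: str
    verdict: str
    find_id: str
    authorized: bool
    dispatch: bool             # admin only: hand off to feedback.py --dispatch
    log_file: Optional[str]    # verdict file to append to, or None
    chat_id: Optional[int]
    message_id: Optional[int]


def parse_callback_data(data: str) -> Tuple[str, str]:
    verdict, sep, find_id = data.partition(":")
    if not sep or verdict not in VERDICT_LABELS or not find_id:
        raise ValueError(f"unexpected callback_data: {data!r}")
    return verdict, find_id


def plan_callback(update: dict, role_of: Callable[[int], Optional[str]]) -> Optional[CallbackPlan]:
    """Decide what one update means; None means 'not a callback, use the text path'."""
    cq = update.get("callback_query")
    if cq is None:
        return None
    verdict, find_id = parse_callback_data(cq.get("data", ""))
    role = role_of(cq["from"]["id"])
    msg = cq.get("message") or {}  # optional in the Bot API
    plan = CallbackPlan(cq["id"], verdict, find_id, role in ("admin", "operator"),
                        False, None, msg.get("chat", {}).get("id"), msg.get("message_id"))
    if role == "admin":
        if verdict == "dispatch":
            plan.dispatch = True
        else:
            plan.log_file = "feedback.jsonl"
    elif role == "operator":
        plan.log_file = "feedback_operator.jsonl"  # recorded, never dispatched
    return plan


def api_calls(plan: CallbackPlan) -> List[Tuple[str, dict]]:
    """Bot API calls to make after the verdict is recorded."""
    if not plan.authorized:
        return [("answerCallbackQuery", {"callback_query_id": plan.callback_id, "text": "Not authorized"})]
    calls = [("answerCallbackQuery",
              {"callback_query_id": plan.callback_id, "text": f"Logged: {VERDICT_LABELS[plan.verdict]}"})]
    if plan.chat_id is not None and plan.message_id is not None:
        # An empty inline keyboard removes the buttons so the card can't be tapped again.
        calls.append(("editMessageReplyMarkup", {"chat_id": plan.chat_id, "message_id": plan.message_id,
                                                 "reply_markup": {"inline_keyboard": []}}))
    return calls
```

In the long-poll loop, a `None` plan falls through to the existing text handler, so message flow is untouched. Wrapping `plan_callback` in a try/except for `ValueError` keeps one malformed tap from stopping the loop.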

File: ~/ai-projects-local/mission-control/scripts/ai_intelligence/feedback.py

☐ Expose a log_verdict(find_id, verdict, source="telegram", user="cole") helper that telegram_watcher can import (not just call via subprocess) — avoids spawning a python process per tap.
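Below is a sketch of the importable helper. The parts beyond the planned signature are assumptions:

- The `data_dir` and `out_name` keyword arguments are hypothetical extensions. `out_name` lets the same helper write feedback_operator.jsonl.
- The default directory comes from the References section.
- finds.jsonl rows are keyed by `find_id`.
- The UTC ISO-8601 `ts` format is a guess; it should be matched to whatever feedback.py writes today.

The row keeps feedback.py's fields (ts, find_id, verdict, title, score, lens) and only appends `source` and `user`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Runtime data dir per the References section; treat as an assumption.
DEFAULT_DATA_DIR = Path.home() / "ai-projects" / "mission-control" / "data" / "ai-intelligence"


def _find_row(finds_path: Path, find_id: str) -> dict:
    """Return the finds.jsonl row for find_id, or {} if it isn't there."""
    if not finds_path.exists():
        return {}
    with finds_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                row = json.loads(line)
                if row.get("find_id") == find_id:
                    return row
    return {}


def log_verdict(find_id: str, verdict: str, source: str = "telegram", user: str = "cole",
                data_dir: Optional[Path] = None, out_name: str = "feedback.jsonl") -> dict:
    """Append one verdict row: feedback.py's fields plus source/user."""
    data_dir = data_dir or DEFAULT_DATA_DIR
    find = _find_row(data_dir / "finds.jsonl", find_id)
    row = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "find_id": find_id,
        "verdict": verdict,
        "title": find.get("title"),
        "score": find.get("score"),
        "lens": find.get("lens"),
        "source": source,
        "user": user,
    }
    with (data_dir / out_name).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return row
```

Because the bot is a single process, a plain append per tap is enough here. No locking is needed.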

Phase 3: Observability + Rollout

☐ Add a tier1_log.jsonl viewer: a simple cat | jq recipe in PLAN.md / README.md, usable until a phone-friendly /tier1 log command exists (Phase 4, if/when built).
☐ Add a one-line stat to the next digest ("Last 7d: X auto-applied, Y verdicts logged") to prove the loop is closing.
☐ Update agent-council.md after first successful auto-apply with the find_id + commit SHA.
☐ Run the digest manually with cards enabled and confirm Cole gets 5 cards whose buttons respond on tap.
☐ Enable on Mon/Thu schedule after one clean manual run.
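The plan does not specify how the "Last 7d" footer is computed. Below is a minimal sketch, assuming every tier1_log.jsonl row is one applied edit and both files carry an ISO-8601 `ts` field. `loop_footer` is a hypothetical name.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import List, Optional


def _read_jsonl(path: Path) -> List[dict]:
    if not path.exists():
        return []
    return [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]


def _parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)  # naive => UTC (assumed)


def loop_footer(data_dir: Path, now: Optional[datetime] = None, days: int = 7) -> str:
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    applied = [r for r in _read_jsonl(data_dir / "tier1_log.jsonl") if _parse_ts(r["ts"]) >= cutoff]
    verdicts = [r for r in _read_jsonl(data_dir / "feedback.jsonl") if _parse_ts(r["ts"]) >= cutoff]
    return f"Last {days}d: {len(applied)} auto-applied, {len(verdicts)} verdicts logged"
```

A missing tier1_log.jsonl counts as zero rather than an error, which is today's state.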

Phase 4 (deferred — explicit non-goals here)

/tier1 log, /approvals, /approve N, /reject N, /status — covered in chat scope but separate plan after #1+#2 prove out.
Voice → DISPATCH task — separate plan.
/promote F-XXXX escape hatch: only if Option A's misses surface in week 1.

Acceptance Criteria

Must Have:

☐ Tier 1 gate accepts cole_curated source class; --dry-run shows ≥1 newly eligible find from current finds.jsonl.
☐ One full scrape cycle results in ≥1 entry in tier1_log.jsonl (i.e. it actually fires).
☐ One Mon/Thu digest produces N inline-keyboard cards in Telegram.
☐ Tapping any of (Done, Later, Dismiss) writes a row to feedback.jsonl with correct find_id, verdict, ts, user.
☐ Tapping Dispatch creates an MC daemon task via existing feedback.py --dispatch code path.
☐ Operator (Caleb) taps log to feedback_operator.jsonl, not feedback.jsonl.
☐ Buttons are removed after a tap (no double-vote).
☐ Feature flag TELEGRAM_DIGEST_CARDS=0 reverts to old text-only digest cleanly.

Should Have:

☐ First post-build digest shows "Last 7d: X auto-applied, Y verdicts logged" footer.
☐ Each auto-applied Tier 1 edit lands as a single commit in ~/ai-projects-local/docs (audit trail).

Nice to Have:

☐ Cards include score delta vs previous digest (trend signal).
☐ After 3 consecutive "Dismiss" verdicts on same source, source is auto-demoted in scoring.
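Below is a sketch of the auto-demote rule. It assumes verdict rows are read from feedback.jsonl in file order, which is time order because the file is append-only. It also assumes a `find_id -> source` mapping is built from finds.jsonl.

The penalty value, `demoted_sources` and `adjusted_score` are all hypothetical; the plan only says "auto-demoted in scoring".

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

DEMOTE_STREAK = 3     # consecutive Dismiss verdicts before demotion
DEMOTE_PENALTY = 10   # hypothetical score penalty for a demoted source


def demoted_sources(verdict_rows: Iterable[dict], find_source: Dict[str, str],
                    streak: int = DEMOTE_STREAK) -> Set[str]:
    """Sources whose most recent verdicts end in a run of `streak`+ dismissals."""
    run: Dict[str, int] = defaultdict(int)
    for row in verdict_rows:  # append-only file, so file order = time order
        src = find_source.get(row["find_id"])
        if src is None:
            continue
        # Any non-dismiss verdict on the source resets its streak.
        run[src] = run[src] + 1 if row["verdict"] == "dismiss" else 0
    return {src for src, n in run.items() if n >= streak}


def adjusted_score(score: int, source: str, demoted: Set[str]) -> int:
    return score - DEMOTE_PENALTY if source in demoted else score
```

Because a single non-Dismiss verdict resets the streak, one useful find from a source lifts its demotion.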

Verification Steps

Tier 1 dry-run sanity check:
python3 ~/ai-projects-local/mission-control/scripts/ai_intelligence/tier1_apply.py --dry-run
python3 ~/ai-projects-local/mission-control/scripts/ai_intelligence/tier1_apply.py --dry-run --strict
Confirm the new gate surfaces cole_curated finds and that --strict matches today's "no eligible finds".

Reaction smoke test (one find, one user):
- Manually invoke digest.py with TELEGRAM_DIGEST_CARDS=1 and top-K=1.
- Receive the card on Telegram and tap one button. Repeat 4 times with 4 different finds so each button is exercised once, since a card's buttons are removed after the first tap.
- Verify that Done, Later and Dismiss each write a row to feedback.jsonl, and that Dispatch creates an MC daemon task.
- Confirm the callback answer toast appears within 1s.

End-to-end real run:
- Wait for the next Mon/Thu digest (or trigger it manually).
- Confirm the cards arrive, react to all 5, and check that feedback.jsonl has 5 fresh rows.
- Confirm one Tier 1 auto-apply happens within 24h (the next scrape cycle picks up Cole's curated thread).

Rollback path:
- export TELEGRAM_DIGEST_CARDS=0 reverts the digest to text-only.
- The --strict flag on tier1_apply.py reverts to the old gate.
- Both should leave feedback.jsonl and tier1_log.jsonl untouched.

Execution Log

2026-06-17 — Planning

Plan drafted from conversation audit. Source-of-truth findings:

tier1_apply.py last logged "tier1: no eligible finds" at 2026-06-16 06:44.
feedback.jsonl = 1 row (2026-06-12), tier1_log.jsonl doesn't exist.
Telegram bot has 0 callback_query support today.

Next steps: Opus plan review (per CLAUDE.md), then /grill-me, then execute Phase 1 (Tier 1 gate) before Phase 2 (Telegram cards) — gate change is lower-risk and unblocks measurable Tier 1 activity even before cards ship.

Lessons Learned

[Add after completion]

What Worked:

What Didn't:

Next Time:

References

~/ai-projects-local/mission-control/scripts/ai_intelligence/ — new AI intelligence flow (built 2026-06-12)
~/ai-projects-local/mission-control/scripts/telegram_watcher.py — main bot, target for callback_query branch
~/ai-projects-local/mission-control/scripts/ai_intelligence/PLAN.md — original plan of record for the intelligence flow
~/ai-projects/mission-control/data/ai-intelligence/ — runtime data (finds, feedback, state)
Conversation 2026-06-17 (this session) — audit that revealed implement + eval loops dormant
Telegram Bot API → sendMessage reply_markup.inline_keyboard, callback_query, answerCallbackQuery

Source: ~/ai-projects/mission-control/plans/telegram-digest-loop-closer.md