Ready Created 2026-06-01 5/54 tasks

Productized Re-Engagement Agent — Phase 0 Brief

Priority: P1
Category: PREP (strategic: internal pilot before any external sale)

Executive Summary

Build a named "lapsed customer re-engagement" agent on top of Mission Control's existing brain. Pilot on Cole's three owned Shopify stores (Fabric Outlet, Shiplap, Sundance) for 4 weeks. If it clears $8,000/mo in incremental revenue (holdout-attributed, summed across stores), productize it as the wedge offering of the company-brain platform. If not, kill or pivot the wedge.

Research Phase

Current State

Mission Control already operates as a partial company brain: flat-file memory (MEMORY.md, lessons.md, shared-knowledge/), SQLite for tasks, MySQL on per-store WarehouseAPI servers for Shopify data, multiple Skills + subagents.
Three owned Shopify stores in scope (Dreamy is a 4th, currently 401'd on MCP — folds in once token's refreshed).
All three stores have clean 2+ year purchase history, properly linked customer→order data, and emails on 83-99% of customers.

Market/Industry Context

Eric Siu thread (2026-05-30, in conversation log): enterprises pay for "managed revenue agents" (named products) on top of a unified company brain (Qdrant + RAG). The brain is the moat; the agents are the wedge. Pattern matches what Mission Control is already 70% built toward.

Corey Ganim thread (2026-05-30): the durable AI-services pitch is revenue lift, not time saved. "Hours saved has a ceiling. Revenue added doesn't." Speed-to-lead and follow-up automation are the highest-ROI plays.

Industry win-back benchmarks: $0.50-$2 per addressable contact per month for mature, personalized win-back programs. 3-8% of total revenue from re-engagement flows is typical for stores doing this well.

Historical Context

Mission Control has prior precedent for AI-driven outreach: focus group personas, Klaviyo flow library, Lori Holt collection reports. No prior production agent has been run as a closed-loop revenue agent with attribution.

Constraints

Marketing consent is the binding constraint, not the LLM or the agent design. Per-store accepts-marketing rates: FO 6.5%, Shiplap 33.7%, Sundance 5.5%. Real addressable pool of >180-day lapsed = ~6,700 across 3 stores (~6,300 of which is Shiplap).
Cole is the sole operator; build must be background-Claude-friendly.
No external customer in scope yet — internal validation only.
Holdout test required for credible attribution; can't just count UTM clicks.

Options Considered

Option 1: Build internally first, productize after validation (CHOSEN)

Approach: Build the agent on Cole's three stores; measure incremental revenue via holdout; productize only if it clears the bar.
Pros: No external risk during validation. Cole's own P&L is the test bed. If it works, the case study writes itself.
Cons: Adds ~6-8 weeks before any revenue from external sales.
Estimated Effort: Medium (3-5 days plumbing + 4 weeks measurement)

Option 2: Build for external customer from day one

Approach: Scope for an RBD wholesale customer first; deliver as managed service.
Pros: Faster path to first dollar; forces multi-tenant architecture upfront.
Cons: Selling something unproven. Multi-tenant brain is the highest-risk architectural piece (data leakage between tenants = company-ending event). Should never ship to a paying customer before internal validation.
Estimated Effort: High

Option 3: Skip the agent, just sell the brain

Approach: License Mission Control's brain as a horizontal product and let customers build their own agents on top.
Pros: Higher ceiling, smaller surface area for us.
Cons: The Eric Siu thread is explicit: enterprises want done-for-them, not toolkits. Brain-only is the harder sell.
Estimated Effort: Low (no agent code), but conversion likelihood is also low.

Chosen Approach

Decision: Option 1 — internal pilot first, then productize.

Rationale:

Derisk before selling. Cole's three stores are the cheapest test bed — no contract risk, full data access, no SLA pressure.
Multi-tenant brain is the highest-risk piece and we don't have to solve it until after we've proven the agent. Defer the hard architectural work to Phase 2 (productization), not Phase 1 (validation).
Shiplap alone is enough signal. With 18,746 lapsed >180d and ~6,317 opted-in addressable, Shiplap can deliver the full $8k/mo if the agent works at industry-benchmark conversion rates.
The brain + harness already exists. This phase is mostly: write a good agent prompt, plumb the data, run the holdout. Not a greenfield build.

Trade-offs Accepted:

~6-8 weeks before any external revenue from this initiative.
We're not solving multi-tenancy yet, so we can't onboard a paying customer immediately even if Phase 1 succeeds.
Sundance gets a thinner variant — it's a service business, not retail, so the agent prompt and signals differ.

Locked Decisions (Phase 0)

Decision	Value
First customer	Internal: Fabric Outlet + Shiplap only. Sundance dropped from Phase 1 2026-06-01 (no Klaviyo API key, no klaviyo_profiles table, sparse product tags, marginal ~$160/mo projected contribution). Sundance + Dreamy both fold in at Phase 2.
Wedge agent	Lapsed customer re-engagement
Variants	Two: fabric retail (FO + Shiplap, product-taste personalization) and service repeat (Sundance, temporal nudge)
Lapsed window	>180 days since last purchase
Channel	Email only for Phase 1 (SMS deferred to Phase 2 if email clears the bar)
Attribution	Holdout test — random split in our code, send via Klaviyo API
Addressable-pool source	Klaviyo profile subscription status (live API) — NOT `shopify_customers.accepts_marketing`. Resolved 2026-06-01 after Pre-flight 2 showed the Shopify field is sparse/stale (0 reactivations across 7,191 opted-in customers = broken data). Klaviyo is live consent, what email sends actually respect, and what compliance requires.
Split ratio	50/50 treatment/control (per Opus critic — measurement is the primary goal of the pilot, not revenue maximization; 80/20 is statistically underpowered at this sample size)
Attribution window	Any order placed within 30 days of customer's first agent email counts as treatment; same 30-day window post-pseudo-send-date for control. Day-60 is a secondary readout.
Success threshold	Two-tier: $5k/mo stretch, $3k/mo floor, both summed across FO + Shiplap, attributable via holdout. Revised 2026-06-01 after Pre-flight 3 returned real Klaviyo audience: FO 1,225 + Shiplap 3,486 = 4,711 addressable (vs Phase 0 estimate of 6,601). At +2pp lift × real AOVs ($60.33 FO / $72.14 Shiplap) → projected ~$3,254/mo. $3k floor = green light with caveats. $5k stretch = strong signal, productize. <$3k = pivot wedge.
Statistical bar	Two-proportion z-test, p < 0.05, between treatment and control conversion rates. If p > 0.05, do not claim lift regardless of dollar number.
Effort split	~60/30/10 Shiplap/FO/Sundance (proportional to addressable audience)
Builder	Cole + background Claude session
Start	This week, in parallel with this brief
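
The "Statistical bar" row above can be sketched with the Python standard library. This is a minimal sketch, assuming the pooled-variance form of the two-proportion z-test with a two-sided p-value; the function name and the example counts are illustrative and are not taken from the pilot code or pilot data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_t: int, n_t: int, conv_c: int, n_c: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: treatment rate == control rate."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Pooled conversion rate under the null hypothesis of no lift.
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        return 0.0, 1.0  # no conversions (or all converted) in both arms
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts for illustration only.
z, p = two_proportion_z_test(conv_t=100, n_t=2000, conv_c=60, n_c=2000)
claim_lift = p < 0.05
print(f"z={z:.2f}, p={p:.4f}, claim lift: {claim_lift}")
```

Keeping the gate as a single boolean makes the rule in the table hard to bypass: the dollar figure is only reported as lift when `claim_lift` is true.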

Threshold reality check

Per the Opus critic, generic industry win-back benchmarks ($0.50-$2 per addressable contact) are for 60-90 day dormant audiences. We're targeting 180+ day lapsed, where realistic reactivation rates are 1-3%, not 5-10%.

Back-of-envelope at 2% reactivation lift over baseline:

Shiplap: ~6,317 addressable × 50% treatment = ~3,160 contacts × 2% × $85 AOV ≈ $5,370/mo
FO: ~284 × 50% × 2% × $55 AOV ≈ $160/mo
Sundance: ~97 × 50% × 2% × $165 AOV ≈ $160/mo
Total expected at realistic assumptions: ~$5,700/mo
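
The back-of-envelope lines above follow one formula: addressable contacts × treatment share × reactivation lift × AOV. A minimal sketch that reproduces them, assuming the Phase 0 estimates quoted above (the function and variable names are illustrative):

```python
def expected_monthly_revenue(addressable: int, lift: float, aov: float,
                             treatment_share: float = 0.5) -> float:
    """Treatment contacts x incremental reactivation rate x average order value."""
    return addressable * treatment_share * lift * aov


# Phase 0 estimates from the lines above: (addressable pool, AOV in dollars).
stores = {
    "Shiplap": (6317, 85.0),
    "FO": (284, 55.0),
    "Sundance": (97, 165.0),
}

per_store = {name: expected_monthly_revenue(n, 0.02, aov)
             for name, (n, aov) in stores.items()}
total = sum(per_store.values())

for name, value in per_store.items():
    print(f"{name}: ${value:,.0f}/mo")
print(f"Total: ${total:,.0f}/mo")
```

Swapping in a different `lift` or AOV shows how sensitive the total is to each assumption, which is the point of the reality check.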

This is below the $8k threshold. To hit $8k we need EITHER:

Reactivation lift of ~3% rather than 2% (above industry mid-band for cold audiences), OR
Better AOV than historical (likely — agent-personalized recs may surface higher-ticket items), OR
Extend pilot to 8 weeks (doubles sends, halves the bar per week)

Decision deferred to Week 2 of build: ~~after baseline organic reactivation rate is pulled (see Week 1 task), we revisit whether $8k is the right bar or whether $5k is the honest version.~~ Resolved 2026-06-01: recalibrated to $5k/mo after Pre-flight 2 returned measured baselines. See updated Success threshold row above.

Addressable audience (verified 2026-06-01 from WarehouseAPI MySQL)

Original Phase 0 numbers (from Shopify accepts_marketing, now known to be broken — kept for reference):

Store	Lapsed >180d	Accepts marketing %	Addressable lapsed >180d
Fabric Outlet	4,371	6.5%	~284
Shiplap	18,746	33.7%	~6,317
Sundance	1,756	5.5%	~97
Total	24,873	—	~6,700

Authoritative numbers (Pre-flight 3, Klaviyo profile subscription = truth source):

Store	Klaviyo subscribed	Matched in Shopify	Addressable lapsed-180d	12-mo AOV
Fabric Outlet	6,677	2,489 (37%)	1,225	$60.33
Shiplap	9,779	4,798 (49%)	3,486	$72.14
Sundance	n/a (no Klaviyo)	—	n/a — DECISION_NEEDED	$178.43
Total (FO+SHP)	16,456	7,287	4,711	$69.07 (weighted)
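
The Total row can be re-derived from the two store rows. A small sketch, assuming the weighted AOV is weighted by addressable lapsed-180d count (the table does not state the weighting, but this choice reproduces the $69.07 figure):

```python
# Rows copied from the Pre-flight 3 table above.
rows = {
    "Fabric Outlet": {"subscribed": 6677, "matched": 2489, "addressable": 1225, "aov": 60.33},
    "Shiplap": {"subscribed": 9779, "matched": 4798, "addressable": 3486, "aov": 72.14},
}

total_addressable = sum(r["addressable"] for r in rows.values())
# AOV weighted by each store's addressable pool (assumed weighting).
weighted_aov = sum(r["addressable"] * r["aov"] for r in rows.values()) / total_addressable
match_rate = {name: r["matched"] / r["subscribed"] for name, r in rows.items()}

print(f"addressable={total_addressable}, weighted AOV=${weighted_aov:.2f}")
for name, rate in match_rate.items():
    print(f"{name}: {rate:.0%} of Klaviyo subscribers matched in Shopify")
```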

Implementation Plan

Phase 1: Internal pilot (4 weeks)

Week 1 — Pre-flight (CRITICAL — do before any prompt writing)

☐ Klaviyo flow audit + suppression. Per klaviyo-flow-library.md, a Win-Back Flow for 60-90 day dormant is designed (PREP status) for FO and Shiplap. Verify whether it is currently live; if so, suppress it for the pilot duration OR document why holdout integrity is still valid. Also check welcome / browse abandonment / back-in-stock flows for overlap. No segment build until this is resolved — overlapping flows contaminate the holdout.
☐ Baseline organic reactivation rate. Pull last 90 days of WarehouseAPI MySQL: how many lapsed-180d customers placed an order without being emailed? Per-store: spontaneous reactivation count / lapsed-180d population. This is the control's expected base rate. Without it we can't validate that the control group is behaving normally during the pilot.
☐ WarehouseAPI data quality pre-check. Verify per-customer history fields are populated and clean: products, collections, designers, colorways, AOV, last order date. Spot-check 25 random customers per store. Note any sync gaps (last sync time, any null clusters). Cleanly failing data here kills the prompt before we waste time on it.
☐ Smart Sending audit. Confirm Klaviyo's send rate caps (default 2/day, 6/week). Verify our send cadence won't be silently deprioritized by other flows competing for the same profiles.

Week 1 — Data + splitter (after pre-flight passes)

☐ Pull lapsed >180d customer list from each WarehouseAPI MySQL (FO, Shiplap). Sundance handled separately (service variant). Include: customer_id, email, last 12 months of order history (products, collections, designers, colorways, AOV, days since last order).
☐ Deterministic random splitter in Python (seeded for reproducibility): 50/50 treatment/control. Tag each customer in Klaviyo with rea_treatment / rea_control flag.
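☐ Filter strictly to profiles with an active Klaviyo email subscription (the locked addressable-pool source), not shopify_customers.accepts_marketing, which Pre-flight 2 showed to be sparse/stale. Never email opt-outs.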
☐ Pre-send segment freeze: document exact timestamp the split was committed; do not re-evaluate the segment between freeze and send. Klaviyo segments are dynamic — profile-level rejoins/exits between tag and send corrupt the holdout.
☐ Per-store dry-run: dump the lists to CSV, eyeball-verify the splitter math and the data quality.
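
One way to make the split deterministic and per-customer stable is to hash each customer ID with a fixed anchor string, which is how the execution log describes the shipped splitter (SHA-256(anchor|customer_id)). The sketch below is an illustration under assumptions, not the contents of scripts/reengagement/splitter.py: the anchor value and the even/odd bucketing rule are hypothetical.

```python
import hashlib


def assign_arm(customer_id: str, anchor: str) -> str:
    """Map a customer to an arm; the same inputs always give the same arm."""
    digest = hashlib.sha256(f"{anchor}|{customer_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer: even -> treatment, odd -> control.
    bucket = int.from_bytes(digest[:8], "big") % 2
    return "rea_treatment" if bucket == 0 else "rea_control"


ANCHOR = "rea-pilot-example"  # hypothetical anchor; the real value lives in the split audit file
sample_ids = [str(i) for i in range(10_000)]
arms = [assign_arm(cid, ANCHOR) for cid in sample_ids]
treatment_share = arms.count("rea_treatment") / len(arms)
print(f"treatment share: {treatment_share:.1%}")
```

Because assignment depends only on the anchor and the customer ID, re-running the split after the freeze cannot move a customer between arms. Arm sizes land near 50/50 rather than exactly on it.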

Week 1-2 — Agent prompts + tools

☐ Fabric retail agent (FO + Shiplap):
Inputs: customer purchase history (collections, designers, colorways, recent SKUs viewed if available), live inventory check at send time (see below), current promotional context.
Output: personalized subject line + email body referencing the customer's actual history, recommending 1-3 SKUs from current inventory that match their taste profile.
Live inventory mitigation: WarehouseAPI is batch sync, not real-time. Before each email is queued for send, agent must re-query Shopify Storefront/Admin API for stock on the recommended SKUs. If any recommended SKU is out of stock, agent picks a substitute from the same collection or kills the recommendation. Recommending OOS SKUs = unsubscribe spike.
☐ Service agent (Sundance):
Inputs: customer's quilting service history (number of quilts, last quilt completion date, average days between quilts).
Output: temporal nudge — "your last quilt finished N months ago, average customer like you sends another in M months; ready?"
☐ Eval rubric per agent (accuracy/completeness/format/safety — extend daemon/rubrics/research.md pattern).
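
The live inventory mitigation for the fabric agent reduces to a small selection rule: keep a recommended SKU if it is in stock, otherwise take the first in-stock substitute from the same collection, otherwise drop the recommendation. A sketch of that rule, where `in_stock` is a hypothetical callable standing in for the live Shopify stock query and `substitutes` is a hypothetical precomputed map of same-collection alternatives:

```python
from typing import Callable


def finalize_recommendations(
    recommended: list[str],
    substitutes: dict[str, list[str]],
    in_stock: Callable[[str], bool],
) -> list[str]:
    """Return only in-stock SKUs, substituting or dropping out-of-stock ones."""
    final: list[str] = []
    for sku in recommended:
        # Try the original SKU first, then its same-collection substitutes in order.
        candidates = [sku] + substitutes.get(sku, [])
        choice = next((c for c in candidates if c not in final and in_stock(c)), None)
        if choice is not None:
            final.append(choice)  # otherwise the recommendation is killed
    return final


# Illustration with a fake stock set instead of a live API call.
stock = {"A", "B3"}
result = finalize_recommendations(
    ["A", "B", "C"],
    {"B": ["B2", "B3"], "C": ["C2"]},
    in_stock=lambda sku: sku in stock,
)
print(result)
```

Injecting the stock check as a function keeps the rule testable offline and leaves the choice of Shopify endpoint to the integration code.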

Week 2 — Klaviyo API integration

☐ Tag treatment/control segments in Klaviyo via API.
☐ Send agent emails to treatment via Klaviyo Campaigns API (one campaign per store).
☐ Hold control out — explicitly do not send.
☐ Stagger sends across the 4-week pilot (don't burn the audience in one batch).
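
Staggering can be sketched as slicing the treatment arm into four near-equal weekly cohorts. The version below also sorts most-recently-lapsed customers first, which follows the warm-up mitigation listed under Risks; treating that ordering as part of the cohort builder is an assumption, as are the function name and the `days_since_last_order` field name.

```python
def weekly_cohorts(customers: list[dict], weeks: int = 4) -> list[list[dict]]:
    """Split customers into `weeks` near-equal cohorts, most recently lapsed first."""
    ordered = sorted(customers, key=lambda c: c["days_since_last_order"])
    size, extra = divmod(len(ordered), weeks)
    cohorts, start = [], 0
    for i in range(weeks):
        # The first `extra` cohorts take one additional customer each.
        end = start + size + (1 if i < extra else 0)
        cohorts.append(ordered[start:end])
        start = end
    return cohorts


# Hypothetical treatment arm of 10 customers.
treatment = [{"id": i, "days_since_last_order": 700 - 50 * i} for i in range(10)]
cohorts = weekly_cohorts(treatment)
print([len(c) for c in cohorts])
```

The same function can assign control customers to weeks so each one gets a pseudo-send date for the attribution window.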

Week 2-4 — Run + measure

☐ Run the agent against the addressable pool, ~25% of the pool per week (4 cohorts).
☐ Daily dashboard: emails sent, emails delivered (% deliverability), opens, clicks, attributed orders (treatment + control), incremental revenue running total, running p-value from two-proportion z-test.
☐ At day 30 post-first-send, compute incremental revenue: (treatment_conv_rate − control_conv_rate) × AOV × treatment_pop. Window: any order within 30 days of the customer's individual first-send (rolling per-customer window, NOT a fixed calendar window).
☐ Day-60 measurement (post-pilot): background Claude session must self-schedule a Day-60 readout — a follow-up job that fires 30 days after pilot end (8 weeks from pilot start). Owner: same background session. Trigger: cron entry created at pilot kickoff. Without this, the Day-60 number is a "Must Have" criterion that never gets computed.
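
The Day-30 computation has two parts: a per-customer rolling window that decides who converted, and the incremental revenue formula applied to the arm totals. A minimal sketch under assumptions: the window is treated as inclusive of the send date and of day 30, control customers use their pseudo-send date as `first_send`, and all names and example counts are illustrative.

```python
from datetime import date, timedelta


def converted_within_window(first_send: date, order_dates: list[date],
                            window_days: int = 30) -> bool:
    """True if any order falls in [first_send, first_send + window_days]."""
    end = first_send + timedelta(days=window_days)
    return any(first_send <= d <= end for d in order_dates)


def incremental_revenue(treat_conv: int, treat_pop: int,
                        ctrl_conv: int, ctrl_pop: int, aov: float) -> float:
    """(treatment_conv_rate - control_conv_rate) x AOV x treatment_pop."""
    lift = treat_conv / treat_pop - ctrl_conv / ctrl_pop
    return lift * aov * treat_pop


# Hypothetical arm totals for illustration only.
revenue = incremental_revenue(treat_conv=120, treat_pop=2400,
                              ctrl_conv=72, ctrl_pop=2400, aov=70.0)
print(f"incremental revenue: ${revenue:,.0f}")
```

The Day-60 readout can reuse both functions with `window_days=60`.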

Phase gate (end of week 4):

☐ Pass: ≥$8,000/mo incremental revenue summed across stores AND p<0.05 → green light Phase 2 (multi-tenant brain). (Threshold since revised to a two-tier $5k/mo stretch and $3k/mo floor across FO + Shiplap; the "Success threshold" row under Locked Decisions supersedes the dollar figures in this gate.)
☐ Marginal ($4-8k/mo OR statistically significant but small): re-evaluate — either adjust prompt + extend pilot 4 weeks, or pivot to a different wedge.
☐ Fail (<$4k/mo OR p>0.05): kill this wedge, return to Phase 0 with a different wedge candidate (abandoned cart, speed-to-lead, etc.).

Phase 2 (gated on Phase 1 pass) — Multi-tenant brain + productization

Deferred until Phase 1 passes. High-level: tenant isolation, Qdrant or pgvector retrieval layer, customer onboarding flow, Stripe billing, landing page. Estimated 6-10 weeks.

Acceptance Criteria (Phase 1)

Must Have:

☐ Pre-flight passed (Klaviyo flow suppression confirmed, baseline organic reactivation rate pulled, data quality pre-check signed off, Smart Sending caps verified)
☐ Lapsed-180d list pulled from FO + Shiplap (Sundance handled in service variant), validated against Klaviyo profile subscription status (the locked addressable-pool source, not Shopify accepts_marketing)
☐ Deterministic 50/50 splitter written, tested, seeded, committed
☐ Pre-send segment freeze timestamp documented (Klaviyo segments are dynamic; freeze + send window must be tight)
☐ Treatment/control customers tagged in Klaviyo
☐ Both agent variants (fabric + service) producing personalized emails that reference real customer history
☐ Live inventory check at send time for fabric agent (no recommending OOS SKUs)
☐ Daily dashboard showing: sends, delivered % per store (must be >90% to claim any attribution), opens, clicks, orders attributed to treatment vs control, incremental revenue running total, running p-value
☐ Day-30 incremental revenue computation, per-store and total, with two-proportion z-test
☐ Day-60 measurement scheduled via cron at pilot kickoff (not just "must have" — must have an actual trigger)
☐ Statistical significance gate: if p > 0.05 at Day-30, do not claim lift regardless of dollar number
☐ Holdout integrity: no contact to control group, no contamination, suppressed flows verified

Should Have:

☐ Eval rubric scoring each generated email before send (subject quality, personalization depth, recommendation fit)
☐ Per-store agent prompt tuning based on early signal (week 2 onward)
☐ Klaviyo deliverability monitoring (open rate >25%, complaint rate <0.1%, bounce rate <2%)
☐ Minimum detectable effect (MDE) calculation at pilot start: what's the smallest lift we can detect at 80% power, p<0.05? If the MDE exceeds the lift our dollar threshold requires, the pilot is underpowered: a true threshold-sized lift could fail to reach significance, and we should know that before sending.

Nice to Have:

☐ SMS variant for the Shiplap subset that has SMS consent (probably <500 contacts; deprioritized)
☐ A/B subject line testing within the treatment group
☐ Sundance variant catches and routes "I already sent another quilt" replies to a human inbox

Verification Steps

Splitter audit before any send: dump treatment + control CSVs, manually inspect first 50 of each, confirm random distribution (no time-skew, store-skew, AOV-skew).
First send dry-run: generate 10 emails per store via the agent, eyeball every one for hallucinations, wrong product references, broken merge tags.
Klaviyo holdout check: confirm that control group is tagged as "do not send" and that the campaign send was bounded to treatment only. Verify via Klaviyo segment counts before clicking send.
Attribution math review: before publishing the "$X recovered" headline, walk Cole through the calculation: control conversion rate, treatment conversion rate, AOV, addressable population. Should be auditable in one pass.
Run report-verification-checklist before any external claim of revenue figures.
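
The splitter audit's skew checks can be backed by a number in addition to eyeballing the first 50 rows. One option, offered as a sketch and not as the audit the pilot actually ran, is the standardized mean difference (SMD) of a covariate such as AOV or days since last order between arms. The 0.1 cutoff used here is a common rule of thumb for covariate balance and is an assumption, not a value from this plan.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment: list[float], control: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation of the two arms."""
    pooled_sd = sqrt((variance(treatment) + variance(control)) / 2)
    if pooled_sd == 0:
        return 0.0
    return (mean(treatment) - mean(control)) / pooled_sd


def is_balanced(treatment: list[float], control: list[float], cutoff: float = 0.1) -> bool:
    """Flag an arm pair as balanced when |SMD| is under the (assumed) cutoff."""
    return abs(standardized_mean_difference(treatment, control)) < cutoff


# Hypothetical AOV samples per arm.
aov_t = [60.0, 72.0, 55.0, 81.0, 66.0]
aov_c = [61.0, 70.0, 57.0, 80.0, 65.0]
print(f"SMD={standardized_mean_difference(aov_t, aov_c):+.3f}, balanced={is_balanced(aov_t, aov_c)}")
```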

Data integrity: This pilot generates real revenue numbers Cole will use to make a productization decision. NEVER claim revenue figures without showing the underlying holdout math. If the holdout signal is too noisy to be statistically meaningful at end of pilot, say so explicitly rather than rounding up.

Risks

Reactivation rate may be lower than industry benchmarks suggest. Generic win-back benchmarks ($0.50-$2/contact) are for 60-90 day dormant audiences. At 180+ days, realistic reactivation is 1-3%, not 5-10%. Our $8k threshold is barely achievable at the high end of these realistic assumptions. The Week-1 baseline organic reactivation pull resolves this — if organic rate is near zero, our lift bar is easier; if it's already 3%, we need to clear ~5% to show meaningful lift.
Klaviyo flow cannibalization. The Win-Back Flow (60-90 day dormant) per klaviyo-flow-library.md may already be designed/live for FO + Shiplap. If it fires during our pilot it contaminates the holdout. Mitigated via mandatory Week-1 suppression check but the risk is "we missed a flow."
Smart Sending caps silently dropping treatment sends. Klaviyo's default 2/day, 6/week caps mean the agent's email may be deprioritized if a profile is already getting other automated mail. Detection: per-profile delivery audit post-send.
Live inventory drift. WarehouseAPI is batch sync; agent must hit Shopify's live API at send time for stock check on recommended SKUs. Failure to do so = OOS recommendations = unsubscribe spike.
Klaviyo API rate limits — sends staggered to stay under limits; Klaviyo allows ~10 req/s on Campaigns API.
Email reputation hit if open rates tank — first send is a fresh segment that hasn't received email in 180+ days; engagement may be low and Gmail may downgrade. Mitigation: warm up by starting with the 180-365d cohort (more recent), not the 2-year-lapsed cohort.
Sundance pool is tiny (~97 addressable) — statistical noise will swamp signal. Treat Sundance as exploratory, not contributing meaningfully to the $8k bar. Likely contribution: $0-300/mo, not the $1k estimated initially.
Holdout cannibalization — control group might naturally come back; this is exactly what we want to MEASURE, not avoid. Noisy control means we need bigger sample to detect lift. 50/50 split helps here.
Cole's calendar — background Claude session can build autonomously but decisions (prompt tuning, kill/proceed at week 2 checkpoint, threshold revision at Week 2) need Cole. Risk if Cole is unavailable for >3 days during week 2-3.
Day-60 measurement gets dropped. The Day-60 readout fires 30 days after pilot end (8 weeks from pilot start). If no cron is scheduled at pilot kickoff, the metric never gets pulled and we make a productization decision on Day-30 data alone (less robust).

Open Items

☐ COLE ACTION ITEM: Pause Klaviyo campaigns targeting lapsed-180d for the 4-week pilot window. Must complete in Klaviyo admin before pilot Week 1 Data step. Bg session cannot touch Klaviyo. Blocks pilot kickoff.
☑ Klaviyo Smart Sending = ON (Cole confirmed 2026-06-01 21:15). Per PF4 mitigation, pilot send code logs per-recipient succeeded/skipped; skipped profiles removed from analytic cohort.
☑ Sundance handling for the pilot: resolved 2026-06-01. Dropped from Phase 1, folds into Phase 2 alongside Dreamy (the bg session recommendation). Reasons: no Klaviyo API key, no klaviyo_profiles table on Sundance WarehouseAPI, sparse product tags; statistical contribution at +2pp lift would be ~$160/mo even with Klaviyo. Rejected alternative: set up Sundance Klaviyo first (delays pilot 1-2 weeks).
☐ COLE ACTION ITEM: Confirm Smart Sending default in Klaviyo admin UI for FO + Shiplap. API doesn't expose the account-level toggle. Per PF4, our cadence is safe either way; documenting is just for completeness.
☐ Dreamy MCP token — Cole regenerating in Shopify admin; once updated in ~/.secrets/shopify-tokens.env, fold into Phase 2, NOT Phase 1. Mid-pilot additions split attribution focus and dilute the signal we're paying for.
☑ Opus plan review — completed 2026-06-01, fixes applied to plan.
☑ Threshold reconsideration — resolved 2026-06-01 after Pre-flight 2 measured baselines. Recalibrated $8k → $5k/mo.
☑ Addressable-pool source decision — resolved 2026-06-01. Klaviyo subscription status is source of truth.
☐ Brand voice per store — does each store need distinct tone, or unified? Deferred to Week 1 of build.
☑ MDE calculation — completed 2026-06-01 (PF3+MDE doc). Pooled MDE = +1.30pp / ~$2.1k/mo. $5k threshold requires +3.07pp lift (2.4× the statistical floor). Documented in reengagement-mde-calc.md.
☐ Personalization data source pivot. Phase 0 assumed shopify_products.collection/.designer/.color columns. PF3 found those columns DO NOT exist on FO + Shiplap WarehouseAPI servers (only on RBD). Replacement: parse shopify_products.tags CSV in agent prompt. No production deploy needed; minor prompt-design change. Move from Open Items to "Agent prompts + tools" in Week 1-2.
☐ Agent customer-history window. Change from "last 12 months" to "last 5 orders ever" or "last 3 years, whichever covers more orders." Per PF3, 52% of FO / 72% of Shiplap addressable have no orders in last 12 mo — but their tag-personalization signal in older orders is still clean.
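
The tag-based personalization pivot amounts to splitting the shopify_products.tags CSV and classifying each tag. A minimal sketch, assuming tags are comma-separated and that designers are recognized against a curated list; the three designer names come from the execution log, while the function name and the "other" bucket are hypothetical and the real helper may classify collections and colorways as well.

```python
# Curated designer names (lowercased); a small illustrative subset.
KNOWN_DESIGNERS = {"lori holt", "tula pink", "echo park paper co."}


def parse_tags(tags_csv: str) -> dict[str, list[str]]:
    """Split a comma-separated tags string into designer tags and everything else."""
    designers: list[str] = []
    other: list[str] = []
    for raw in tags_csv.split(","):
        tag = raw.strip()
        if not tag:
            continue  # skip empty fragments from stray commas
        if tag.lower() in KNOWN_DESIGNERS:
            designers.append(tag)
        else:
            other.append(tag)
    return {"designers": designers, "other": other}


parsed = parse_tags("Lori Holt, Mon Cheri, , Red")
print(parsed)
```

Doing this parsing in deterministic helper code rather than in the prompt keeps the model from guessing which tag is a designer.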

Execution Log

2026-06-01 21:30 UTC — Week 1 build COMPLETE (bg-reengagement, session 3)

All six Week 1 deliverables landed. Pipeline ready for Cole's trigger.

D1 audience pull (scripts/reengagement/audience_pull.py): re-runnable Klaviyo → MySQL pipeline; FO 1,225 / Shiplap 3,486 / total 4,711 addressable, exact match with PF3. Every customer has line items in the history window (last 5 orders OR last 3yr, broader). Tag signal rich (real designers Lori Holt, Echo Park Paper Co., Tula Pink; real collections Mon Cheri, Be Mine Valentine, Elmer & Eloise).
D2 splitter (scripts/reengagement/splitter.py): SHA-256(anchor|customer_id) deterministic, per-customer-stable. Actual arms FO 621T/604C (50.7%T), Shiplap 1789T/1697C (51.3%T). Freeze timestamp 2026-06-01T21:09:25Z recorded per row + in data/reengagement/split_audit.json — pre-send segment freeze acceptance criterion satisfied.
D3 Klaviyo tagger BUILD ONLY (scripts/reengagement/klaviyo_tag.py): bulk-import payload sets rea_pilot_arm + rea_treatment/rea_control booleans + audit anchor/freeze properties. 100% of split rows mapped to Klaviyo profile IDs (zero orphans). DRY-RUN default. --execute is Cole's go-command (idempotent re-upsert).
D4 agent prompt + 20 dry-run samples (scripts/reengagement/agent_prompt_fabric.md, agent_helpers.py, generate_samples.py): prompt encodes 10 hard rules (no fabrication, real-history anchor required, ±50% price band, no first-name fabrication). Helpers do deterministic tag parsing (curated designer list, stopwords, sale-tag regex, product-match scorer). Generator pulls 750 in-stock products per store from Shopify, calls Sonnet 4.6, validates outputs. 20/20 samples generated, 19 fully validation-clean (one preheader 1 char over my soft 110 cap, no factual errors). Samples for Cole: plans/vscode/reengagement-week1-sample-emails.md. Quality: agent references real past designer+collection+colorway, recommends in-stock SKUs, no fabricated dates.
D5 eval rubric (daemon/rubrics/reengagement_email.md): five 1–5 dimensions (accuracy, personalization_depth, recommendation_fit, safety, format), auto-checks defined, aggregate gate (all 5s/≥4 = send; any 3 = regen-then-drop; any ≤2 = drop+log; halt store if drop rate >5%). Judge model: claude-sonnet-4-6.
D6 MDE on actual splitter arms (plans/vscode/reengagement-week1-mde.md): FO $1,076/mo MDE (+2.95pp), Shiplap $1,768/mo (+1.44pp), pooled $2,093/mo (+1.32pp). Anomaly check (MDE > $3k floor): NOT TRIGGERED. Required lift: +1.89pp for $3k floor, +3.15pp for $5k stretch. Both gates testable cleanly.
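
The D5 aggregate gate can be expressed as a small decision function. This sketch reads "regen-then-drop" as "regenerate once, then drop if the retry still scores a 3", which is an interpretation of the rubric summary above and may differ from daemon/rubrics/reengagement_email.md; the return labels are hypothetical.

```python
def gate(scores: dict[str, int], already_regenerated: bool = False) -> str:
    """Decide what to do with one generated email from its five 1-5 rubric scores."""
    lowest = min(scores.values())
    if lowest <= 2:
        return "drop_and_log"
    if lowest == 3:
        # One retry is allowed; a second 3 means the email is dropped.
        return "drop" if already_regenerated else "regenerate"
    return "send"  # every dimension scored 4 or 5


def should_halt_store(dropped: int, total: int) -> bool:
    """Halt sends for a store when more than 5% of its emails were dropped."""
    return total > 0 and dropped / total > 0.05


example = {"accuracy": 5, "personalization_depth": 4, "recommendation_fit": 3,
           "safety": 5, "format": 5}
print(gate(example), gate(example, already_regenerated=True))
```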

Bg session stopping. Waiting for Cole to: (a) confirm Smart Sending, (b) confirm campaign pause, (c) eyeball samples, (d) greenlight Week 2 (Klaviyo write integration + send kickoff).

2026-06-01 21:30 UTC — Pre-flight 3, 4, MDE complete (bg-reengagement)

Pre-flight 3 — data quality: Klaviyo subscribers enumerated client-side (filter API doesn't support consent field). FO 6,677 subscribed (of 26,648 profiles); Shiplap 9,779 (of 68,137); Sundance n/a (no Klaviyo API key). Cross-referenced against per-store WarehouseAPI MySQL. Addressable lapsed-180d: FO 1,225 / Shiplap 3,486. Materially different from Phase 0 estimates (FO 284 / Shiplap 6,317) — confirms locked decision to use Klaviyo as truth source.
CRITICAL finding (PF3): shopify_products.collection/.designer/.color/.theme columns DO NOT exist on FO + Shiplap WarehouseAPI servers — the metafields migration was only deployed to RBD (rb.alpineanalytica.com). Personalization signal is fully available via shopify_products.tags (CSV-encoded with designer + collection + colorway names). Coverage: 97.6% / 99.5% populated. Agent prompt must parse tags client-side; no production deploy required, but Phase 0 assumption needs updating.
Finding (PF3): History sparsity. 52% of FO addressable / 72% of Shiplap addressable have no orders in the last 12 months. Data quality of populated rows is 100% (fields always present when row exists). Implication: agent's history input window should be "last 5 orders ever" not "last 12 months."
AOV updates (PF3, 12-mo lapsed-addressable): FO $60.33 (was est. $55), Shiplap $72.14 (was est. $85). These supersede prior estimates.
Sundance status (PF3): DECISION_NEEDED — no Klaviyo API key, no klaviyo_profiles table on Sundance MySQL, sparse product tags. Three options: drop from pilot (default rec), set up Klaviyo first (delays pilot 1-2 wks), or use Shopify accepts_marketing (broken per PF2). Bg session recommendation = drop from Phase 1.
Pre-flight 4 — Smart Sending: Klaviyo API does NOT expose Smart Sending at the account level; it's per-flow / per-campaign. Default is on for 2024+ accounts. Our 1-email-per-profile pilot cadence is far below any reasonable cap. Risk: Smart Sending may silently skip ~1-3% of treatment arm if profiles overlap with Abandoned Cart / Back-in-Stock within 16h. Mitigation: log Klaviyo per-recipient delivery status, exclude skipped profiles from analytic cohort (Week-2 integration task). Do NOT disable Smart Sending account-wide. Cole to confirm in admin UI before kickoff.
MDE calc: Pooled MDE (FO+Shiplap, n=2,355 per arm, weighted p1=2.61%) = +1.30pp absolute → ~$2,100/mo. Required lift to clear $5k threshold = +3.07pp. Gap: 2.4× the statistical floor. Pilot CAN detect significance below the dollar threshold — this is honest and expected; both gates (p<0.05 AND ≥$5k) must be hit to claim pass. AOV uplift via premium-SKU recommendations could close the gap at +2.5pp lift if AOV reaches $85.
Outputs: ~/ai-projects/mission-control/plans/vscode/reengagement-preflight-3-quality.md, reengagement-preflight-4-smartsending.md, reengagement-mde-calc.md.
Pre-flight complete. Bg session stopping. Waiting for Cole greenlight on Week 1 Data step.
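
The pooled MDE figures in this entry can be reproduced from the logged inputs. The method behind the documented +1.30pp is not stated here; the sketch below assumes the common normal-approximation formula for two equal-sized arms (two-sided alpha = 0.05, 80% power, baseline variance used for both arms), which lands on the same number. The dollar conversion (lift × contacts per arm × weighted AOV) is likewise inferred from the figures.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(baseline: float, n_per_arm: int,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest absolute lift detectable with the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


N_PER_ARM = 2355       # pooled FO + Shiplap, per arm
BASELINE = 0.0261      # weighted baseline reactivation rate
WEIGHTED_AOV = 69.07   # dollars

mde = mde_two_proportions(BASELINE, N_PER_ARM)
mde_dollars = mde * N_PER_ARM * WEIGHTED_AOV
required_lift = 5000 / (N_PER_ARM * WEIGHTED_AOV)  # lift needed for the $5k threshold

print(f"MDE: +{mde * 100:.2f}pp (~${mde_dollars:,.0f}/mo)")
print(f"Required lift for $5k: +{required_lift * 100:.2f}pp ({required_lift / mde:.1f}x the MDE)")
```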

2026-06-01 — Phase 0 locked

Walked through customer, wedge, lapsed window, channel, attribution methodology, success threshold, builder, start timing.
Pulled real revenue (last 4 days: $19,992 across 3 stores, ~$150k/mo run rate).
Verified data depth in WarehouseAPI MySQL on all 3 stores.
Identified marketing-consent constraint as the binding limit on addressable pool.
Brief written.

2026-06-01 20:15 UTC — bg-reengagement HALTED at gate

Per the bg session prompt rule ("Do NOT continue to the next pre-flight task — they gate each other"), stopping after Pre-flight 2 until Cole resolves the open DECISION_NEEDED.
Status: Pre-flights 1 & 2 done. Pre-flights 3 & 4 pending.
Cole's next-session input needed: confirm whether the splitter's addressable-pool filter sources from Klaviyo profile subscription status (recommended default) or Shopify accepts_marketing. This affects how Pre-flight 3 should scope its sampling frame.
On resume, this bg session continues with PF3 + PF4 in one batch.

2026-06-01 20:10 UTC — Pre-flight 2 (baseline reactivation) — bg-reengagement

Queried per-store WarehouseAPI MySQL (FO, Shiplap, Sundance), read-only. T=2026-06-01, window=last 90 days.
Pool A (all lapsed-180d, regardless of consent):
FO: 3,362 → 117 reactivated → 3.48 %, AOV $56.63
Shiplap: 17,419 → 403 reactivated → 2.31 %, AOV $69.87
Sundance: 1,603 → 98 reactivated → 6.11 %, AOV $178.43
Pool B (Shopify accepts_marketing='1' subset): 0 reactivated across 7,191 customers in all three stores combined. This is a data-integrity finding — shopify_customers.accepts_marketing is sparsely populated and skews to a never-reactivating long-tail cohort. NOT a reliable addressable filter.
DECISION_NEEDED: confirm the splitter will source addressable pool from Klaviyo profile subscription status, not Shopify accepts_marketing. Default recommendation: Klaviyo as source of truth.
Threshold implication: at +1pp lift over Pool A baseline (more honest for cold 180+ day audience), expected revenue is ~$2.4K/mo. At +2pp lift, ~$4.7K/mo. $8K threshold remains aggressive. Threshold reconsideration already on Week-2 docket; this data argues for $5K honest version.
Full output: ~/ai-projects/mission-control/plans/vscode/reengagement-preflight-2-baseline.md. SQL: ~/.claude/jobs/c3793e2d/baseline_reactivation.sql.
Next: Pre-flight 3 (data quality spot-check).
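
The Pool A rates are simply reactivated customers divided by the lapsed-180d pool. A short check of the figures above (variable names are illustrative):

```python
# Pool A counts from this entry: (lapsed-180d pool, reactivated in the last 90 days).
pool_a = {
    "FO": (3362, 117),
    "Shiplap": (17419, 403),
    "Sundance": (1603, 98),
}

baseline_rate = {store: reactivated / pool for store, (pool, reactivated) in pool_a.items()}

for store, rate in baseline_rate.items():
    print(f"{store}: {rate:.2%}")
```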

2026-06-01 19:37 UTC — Pre-flight 1 (Klaviyo flow audit) — bg-reengagement

Enumerated all flows on FO (6: 3 live, 3 draft) and Shiplap (4: 2 live, 2 draft) via Klaviyo API (read-only).
No Win-Back flow exists on either store — designed in the flow library doc but never built or deployed. The "Klaviyo flow cannibalization" risk in the plan is resolved as a non-issue.
No flows need to be suppressed for the pilot. Live flows (Welcome Series, Abandoned Cart, Review Request, Back In Stock) trigger on events orthogonal to lapsed-180d status; they affect both treatment and control arms equally, so don't bias the holdout.
Open item flagged for Cole: confirm no win-back campaigns (as opposed to flows) are scheduled against a lapsed-180d segment during the 4-week pilot. Campaigns are out of scope of this flow-only audit.
Full output: ~/ai-projects/mission-control/plans/vscode/reengagement-preflight-1-klaviyo.md. Raw JSON: ~/.claude/jobs/c3793e2d/klaviyo_flows.json.
Next: Pre-flight 2 (baseline reactivation rate via WarehouseAPI MySQL).

2026-06-01 21:00 — Pre-flight 3 + 4 + MDE complete; Cole decisions locked

Pre-flight 3 (data quality with Klaviyo source): Real Klaviyo addressable lapsed-180d audience — FO 1,225 (vs Phase 0's 284, +331%) / Shiplap 3,486 (vs Phase 0's 6,317, −45%). Net total 4,711 (−29% vs Phase 0). AOVs revised: FO $60.33, Shiplap $72.14.
CRITICAL ANOMALY (resolved by bg session): the collection, designer, color, and theme columns on shopify_products DO NOT exist on the FO and Shiplap WarehouseAPI servers (the metafields migration was only deployed to RBD). Workaround: the tags column is 97-99% populated with the same content (designer, collection, and colorway as CSV). The agent prompt parses tags client-side. No production deploy needed.
History sparsity finding (resolved by bg session): 52% of FO lapsed customers and 72% of Shiplap lapsed customers have no orders in the last 12 months. Prompt design uses "last 5 orders ever or all orders in last 3 years, whichever is more recent" instead of a fixed 12-month window.
Pre-flight 4 (Smart Sending): Klaviyo's API does not expose the account-level Smart Sending setting. Risk is low given our 1-email-per-profile cadence. Mitigation: the pilot send code will log a per-recipient succeeded/skipped status. Cole to confirm the Smart Sending setting in the Klaviyo admin UI before pilot kickoff.
Three Cole decisions locked:
1. Sundance dropped from Phase 1 (folds into Phase 2 with Dreamy). Its marginal projected contribution plus the Klaviyo setup cost is not worth a 1-2 week delay.
2. Threshold tiered: $5k stretch / $3k floor. The floor reflects honest +2pp lift math on the real audience; the stretch keeps the earlier $5k recalibration as the strong-signal target for productizing.
3. Cole will check the Smart Sending setting in the Klaviyo admin UI and report back. Action item; does not block the Week 1 build (only blocks sending).
Week 1 build (Data + splitter + agent prompts) unblocked. Launching as new background session.
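
The audience deltas and the floor math in this entry can be checked with a few lines of arithmetic. This is a back-of-envelope sketch, not the bg session's calculation. It assumes the 50/50 treatment/holdout split from the critic pass, that the +2pp lift applies only to the treated half, and that each incremental reactivation is worth one order at the revised AOV. Function and variable names are hypothetical.

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


def incremental_revenue(audience, aov, lift_pp, treated_share=0.5):
    """Revenue from lift_pp extra reactivations among the treated share (assumed model)."""
    return audience * treated_share * (lift_pp / 100) * aov


# Audience counts and AOVs as reported in the Pre-flight 3 line of this entry.
stores = {
    "FO": {"phase0": 284, "klaviyo": 1225, "aov": 60.33},
    "Shiplap": {"phase0": 6317, "klaviyo": 3486, "aov": 72.14},
}

for name, s in stores.items():
    print(f"{name}: {pct_change(s['klaviyo'], s['phase0']):+.0f}% vs Phase 0")

total_phase0 = sum(s["phase0"] for s in stores.values())
total_klaviyo = sum(s["klaviyo"] for s in stores.values())
print(f"Total: {total_klaviyo:,} ({pct_change(total_klaviyo, total_phase0):+.0f}% vs Phase 0)")

# Assumption: +2pp lift, treated half only, one order per reactivation.
floor_estimate = sum(
    incremental_revenue(s["klaviyo"], s["aov"], lift_pp=2) for s in stores.values()
)
print(f"+2pp on treated half: ${floor_estimate:,.0f}")
```

Under these assumptions the +2pp case comes out at roughly $3.25k for the two stores combined, in the neighborhood of the $3k floor. The entry does not state how the floor was derived, so treat the match as a plausibility check only.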

2026-06-01 20:30 — Pre-flight 1 + 2 complete; key decisions resolved

Pre-flight 1 (Klaviyo flow audit): No Win-Back flow exists on FO or Shiplap. The flow library doc said "PREP — Designs complete", but the flow was never built. Live flows (Welcome Series, Abandoned Cart, Review Request, Back In Stock) pose no holdout-contamination risk. Nothing needs suppressing. The Opus critic's #1 risk is resolved as a non-issue. Caveat: flows were audited, not campaigns.
Pre-flight 2 (Baseline reactivation): Pool A (all lapsed-180d, regardless of consent) returned per-store baselines: FO 3.48% / Shiplap 2.31% / Sundance 6.11%. Pool B (Shopify accepts_marketing=1) returned 0/7,191 — broken data, the Shopify field is sparse and stale.
Three Cole decisions locked:
1. Addressable-pool source = Klaviyo profile subscription status (NOT Shopify accepts_marketing). Per Cole 2026-06-01.
2. Success threshold recalibrated $8k/mo → $5k/mo. Per Cole 2026-06-01. Honest version after measured baselines. +2pp lift target.
3. Cole will pause scheduled Klaviyo campaigns targeting lapsed-180d for the 4-week pilot window. Action item on Cole; the bg session does not touch Klaviyo. Must complete before pilot kickoff (Week 1 Data step).
Pre-flight 3 + 4 (data quality + Smart Sending caps) unblocked; bg session relaunched with updated context.

2026-06-01 — Opus critic pass + fixes applied

Spawned Opus general-purpose critic on the brief. Verdict: ship with minor fixes.
Six structural changes applied:
1. Split ratio 80/20 → 50/50 (statistical power at this sample size requires it)
2. Dreamy explicitly excluded from Phase 1 (was "folds in once token refreshed"; now Phase 2 only)
3. Klaviyo flow audit + suppression added as Week-1 pre-flight (cannibalization risk from existing Win-Back flow)
4. Attribution window defined: 30-day rolling per-customer post-first-send
5. Day-60 measurement requires cron trigger at pilot kickoff (was a "must have" with no owner)
6. Baseline organic reactivation rate pulled in Week 1 (control base rate validation)
Two new risks surfaced and added: cold-audience reactivation math may be optimistic ($5.7k more realistic than $8k expected); live inventory drift requires send-time API check.
New acceptance criteria added: deliverability >90% per store, p<0.05 statistical gate, pre-send segment freeze documentation, MDE calculation.
Threshold reconsideration flagged as Week-2 decision (after baseline reactivation pull resolves the question).
Next: kick off background build session for Phase 1.
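
Two items in this entry, the move to a 50/50 split and the MDE calculation, can be illustrated together. The sketch below uses the standard normal-approximation formula for the minimum detectable effect of a two-proportion test. It is an illustration under stated assumptions, not the calculation the pilot used: the two-sided alpha of 0.05 follows the p<0.05 gate, while the 80% power target, the use of the baseline variance for both arms, the function name, and the example inputs are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_proportions(n_total, baseline_rate, treated_share=0.5, alpha=0.05, power=0.80):
    """Approximate absolute MDE (as a proportion) for a two-sided two-proportion z-test."""
    n_treated = n_total * treated_share
    n_control = n_total * (1 - treated_share)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Simplification: use the baseline variance p(1 - p) for both arms.
    std_err = sqrt(baseline_rate * (1 - baseline_rate) * (1 / n_treated + 1 / n_control))
    return (z_alpha + z_power) * std_err


# Example inputs are illustrative: 5,000 customers at a 3% baseline rate.
even = mde_two_proportions(5000, 0.03, treated_share=0.5)
skewed = mde_two_proportions(5000, 0.03, treated_share=0.8)
print(f"50/50 MDE: {even * 100:.2f}pp")
print(f"80/20 MDE: {skewed * 100:.2f}pp")
print(f"80/20 needs a {skewed / even:.2f}x larger lift to be detectable")
```

Whatever the sample size and baseline, this formula gives an 80/20 split an MDE 1.25 times that of a 50/50 split at the same total audience, which is the power argument behind the change.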

References

Eric Siu "company brain" X thread: https://x.com/ericosiu/status/2060720592475795693
Corey Ganim "revenue not time" X thread: https://x.com/coreyganim/status/2060829603166326941
Systematicls Hermes/OpenClaw thread (harness vs frontier model leverage): https://x.com/systematicls/status/2060675167152648270
Mission Control architecture: ~/ai-projects-local/mission-control/docs/architecture.md
WarehouseAPI doc: ~/ai-projects-local/shared-knowledge/warehouseapi.md
Klaviyo flow library: ~/ai-projects-local/shared-knowledge/klaviyo-flow-library.md
Planning workflow: ~/ai-projects/mission-control/docs/planning-workflow.md

Source: ~/ai-projects/mission-control/plans/productized-reengagement-agent.md