SKU-Level Cost Tracking Implementation
Priority: P2 Category: PREP
Executive Summary
Add product cost data to the Shopify SQLite database to enable true gross margin analysis. The existing pipeline tracks revenue (price, compare_at_price) but has zero cost visibility. Shopify natively supports cost-per-item on inventory items — we can pull existing cost data via API immediately, then build a vendor sheet import pipeline to fill gaps. POC covers the top 100 SKUs by revenue volume.
Research Phase
Current State
Database: ~/ai-projects-local/shopify-bigquery-pipeline/data/shopify.db
variantstable: hasprice,compare_at_price,inventory_item_id— no cost columnline_itemstable: hasprice,total_discount— no cost columnproductstable: hasvendor— useful for vendor-level cost assumptions
FO Store Inventory Snapshot (as of last sync):
- 2,244 active SKUs across 22 vendors
- $121,007 total retail inventory value
- Top vendors by retail value: Riley Blake Designs ($67K), FreeSpirit ($15K), Andover ($13K)
Shopify API Capability:
- Shopify stores
cost(wholesale/unit cost) on theInventoryItemobject - Already captured
inventory_item_idon every variant row — this is the join key - REST endpoint:
GET /admin/api/2024-01/inventory_items/{id}.json→costfield - GraphQL:
inventoryItem { unitCost { amount } }— supports bulk operations - Cost field is optional — many merchants leave it blank; we'll discover coverage rate in Phase 1
Landed Cost Components:
- Wholesale (unit cost from vendor invoice)
- Inbound freight (typically 3-8% of wholesale for fabric, varies by vendor/carrier)
- Import duties (varies by country of origin; most fabric from China at ~12.5% HTS 5208.xx)
- Payment processing is usually excluded from COGS (considered operating expense)
Constraints
- No cost data currently in DB — must either pull from Shopify API or import from vendor sheets
- Top 100 SKUs by what metric? Using revenue (units sold × price) from line_items is most meaningful, but if line_items history is shallow, use inventory value (qty × price) as proxy
- Vendor sheets will be in various formats (Excel, PDF, CSV) — need flexible import
- Half-yard SKUs (future, see todo P1) will need cost mapped from full-yard base
Options Considered
Option A: Shopify API Pull Only
Approach: Fetch existing cost data from Shopify's InventoryItem API for all variants. Whatever Cole has already entered in Shopify Products → Cost per item populates immediately. Pros: Zero manual data entry if already populated; syncs automatically going forward; clean source of truth Cons: Most Shopify stores have <30% cost coverage; may get sparse data; no landed cost breakdown Estimated Effort: Low (2-3 hours to build)
Option B: Vendor Sheet Import Only
Approach: Request pricing sheets from all vendors, build a CSV/Excel importer that maps vendor SKUs to Shopify SKUs and writes to DB. Pros: Complete coverage; includes landed cost components (freight, duties) Cons: Requires vendor coordination (days/weeks); manual ongoing maintenance; vendor SKU → Shopify SKU mapping is fragile Estimated Effort: High (1-2 weeks including vendor turnaround)
Option C: Hybrid — API Pull First, Vendor Sheet Fill-Gaps (CHOSEN)
Approach: Phase 1 pulls whatever's already in Shopify via API. Phase 2 builds the schema + import pipeline. Phase 3 requests vendor sheets for top vendors. As sheets arrive, import fills gaps left by API. Shopify remains the system of record — vendor sheets update Shopify costs via API write-back. Pros: Gets data immediately; vendor sheets add landed cost detail; single source of truth (Shopify); scales naturally Cons: Slightly more complex; requires API write-back in later phase Estimated Effort: Medium (4-6 hours initial build + vendor coordination ongoing)
Chosen Approach
Decision: Option C — Hybrid API Pull + Vendor Sheet Import
Rationale:
- Shopify's native cost field is the right system of record — avoids a parallel cost database that drifts
- API pull surfaces what's already there with zero vendor coordination
- Landed cost (freight + duties) requires vendor sheets regardless — Option A alone is insufficient for true margin
- Writing costs back to Shopify means all downstream tools (Shopify reports, future BI) benefit automatically
Trade-offs Accepted:
- Shopify cost field is unit cost only; freight/duty % stored separately in a local
cost_adjustmentstable - Full coverage will take 2-4 weeks pending vendor responses
Implementation Plan
Phase 1: POC — Pull Existing Costs from Shopify API (Top 100 SKUs)
Goal: Discover how much cost data already exists in Shopify; create the foundation schema.
- ☐ Query SQLite to identify top 100 SKUs by revenue (join line_items → variants, group by SKU, sum qty×price)
- ☐ For each of the top 100 SKUs, fetch
inventory_item_idfrom variants table - ☐ Call Shopify REST API
GET /admin/api/2024-01/inventory_items/{id}.jsonfor each item (FO store) - ☐ Log: SKU, inventory_item_id, cost (if set), created_at, updated_at
- ☐ Report: What % of top 100 have cost populated? What's the estimated gross margin at current prices?
- ☐ Script:
~/ai-projects-local/shopify-bigquery-pipeline/cost_poc.py
Output: data/cost_poc_results.csv + console summary showing coverage rate
Phase 2: Schema Extension
Goal: Add cost tracking to SQLite DB without breaking existing pipeline.
- ☐ Add
product_coststable to DB (separate from variants — avoids breaking existing schema):
CREATE TABLE IF NOT EXISTS product_costs (
sku TEXT NOT NULL,
store TEXT NOT NULL,
inventory_item_id INTEGER,
unit_cost REAL, -- wholesale cost from Shopify/vendor sheet
freight_pct REAL DEFAULT 0.05, -- estimated freight as % of unit cost
duty_pct REAL DEFAULT 0.0, -- import duty % (0 for domestic, ~0.125 for China)
landed_cost REAL GENERATED ALWAYS AS
(unit_cost * (1 + freight_pct + duty_pct)) STORED,
cost_source TEXT, -- 'shopify_api', 'vendor_sheet', 'manual'
vendor_sku TEXT, -- vendor's own SKU reference
effective_date TEXT, -- when this cost became effective
notes TEXT,
updated_at TEXT,
PRIMARY KEY (sku, store)
);
CREATE INDEX IF NOT EXISTS idx_product_costs_sku ON product_costs(sku);
CREATE INDEX IF NOT EXISTS idx_product_costs_store ON product_costs(store);
- ☐ Add a
margin_viewfor quick analysis:
CREATE VIEW IF NOT EXISTS sku_margin AS
SELECT
v.sku,
v.store,
p.title,
p.vendor,
v.price as retail_price,
pc.unit_cost,
pc.landed_cost,
pc.freight_pct,
pc.duty_pct,
ROUND((v.price - pc.landed_cost) / v.price * 100, 1) as gross_margin_pct,
ROUND(v.price - pc.landed_cost, 2) as gross_margin_dollars,
pc.cost_source,
pc.updated_at
FROM variants v
JOIN products p ON v.product_id = p.id
LEFT JOIN product_costs pc ON v.sku = pc.sku AND v.store = pc.store
WHERE v.sku != '';
- ☐ Add migration to
shopify_sqlite.pyinit routine so DB auto-upgrades on next sync - ☐ Script to apply schema:
~/ai-projects-local/shopify-bigquery-pipeline/migrate_add_costs.py
Phase 3: Vendor Outreach — Pricing Sheet Requests
Goal: Get wholesale pricing sheets from top 5 vendors (covers ~85% of inventory value).
Top 5 vendors by FO retail inventory value:
- Riley Blake Designs — $67K retail value, 10,671 variant SKUs
- FreeSpirit Fabrics — $15K retail value, 672 SKUs
- Andover Fabrics — $13K retail value, 1,966 SKUs
- Fabric Outlet (house brand) — $12K retail value, 119 SKUs (Cole has direct cost data)
- Checker Distributors — $5K retail value, 62 SKUs
Ask from each vendor:
- Current wholesale price list (CSV or Excel preferred)
- Include: Vendor SKU, Product Name, Wholesale Price, MSRP
- Country of origin (for duty calculation)
- Freight terms (FOB, CIF, or flat rate?)
Draft email template (for Tibby to send from vendor accounts):
Subject: Wholesale Price List Request — [Vendor Name]
Hi [Contact Name],
We're updating our cost tracking system and need a current copy of your wholesale
price list for our records.
Could you send us a CSV or Excel file with:
- Item/SKU numbers
- Wholesale prices
- MSRP/suggested retail
- Country of origin (for import records)
We buy under account [ACCOUNT NUMBER] — please let us know if you need anything
else to pull our specific pricing tier.
Thanks,
[Cole / Tibby]
- ☐ Create email drafts for top 5 vendors (Tibby sends)
- ☐ Track responses in
data/vendor_pricing_log.csv(vendor, date requested, date received, format, SKU coverage %)
Phase 4: Vendor Sheet Import Pipeline
Goal: Parse incoming vendor sheets and write costs to product_costs table.
- ☐ Build
cost_importer.py— supports CSV and Excel input - ☐ SKU matching logic:
- Exact match: vendor SKU = Shopify SKU (best case)
- Fuzzy match: strip vendor prefix/suffix, match on product handle
- Manual override:
data/sku_mapping_overrides.csvfor exceptions - ☐ Validation rules:
- Reject if unit_cost > price (would imply negative margin — flag for review)
- Warn if margin < 20% (below typical fabric target)
- Warn if margin > 80% (possible data entry error)
- ☐ Write matched rows to
product_costswithcost_source = 'vendor_sheet' - ☐ Unmatched rows logged to
data/cost_import_unmatched.csvfor manual resolution - ☐ Script:
~/ai-projects-local/shopify-bigquery-pipeline/cost_importer.py
Usage:
python cost_importer.py --vendor "Riley Blake Designs" --file ~/Downloads/rbd_pricelist_2026.xlsx --store fo --freight-pct 0.04 --duty-pct 0.0
Phase 5: Margin Analysis Report
Goal: Generate actionable margin report once sufficient cost data exists (>50 of top 100 SKUs covered).
- ☐ Build
margin_report.py— queriessku_marginview, outputs to Google Sheets or CSV - ☐ Report columns: SKU, Product, Vendor, Retail Price, Unit Cost, Landed Cost, Margin %, Margin $, Units in Stock, Inventory at Cost, Inventory at Retail
- ☐ Segments: by vendor, by product type, by margin tier (<20%, 20-40%, 40-60%, >60%)
- ☐ Flag: SKUs selling below landed cost (loss leaders vs errors)
- ☐ Flag: High-volume SKUs with unknown cost (prioritize for next vendor outreach)
- ☐ Script:
~/ai-projects-local/shopify-bigquery-pipeline/margin_report.py
Acceptance Criteria
Must Have:
- ☐
product_coststable created and integrated with existing DB - ☐ POC script fetches costs for top 100 SKUs from Shopify API, reports coverage %
- ☐
sku_marginview returns correct margin calculations for SKUs with cost data - ☐ Vendor outreach email drafts ready for Tibby to send
Should Have:
- ☐
cost_importer.pyhandles Riley Blake Designs price sheet format - ☐ Margin report showing top 100 SKUs with current margin data
Nice to Have:
- ☐ Write-back: Update Shopify inventory item costs via API after import (keeps Shopify as source of truth)
- ☐ Shopify API sync integrated into main pipeline so costs refresh automatically
Verification Steps
- Run
cost_poc.py— verify it returns results for at least 1 SKU withcostpopulated - Run
migrate_add_costs.py— verifyproduct_coststable andsku_marginview exist in DB - Query:
SELECT * FROM sku_margin WHERE unit_cost IS NOT NULL LIMIT 10;— confirm margin math is correct - Manually verify 3 SKUs: look up their Shopify admin → Inventory → Cost per item and confirm DB matches
Execution Log
2026-03-31
- Plan created. Research phase complete.
- Key finding: Shopify's
inventory_item_idcolumn already in variants table — direct API bridge exists. - Key finding: 2,244 active FO SKUs across 22 vendors; top 5 vendors = ~85% of retail inventory value.
- Built all three core scripts (syntax-verified clean):
cost_poc.py— Phase 1 POC: queries top 100 SKUs, fetches costs from Shopify API, reports coverage %migrate_add_costs.py— Phase 2 migration: addsproduct_coststable +sku_marginview to DBcost_importer.py— Phase 4 importer: parses vendor CSV/Excel sheets, matches SKUs, upserts to DB- Dry-run tested
migrate_add_costs.pyagainst live DB — SQL is valid, DB has 7 tables + 0 views currently. - Next step for Cole: Run
python3 migrate_add_costs.pyto apply schema, thenpython3 cost_poc.pyto check coverage.
Follow-Up Tasks
After POC coverage rate is known:
- If >40% coverage: Proceed directly to Phase 2 schema + Phase 5 report (sufficient for margin analysis)
- If <40% coverage: Prioritize Phase 3 vendor outreach before building full pipeline
- Add Shiplap + Sundance stores after FO POC is validated
References
- Shopify InventoryItem API: https://shopify.dev/docs/api/admin-rest/2024-01/resources/inventoryitem
- Shopify Bulk GraphQL for cost:
inventoryItems { edges { node { unitCost { amount } } } } - HTS code for quilting fabric (cotton): 5208.xx — US duty rate varies 0-14.9% depending on weave
- Existing pipeline:
~/ai-projects-local/shopify-bigquery-pipeline/ - Vendor inventory values: queried from
shopify.dbon 2026-03-31
~/ai-projects/mission-control/plans/sku-cost-tracking.md