← Back to all projects
In Progress Created 2026-03-31 0/32 tasks

SKU-Level Cost Tracking Implementation

Priority: P2 Category: PREP


Executive Summary

Add product cost data to the Shopify SQLite database to enable true gross margin analysis. The existing pipeline tracks revenue (price, compare_at_price) but has zero cost visibility. Shopify natively supports cost-per-item on inventory items — we can pull existing cost data via API immediately, then build a vendor sheet import pipeline to fill gaps. POC covers the top 100 SKUs by revenue volume.


Research Phase

Current State

Database: ~/ai-projects-local/shopify-bigquery-pipeline/data/shopify.db

  • variants table: has price, compare_at_price, inventory_item_idno cost column
  • line_items table: has price, total_discountno cost column
  • products table: has vendor — useful for vendor-level cost assumptions

FO Store Inventory Snapshot (as of last sync):

  • 2,244 active SKUs across 22 vendors
  • $121,007 total retail inventory value
  • Top vendors by retail value: Riley Blake Designs ($67K), FreeSpirit ($15K), Andover ($13K)

Shopify API Capability:

  • Shopify stores cost (wholesale/unit cost) on the InventoryItem object
  • Already captured inventory_item_id on every variant row — this is the join key
  • REST endpoint: GET /admin/api/2024-01/inventory_items/{id}.jsoncost field
  • GraphQL: inventoryItem { unitCost { amount } } — supports bulk operations
  • Cost field is optional — many merchants leave it blank; we'll discover coverage rate in Phase 1

Landed Cost Components:

  • Wholesale (unit cost from vendor invoice)
  • Inbound freight (typically 3-8% of wholesale for fabric, varies by vendor/carrier)
  • Import duties (varies by country of origin; most fabric from China at ~12.5% HTS 5208.xx)
  • Payment processing is usually excluded from COGS (considered operating expense)

Constraints

  • No cost data currently in DB — must either pull from Shopify API or import from vendor sheets
  • Top 100 SKUs by what metric? Using revenue (units sold × price) from line_items is most meaningful, but if line_items history is shallow, use inventory value (qty × price) as proxy
  • Vendor sheets will be in various formats (Excel, PDF, CSV) — need flexible import
  • Half-yard SKUs (future, see todo P1) will need cost mapped from full-yard base

Options Considered

Option A: Shopify API Pull Only

Approach: Fetch existing cost data from Shopify's InventoryItem API for all variants. Whatever Cole has already entered in Shopify Products → Cost per item populates immediately. Pros: Zero manual data entry if already populated; syncs automatically going forward; clean source of truth Cons: Most Shopify stores have <30% cost coverage; may get sparse data; no landed cost breakdown Estimated Effort: Low (2-3 hours to build)

Option B: Vendor Sheet Import Only

Approach: Request pricing sheets from all vendors, build a CSV/Excel importer that maps vendor SKUs to Shopify SKUs and writes to DB. Pros: Complete coverage; includes landed cost components (freight, duties) Cons: Requires vendor coordination (days/weeks); manual ongoing maintenance; vendor SKU → Shopify SKU mapping is fragile Estimated Effort: High (1-2 weeks including vendor turnaround)

Option C: Hybrid — API Pull First, Vendor Sheet Fill-Gaps (CHOSEN)

Approach: Phase 1 pulls whatever's already in Shopify via API. Phase 2 builds the schema + import pipeline. Phase 3 requests vendor sheets for top vendors. As sheets arrive, import fills gaps left by API. Shopify remains the system of record — vendor sheets update Shopify costs via API write-back. Pros: Gets data immediately; vendor sheets add landed cost detail; single source of truth (Shopify); scales naturally Cons: Slightly more complex; requires API write-back in later phase Estimated Effort: Medium (4-6 hours initial build + vendor coordination ongoing)


Chosen Approach

Decision: Option C — Hybrid API Pull + Vendor Sheet Import

Rationale:

  1. Shopify's native cost field is the right system of record — avoids a parallel cost database that drifts
  2. API pull surfaces what's already there with zero vendor coordination
  3. Landed cost (freight + duties) requires vendor sheets regardless — Option A alone is insufficient for true margin
  4. Writing costs back to Shopify means all downstream tools (Shopify reports, future BI) benefit automatically

Trade-offs Accepted:

  • Shopify cost field is unit cost only; freight/duty % stored separately in a local cost_adjustments table
  • Full coverage will take 2-4 weeks pending vendor responses

Implementation Plan

Phase 1: POC — Pull Existing Costs from Shopify API (Top 100 SKUs)

Goal: Discover how much cost data already exists in Shopify; create the foundation schema.

  • Query SQLite to identify top 100 SKUs by revenue (join line_items → variants, group by SKU, sum qty×price)
  • For each of the top 100 SKUs, fetch inventory_item_id from variants table
  • Call Shopify REST API GET /admin/api/2024-01/inventory_items/{id}.json for each item (FO store)
  • Log: SKU, inventory_item_id, cost (if set), created_at, updated_at
  • Report: What % of top 100 have cost populated? What's the estimated gross margin at current prices?
  • Script: ~/ai-projects-local/shopify-bigquery-pipeline/cost_poc.py

Output: data/cost_poc_results.csv + console summary showing coverage rate


Phase 2: Schema Extension

Goal: Add cost tracking to SQLite DB without breaking existing pipeline.

  • Add product_costs table to DB (separate from variants — avoids breaking existing schema):
CREATE TABLE IF NOT EXISTS product_costs (
    sku TEXT NOT NULL,
    store TEXT NOT NULL,
    inventory_item_id INTEGER,
    unit_cost REAL,                    -- wholesale cost from Shopify/vendor sheet
    freight_pct REAL DEFAULT 0.05,     -- estimated freight as % of unit cost
    duty_pct REAL DEFAULT 0.0,         -- import duty % (0 for domestic, ~0.125 for China)
    landed_cost REAL GENERATED ALWAYS AS
        (unit_cost * (1 + freight_pct + duty_pct)) STORED,
    cost_source TEXT,                  -- 'shopify_api', 'vendor_sheet', 'manual'
    vendor_sku TEXT,                   -- vendor's own SKU reference
    effective_date TEXT,               -- when this cost became effective
    notes TEXT,
    updated_at TEXT,
    PRIMARY KEY (sku, store)
);

CREATE INDEX IF NOT EXISTS idx_product_costs_sku ON product_costs(sku);
CREATE INDEX IF NOT EXISTS idx_product_costs_store ON product_costs(store);
  • Add a margin_view for quick analysis:
CREATE VIEW IF NOT EXISTS sku_margin AS
SELECT
    v.sku,
    v.store,
    p.title,
    p.vendor,
    v.price as retail_price,
    pc.unit_cost,
    pc.landed_cost,
    pc.freight_pct,
    pc.duty_pct,
    ROUND((v.price - pc.landed_cost) / v.price * 100, 1) as gross_margin_pct,
    ROUND(v.price - pc.landed_cost, 2) as gross_margin_dollars,
    pc.cost_source,
    pc.updated_at
FROM variants v
JOIN products p ON v.product_id = p.id
LEFT JOIN product_costs pc ON v.sku = pc.sku AND v.store = pc.store
WHERE v.sku != '';
  • Add migration to shopify_sqlite.py init routine so DB auto-upgrades on next sync
  • Script to apply schema: ~/ai-projects-local/shopify-bigquery-pipeline/migrate_add_costs.py

Phase 3: Vendor Outreach — Pricing Sheet Requests

Goal: Get wholesale pricing sheets from top 5 vendors (covers ~85% of inventory value).

Top 5 vendors by FO retail inventory value:

  1. Riley Blake Designs — $67K retail value, 10,671 variant SKUs
  2. FreeSpirit Fabrics — $15K retail value, 672 SKUs
  3. Andover Fabrics — $13K retail value, 1,966 SKUs
  4. Fabric Outlet (house brand) — $12K retail value, 119 SKUs (Cole has direct cost data)
  5. Checker Distributors — $5K retail value, 62 SKUs

Ask from each vendor:

  • Current wholesale price list (CSV or Excel preferred)
  • Include: Vendor SKU, Product Name, Wholesale Price, MSRP
  • Country of origin (for duty calculation)
  • Freight terms (FOB, CIF, or flat rate?)

Draft email template (for Tibby to send from vendor accounts):

Subject: Wholesale Price List Request — [Vendor Name]

Hi [Contact Name],

We're updating our cost tracking system and need a current copy of your wholesale
price list for our records.

Could you send us a CSV or Excel file with:

- Item/SKU numbers
- Wholesale prices
- MSRP/suggested retail
- Country of origin (for import records)

We buy under account [ACCOUNT NUMBER] — please let us know if you need anything
else to pull our specific pricing tier.

Thanks,
[Cole / Tibby]
  • Create email drafts for top 5 vendors (Tibby sends)
  • Track responses in data/vendor_pricing_log.csv (vendor, date requested, date received, format, SKU coverage %)

Phase 4: Vendor Sheet Import Pipeline

Goal: Parse incoming vendor sheets and write costs to product_costs table.

  • Build cost_importer.py — supports CSV and Excel input
  • SKU matching logic:
  • Exact match: vendor SKU = Shopify SKU (best case)
  • Fuzzy match: strip vendor prefix/suffix, match on product handle
  • Manual override: data/sku_mapping_overrides.csv for exceptions
  • Validation rules:
  • Reject if unit_cost > price (would imply negative margin — flag for review)
  • Warn if margin < 20% (below typical fabric target)
  • Warn if margin > 80% (possible data entry error)
  • Write matched rows to product_costs with cost_source = 'vendor_sheet'
  • Unmatched rows logged to data/cost_import_unmatched.csv for manual resolution
  • Script: ~/ai-projects-local/shopify-bigquery-pipeline/cost_importer.py

Usage:

python cost_importer.py --vendor "Riley Blake Designs" --file ~/Downloads/rbd_pricelist_2026.xlsx --store fo --freight-pct 0.04 --duty-pct 0.0

Phase 5: Margin Analysis Report

Goal: Generate actionable margin report once sufficient cost data exists (>50 of top 100 SKUs covered).

  • Build margin_report.py — queries sku_margin view, outputs to Google Sheets or CSV
  • Report columns: SKU, Product, Vendor, Retail Price, Unit Cost, Landed Cost, Margin %, Margin $, Units in Stock, Inventory at Cost, Inventory at Retail
  • Segments: by vendor, by product type, by margin tier (<20%, 20-40%, 40-60%, >60%)
  • Flag: SKUs selling below landed cost (loss leaders vs errors)
  • Flag: High-volume SKUs with unknown cost (prioritize for next vendor outreach)
  • Script: ~/ai-projects-local/shopify-bigquery-pipeline/margin_report.py

Acceptance Criteria

Must Have:

  • product_costs table created and integrated with existing DB
  • POC script fetches costs for top 100 SKUs from Shopify API, reports coverage %
  • sku_margin view returns correct margin calculations for SKUs with cost data
  • Vendor outreach email drafts ready for Tibby to send

Should Have:

  • cost_importer.py handles Riley Blake Designs price sheet format
  • Margin report showing top 100 SKUs with current margin data

Nice to Have:

  • Write-back: Update Shopify inventory item costs via API after import (keeps Shopify as source of truth)
  • Shopify API sync integrated into main pipeline so costs refresh automatically

Verification Steps

  1. Run cost_poc.py — verify it returns results for at least 1 SKU with cost populated
  2. Run migrate_add_costs.py — verify product_costs table and sku_margin view exist in DB
  3. Query: SELECT * FROM sku_margin WHERE unit_cost IS NOT NULL LIMIT 10; — confirm margin math is correct
  4. Manually verify 3 SKUs: look up their Shopify admin → Inventory → Cost per item and confirm DB matches

Execution Log

2026-03-31

  • Plan created. Research phase complete.
  • Key finding: Shopify's inventory_item_id column already in variants table — direct API bridge exists.
  • Key finding: 2,244 active FO SKUs across 22 vendors; top 5 vendors = ~85% of retail inventory value.
  • Built all three core scripts (syntax-verified clean):
  • cost_poc.py — Phase 1 POC: queries top 100 SKUs, fetches costs from Shopify API, reports coverage %
  • migrate_add_costs.py — Phase 2 migration: adds product_costs table + sku_margin view to DB
  • cost_importer.py — Phase 4 importer: parses vendor CSV/Excel sheets, matches SKUs, upserts to DB
  • Dry-run tested migrate_add_costs.py against live DB — SQL is valid, DB has 7 tables + 0 views currently.
  • Next step for Cole: Run python3 migrate_add_costs.py to apply schema, then python3 cost_poc.py to check coverage.

Follow-Up Tasks

After POC coverage rate is known:

  • If >40% coverage: Proceed directly to Phase 2 schema + Phase 5 report (sufficient for margin analysis)
  • If <40% coverage: Prioritize Phase 3 vendor outreach before building full pipeline
  • Add Shiplap + Sundance stores after FO POC is validated

References

  • Shopify InventoryItem API: https://shopify.dev/docs/api/admin-rest/2024-01/resources/inventoryitem
  • Shopify Bulk GraphQL for cost: inventoryItems { edges { node { unitCost { amount } } } }
  • HTS code for quilting fabric (cotton): 5208.xx — US duty rate varies 0-14.9% depending on weave
  • Existing pipeline: ~/ai-projects-local/shopify-bigquery-pipeline/
  • Vendor inventory values: queried from shopify.db on 2026-03-31