How We Built Profit-Aware Cart Recovery on Shopify [Architecture Deep Dive]
This is the engineering companion to The $14 Problem — our merchant-facing manifesto on profit-aware cart recovery. If you haven't read it, start there for the "why." This post covers the "how."
What follows is the architectural journal of building Reclavio: what we tried, what broke, and the system we shipped. If you're building Shopify storefront infrastructure, LLM-powered commerce tools, or offer management systems, this is for you.
Author: Brodie, Founder @ Reclavio
Status: Private beta (pre–Shopify App Store)
Last updated: February 2026
The Architecture: Observer → Decider → Deliverer
The system is designed around three layers. Each layer has one job. No layer "cheats."
Glossary:
- App Proxy: A Shopify feature that forwards storefront requests to your app
- HMAC: A cryptographic signature that proves Shopify sent the request
- Idempotency: Same input → same outcome (prevents discount code farming)
The Observer → Decider → Deliverer architecture: each layer has one job, no cheating.
Layer 1: The Observer
Understand intent without collecting personal data.
Before any decision is made, we gather context—no PII required, no third-party cookies, no tracking dark patterns.
We observe:
- Cart composition: Total value (in minor units), item count, product categories
- Page context: Product page, cart page, collection, checkout
- Customer signals: Logged-in status (from Shopify's signed logged_in_customer_id), session depth
- Real-time behavior: Time on page, scroll depth, exit intent
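As a rough illustration, the observer's output can be modeled as a single typed context object. Field names here are illustrative assumptions, not Reclavio's actual schema:

```typescript
// Hypothetical shape of the Observer's decision context. Note what is
// absent: no email, no name, no third-party identifiers.
interface DecisionContext {
  cartValueCents: number;       // minor units, never floats
  itemCount: number;
  productCategories: string[];
  pageType: "product" | "cart" | "collection" | "checkout";
  isLoggedIn: boolean;          // derived from Shopify's signed logged_in_customer_id
  sessionDepth: number;         // pages viewed this session
  timeOnPageMs: number;
  scrollDepthPercent: number;
  exitIntent: boolean;
}

// Build a complete context from partial raw signals, with safe defaults.
function buildContext(raw: Partial<DecisionContext>): DecisionContext {
  return {
    cartValueCents: raw.cartValueCents ?? 0,
    itemCount: raw.itemCount ?? 0,
    productCategories: raw.productCategories ?? [],
    pageType: raw.pageType ?? "product",
    isLoggedIn: raw.isLoggedIn ?? false,
    sessionDepth: raw.sessionDepth ?? 1,
    timeOnPageMs: raw.timeOnPageMs ?? 0,
    scrollDepthPercent: raw.scrollDepthPercent ?? 0,
    exitIntent: raw.exitIntent ?? false,
  };
}
```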
This creates a rich decision context without compromising shopper privacy.
Layer 2: The Decider
Policy is code. Conversation is AI.
This is where Reclavio differs fundamentally from "just add an LLM" solutions.
The eligibility engine is deterministic and always authoritative.
When a shopper interacts with Reclavio, the system routes their message through a multi-layer classifier (keyword matching → weighted scoring → ML intent classification):
| Route | Description | Can Mint Discounts? |
|---|---|---|
| NEGOTIATION | Active discount negotiation | ✅ Yes (if eligible) |
| POLICY | Return/shipping questions | ❌ No |
| PRODUCT_DISCOVERY | Product search, recommendations | ❌ No |
| SUPPORT | Human escalation | ❌ No |
| B2B | Wholesale/bulk inquiry | ❌ No |
| MIXED_INTENT | Combined policy + negotiation | ❌ No (answers policy first) |
| SAFE_FALLBACK | Low-confidence or unknown intent | ❌ No |
Only the NEGOTIATION route can issue discounts—and even then, only if:
- The merchant's rules permit it (cart value thresholds, customer type, product eligibility)
- The offer hasn't already been issued (idempotency)
- The offer doesn't exceed configured caps (decision enforcement)
Design principle: AI writes the sentence. Deterministic code writes the policy.
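A minimal sketch of the first classifier layer (keyword matching plus a mint-permission table). The keywords and fall-through behavior are simplified assumptions; the real system layers weighted scoring and an ML intent classifier on top:

```typescript
type Route = "NEGOTIATION" | "POLICY" | "PRODUCT_DISCOVERY" | "SUPPORT"
  | "B2B" | "MIXED_INTENT" | "SAFE_FALLBACK";

// First-pass keyword layer (illustrative keyword sets, not Reclavio's).
const KEYWORDS: Array<[Route, RegExp]> = [
  ["POLICY", /\b(return|refund|shipping policy)\b/i],
  ["NEGOTIATION", /\b(discount|coupon|deal|cheaper)\b/i],
  ["B2B", /\b(wholesale|bulk order)\b/i],
];

function classify(message: string): Route {
  const hits = KEYWORDS.filter(([, re]) => re.test(message)).map(([r]) => r);
  if (hits.length === 0) return "SAFE_FALLBACK"; // low confidence: never mint
  // Policy + negotiation together: answer policy first, no minting.
  if (hits.includes("POLICY") && hits.includes("NEGOTIATION")) return "MIXED_INTENT";
  return hits[0];
}

// Only NEGOTIATION may mint, and even then only after eligibility checks.
const CAN_MINT: Record<Route, boolean> = {
  NEGOTIATION: true, POLICY: false, PRODUCT_DISCOVERY: false,
  SUPPORT: false, B2B: false, MIXED_INTENT: false, SAFE_FALLBACK: false,
};
```

The key property is that the permission table is code, not prompt text: no classifier output can grant a route minting rights it doesn't have.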
Layer 3: The Deliverer
Speed when available. Reliability always.
Getting a response to the shopper sounds simple—until you encounter Shopify's platform constraints.
The Problem: Storefront proxy calls have tight response budgets. App proxies don't support cookies—Shopify strips Cookie from requests and Set-Cookie from responses for security. (Shopify App Proxy Docs)
The Solution: A dual-lane architecture.
Streaming when available, proxy-safe fallback always—shoppers always get an answer.
Lane A (App Proxy): Reliable but constrained. It always works, but tight timeout windows and synchronous nature prevent real-time token streaming.
Lane B (Direct Streaming): Feels like magic—tokens appear as they're generated in a ChatGPT-like experience. Requires a streamGrant JWT, which means an extra bootstrap step that can fail.
The widget tries Lane B first for premium UX. If the grant isn't available or the stream fails, it seamlessly falls back to Lane A. If Lane A times out, it serves a template response based on detected intent.
Designed so shoppers always receive a response, including template fallback under timeout conditions.
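The fallback hierarchy can be sketched as a simple try-chain. The three transports below are stand-ins for the real streaming endpoint, app proxy call, and intent-based template store:

```typescript
type Reply = { text: string; via: "stream" | "proxy" | "template" };

// Lane B first (streaming), then Lane A (app proxy), then a template.
// Every path resolves, so the shopper always gets an answer.
async function getReply(
  laneB: () => Promise<string>,  // direct streaming (needs streamGrant JWT)
  laneA: () => Promise<string>,  // app proxy (reliable, no token streaming)
  template: () => string,        // pre-written response for the detected intent
): Promise<Reply> {
  try {
    return { text: await laneB(), via: "stream" };
  } catch {
    try {
      return { text: await laneA(), via: "proxy" };
    } catch {
      return { text: template(), via: "template" };
    }
  }
}
```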
Why This Doesn't Break in Production
Building a demo is easy. Building a system that handles real-world constraints, with real money on the line—that's different.
1. Deterministic Eligibility as Source of Truth
The most critical architectural decision: LLM timeouts should never change the offer outcome.
In early iterations, I hit a failure mode: if the LLM timed out during message generation, the system would skip the offer—even though the shopper was eligible. Identical shoppers got different outcomes based on network latency.
The fix was architectural: eligibility is computed before the LLM is called. If you're eligible, you get the offer. The LLM shapes the delivery, not the decision.
💡 If you remember one thing: Eligibility must be deterministic before the LLM runs. Timeouts should only affect messaging, never outcomes.
End-to-End Offer Flow
End-to-end offer flow: eligibility is computed before calling the LLM, ensuring deterministic outcomes even on timeout.
sequenceDiagram
participant Widget
participant AppProxy as App Proxy
participant RuleEngine as Rule Engine
participant OfferLedger as Offer Ledger
participant LLM
Widget->>AppProxy: POST /decision (cartValue, sessionId, signature)
AppProxy->>AppProxy: Verify Shopify signature
AppProxy->>RuleEngine: computeEligibility(cart, rules)
RuleEngine-->>AppProxy: {eligible: true, maxDiscount: 10%}
alt Eligible for offer
AppProxy->>OfferLedger: reserveOffer(sessionId, cartToken)
OfferLedger-->>AppProxy: {discountCode: "SAVE10", idempotent: true}
end
AppProxy->>LLM: generateResponse(eligibility, context)
alt LLM responds in time
LLM-->>AppProxy: "Great news! I can offer you 10% off..."
AppProxy-->>Widget: {decision, discountCode, offerCard}
else LLM timeout
AppProxy-->>Widget: {templateResponse, discountCode, offerCard}
Note over Widget: Offer still delivered (deterministic)
end
The key: eligibility and offer reservation happen before the LLM call. Timeouts only affect messaging.
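A condensed sketch of that ordering: eligibility is computed first, and the LLM call is raced against a hard deadline. The rule values, messages, and deadline handling are illustrative assumptions:

```typescript
interface Eligibility { eligible: boolean; maxDiscountPercent: number }

// Deterministic rule check. Runs BEFORE any LLM call (hypothetical rule).
function computeEligibility(cartValueCents: number, minCartCents = 5000): Eligibility {
  return cartValueCents >= minCartCents
    ? { eligible: true, maxDiscountPercent: 10 }
    : { eligible: false, maxDiscountPercent: 0 };
}

// The LLM only shapes delivery; on timeout the outcome is unchanged.
async function decide(
  cartValueCents: number,
  generate: (e: Eligibility) => Promise<string>,
  timeoutMs = 6500,
): Promise<{ eligibility: Eligibility; message: string; fromTemplate: boolean }> {
  const eligibility = computeEligibility(cartValueCents);
  try {
    const message = await Promise.race([
      generate(eligibility),
      new Promise<never>((_, rej) =>
        setTimeout(() => rej(new Error("deadline")), timeoutMs)),
    ]);
    return { eligibility, message, fromTemplate: false };
  } catch {
    // Template fallback: same offer, pre-written wording.
    const message = eligibility.eligible
      ? `I can offer you ${eligibility.maxDiscountPercent}% off.`
      : "Happy to help with any questions about your cart.";
    return { eligibility, message, fromTemplate: true };
  }
}
```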
2. Idempotent Offer Issuance (Anti-Farming Controls)
Without idempotency, a clever shopper could refresh 10 times and get 10 different codes, share the widget link, or build bots to harvest codes at scale.
The Offer Ledger ensures:
- One active offer per cart/session combination per rule
- Discount codes are reserved before they're displayed
- Subsequent requests return the same code, not a new one
- TTL management supports polling and reuse
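An in-memory sketch of these guarantees. The production ledger presumably lives in Redis (e.g. SET NX with a TTL); the key scheme and TTL value here are assumptions:

```typescript
// Stand-in for the Offer Ledger: one stable code per rule/cart/session key.
class OfferLedger {
  private store = new Map<string, { code: string; expiresAt: number }>();

  reserve(
    sessionId: string, cartToken: string, ruleId: string,
    mint: () => string,                 // called at most once per key within TTL
    ttlMs = 30 * 60 * 1000, now = Date.now(),
  ): { code: string; fresh: boolean } {
    const key = `${ruleId}:${cartToken}:${sessionId}`;
    const existing = this.store.get(key);
    if (existing && existing.expiresAt > now) {
      return { code: existing.code, fresh: false }; // idempotent replay
    }
    const code = mint(); // reserve BEFORE the code is ever displayed
    this.store.set(key, { code, expiresAt: now + ttlMs });
    return { code, fresh: true };
  }
}
```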
3. Observability as a Feature
You can't improve what you can't measure.
- Correlation IDs: Every request, from widget to backend to LLM, carries an x-correlation-id header for end-to-end tracing
- Trace Context: Correlation ID headers propagate through API and LLM calls for distributed tracing
Launch SLO Targets:
| Metric | Target | Purpose |
|---|---|---|
| p95 Lane A Latency | <3,000ms | Primary path performance |
| p95 Lane B Latency | <1,000ms | Streaming path performance |
| Timeout Rate | <2% | System overwhelm indicator |
| Fallback Rate | <5% | Template fallback frequency |
| Offer Mismatch Rate | 0% | Eligibility ↔ delivery consistency |
Shopify Platform Constraints (Lessons Learned)
After months of development and staging testing, here's what I learned about building serious Shopify infrastructure.
App Proxy Realities
Shopify's App Proxy is powerful but constrained:
| Constraint | Reality | Solution |
|---|---|---|
| Tight timeouts | Limited response window | Deadline budgeting: 6.5s for LLM, 2s for network, 1.5s buffer |
| No cookie sessions | Cookie/Set-Cookie stripped from responses | Explicit sessionId in request body; cartToken for cart correlation |
| Header stripping | Disallowed headers removed | Body-level correlationId for tracing |
| Signature verification | Shopify signs requests with signature param | Verify HMAC before processing; partition read vs. write endpoints |
💡 Key insight: Always verify Shopify's proxy signature (shop, path_prefix, timestamp, signature). Treat all response headers as potentially stripped. (Shopify App Proxy Auth)
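A sketch of proxy signature verification in Node, assuming the documented canonicalization (remove the signature param, sort the remaining query params, concatenate as key=value with no separator, HMAC-SHA256 hex digest). Consult the App Proxy docs for edge cases such as multi-value parameters:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function verifyProxySignature(query: Record<string, string>, secret: string): boolean {
  const { signature, ...rest } = query;
  if (!signature) return false;
  // Canonical message: sorted key=value pairs, no separators (assumed form).
  const message = Object.keys(rest).sort().map((k) => `${k}=${rest[k]}`).join("");
  const digest = createHmac("sha256", secret).update(message).digest("hex");
  // Constant-time comparison to avoid timing side channels.
  const a = Buffer.from(digest);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```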
Webhook Correctness
Shopify's webhook system considers any 2xx response as success; non-2xx triggers retries—a total of 8 attempts over ~4 hours with exponential backoff (updated Sept 2024). Shopify expects a response in under 5 seconds. (Shopify Webhook Retry Update, Shopify Webhook Troubleshooting)
The "Semantic 200" Pattern
| Scenario | Status Code | Result |
|---|---|---|
| Success | 200 | Shopify stops retrying |
| Auth failure (bad HMAC) | 401/403 | Shopify stops retrying |
| Malformed payload (non-recoverable) | 200 + structured log + alert | Shopify stops retrying |
| Transient error | 500 | Shopify retries (8x over ~4h) |
The counterintuitive insight: if the payload is permanently unrecoverable, acknowledge it to stop retries, log the failure, alert, and backfill via API. Returning 500 for an unrecoverable payload wastes capacity and consumes Shopify's limited retry window. (Shopify Webhook Troubleshooting)
Key webhook patterns:
- Respond within 5 seconds; offload slow work to a queue
- Use webhook ID for idempotency (24-hour TTL deduplication)
- HMAC verification before any processing
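The status-code policy in the table above can be encoded as a pure function; the error taxonomy below is illustrative, and a real handler would map it onto the HTTP response after HMAC verification and queueing:

```typescript
type WebhookOutcome = "ok" | "bad_hmac" | "malformed" | "transient";

// "Semantic 200": acknowledge permanently unrecoverable payloads so Shopify
// stops retrying; reserve 500 for genuinely transient failures.
function webhookStatus(outcome: WebhookOutcome): { status: number; willRetry: boolean } {
  switch (outcome) {
    case "ok":        return { status: 200, willRetry: false };
    case "bad_hmac":  return { status: 401, willRetry: false }; // auth failures are not retried
    case "malformed": return { status: 200, willRetry: false }; // ack + log + alert + backfill via API
    case "transient": return { status: 500, willRetry: true };  // Shopify retries (8x over ~4h)
  }
}
```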
Timeouts and Tail Latency
The 99th percentile will hurt you. Design for the worst case:
- Hard deadline: Backend enforces 6.5s limit on LLM calls
- Template fallbacks: If deadline breaches, serve a pre-written response based on detected intent
- Graceful degradation: If intent is unknown, trigger SAFE_FALLBACK
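Deadline budgeting from the constraints table reduces to simple arithmetic. The 10-second total window is an assumed figure; the 6.5s/2s/1.5s split comes from the table above:

```typescript
const TOTAL_BUDGET_MS = 10_000;   // assumed overall proxy window
const NETWORK_BUDGET_MS = 2_000;  // reserved for network
const BUFFER_MS = 1_500;          // safety buffer

// Derive the LLM deadline from time already spent, so downstream calls
// can never overrun the proxy window. Capped at the 6.5s hard limit.
function llmDeadline(elapsedMs: number): number {
  const remaining = TOTAL_BUDGET_MS - elapsedMs - NETWORK_BUDGET_MS - BUFFER_MS;
  return Math.max(0, Math.min(remaining, 6_500));
}
```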
What I Tried and Rejected
I tested the "obvious" approaches. They all broke in predictable ways.
| Approach | Why It Failed | What Replaced It |
|---|---|---|
| Let the LLM decide eligibility | Non-deterministic outcomes under latency | Deterministic rule engine computes eligibility before LLM runs |
| Session via cookies through App Proxy | Not supported—Shopify strips cookie headers (Shopify App Proxy) | Explicit session identifiers via body params; auth via HMAC |
| Synchronous webhook processing | 5-second timeout violations caused retry storms | Queue-first pattern: ack immediately, process async (Shopify Webhooks) |
| Single delivery lane | Proxy timeouts blocked streaming; streaming failures blocked any response | Dual-lane architecture with graceful fallback hierarchy |
💡 Shopify's platform constraints aren't bugs—they're security features. Design around them, not against them.
IPOE: Incremental Profit Offer Engine
Traditional A/B testing tells you "Variant B converts 5% better." But it doesn't answer the question that actually matters: "Did those extra conversions come from customers who would have bought anyway?"
If you discount someone who would've purchased anyway, you didn't "recover revenue"—you paid margin for nothing.
IPOE uses causal inference to measure incremental impact—the lift that wouldn't have happened without intervention.
- A small % of sessions are holdout (no discount offers)
- The rest are treatment (Reclavio engages + offers when rules allow)
- The difference estimates incremental lift
- Discounts are scored by Expected Incremental Profit (EIP), not raw conversion
The IPOE decision flow: from holdout assignment through guardrail enforcement, EIP scoring, and final action selection.
CUPED Variance Reduction
We use CUPED (Controlled-experiment Using Pre-Experiment Data) to reduce noise in lift estimates. By adjusting for pre-intervention cart behavior, we can detect smaller effects with fewer samples—meaning faster statistical significance with less traffic.
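The adjustment itself is compact: with pre-experiment covariate X and metric Y, compute theta = cov(Y, X) / var(X) and subtract theta * (X - mean(X)) from each observation. The mean is preserved while variance shrinks wherever X correlates with Y. A self-contained sketch:

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// CUPED-adjusted metric: Y' = Y - theta * (X - mean(X)).
function cuped(y: number[], x: number[]): number[] {
  const mx = mean(x), my = mean(y);
  let cov = 0, varX = 0;
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my);
    varX += (x[i] - mx) ** 2;
  }
  const theta = varX === 0 ? 0 : cov / varX; // no covariate signal: no adjustment
  return y.map((yi, i) => yi - theta * (x[i] - mx));
}
```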
The Safety Fuse: Circuit Breaker (G8)
Here's a scenario that kept me up at night: what if a bug in the bandit causes it to recommend discounts on every single session? The holdout group gets nothing, the treatment group gets 100% discount rate, and by the time you notice, you've bled margin for hours.
The answer is G8: Anomaly Detection — an automatic circuit breaker that monitors the treatment-only discount rate across a sliding window and trips a two-stage fuse if something looks wrong.
The Two-Stage Fuse
The circuit breaker evaluates the treatment discount rate — the fraction of non-holdout decisions that result in a discount action — using a sliding window of 5-minute buckets over the last 60 minutes.
| Stage | Threshold | Behavior |
|---|---|---|
| Normal | Rate < 80% | System operates normally |
| Yellow | Rate ≥ 80% | Warning logged, metrics emitted — no action taken (detection mode) |
| Red | Rate ≥ 95% | Automatic pause: IPOE disabled for this merchant, all decisions return action: 'none' |
The Yellow threshold exists for observability — it creates an audit trail and emits metrics before a hard pause. When Red trips, the system:
- Sets a Redis flag (ipoe:broken:{merchantId}) so every pod sees the pause instantly
- Writes pauseReason: 'CIRCUIT_BREAKER' to the merchant's IPOE state (DB persistence)
- Logs a fleet-level deduplication record (one alert per merchant, not per pod)
Why a Sliding Window, Not a Global Counter
A naive "count total discounts" approach suffers from dilution — early healthy decisions mask a sudden spike. The sliding window (twelve 5-minute buckets, a 60-minute trailing window) captures recent behavior while remaining insensitive to normal short-term variance.
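A minimal in-memory version of the bucketed window. The real implementation keys buckets in Redis; reusing slots by modular bucket index is one common approach, assumed here:

```typescript
const BUCKET_MS = 5 * 60 * 1000; // 5-minute buckets
const NUM_BUCKETS = 12;          // 60-minute trailing window

class SlidingRate {
  private buckets = Array.from({ length: NUM_BUCKETS },
    () => ({ epoch: -1, discounts: 0, total: 0 }));

  record(discounted: boolean, now = Date.now()): void {
    const epoch = Math.floor(now / BUCKET_MS);
    const b = this.buckets[epoch % NUM_BUCKETS];
    if (b.epoch !== epoch) { b.epoch = epoch; b.discounts = 0; b.total = 0; } // reuse stale slot
    b.total += 1;
    if (discounted) b.discounts += 1;
  }

  // Treatment discount rate over the trailing 60 minutes only.
  rate(now = Date.now()): number {
    const minEpoch = Math.floor(now / BUCKET_MS) - NUM_BUCKETS + 1;
    let d = 0, t = 0;
    for (const b of this.buckets) {
      if (b.epoch >= minEpoch) { d += b.discounts; t += b.total; }
    }
    return t === 0 ? 0 : d / t;
  }
}
```

Because only buckets inside the window are summed, old healthy traffic can't dilute a fresh spike, which is exactly the dilution problem a global counter has.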
stateDiagram-v2
[*] --> Normal
Normal --> Yellow: treatmentDiscountRate ≥ 80%
Yellow --> Red: treatmentDiscountRate ≥ 95%\n+ debounce (2 consecutive evaluations)
Red --> Normal: Merchant resumes IPOE\n(reset clears all Redis state)
Yellow --> Normal: Rate drops below 80%
state Red {
[*] --> Paused
Paused: IPOE disabled
Paused: All decisions → 'none'
Paused: Dashboard shows safety banner
}
Pause Priority: Not All Pauses Are Equal
A merchant can be paused for multiple reasons — billing overdue, manual admin pause, circuit breaker trip, or the merchant choosing to pause themselves. The circuit breaker respects a strict priority ordering:
BILLING_OVERDUE > ADMIN_PAUSE > CIRCUIT_BREAKER > USER_PAUSE
If a merchant is already paused for billing, the circuit breaker won't overwrite the pause reason. When the circuit breaker trips, it first checks: "Is there already a higher-priority pause in place?" If yes, it records the trip for observability but doesn't touch the database state.
This prevents a subtle bug: if the CB overwrites BILLING_OVERDUE with CIRCUIT_BREAKER, and the merchant resumes from the CB pause, they'd bypass the billing block entirely.
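The priority check reduces to an index comparison over the ordering above (names from the post; persistence and the observability record are out of scope here):

```typescript
// Lower index = higher priority. Ordering taken from the post.
const PAUSE_PRIORITY = ["BILLING_OVERDUE", "ADMIN_PAUSE", "CIRCUIT_BREAKER", "USER_PAUSE"] as const;
type PauseReason = (typeof PAUSE_PRIORITY)[number];

// An incoming pause only overwrites a strictly lower-priority one.
function nextPauseReason(current: PauseReason | null, incoming: PauseReason): PauseReason {
  if (current === null) return incoming;
  return PAUSE_PRIORITY.indexOf(incoming) < PAUSE_PRIORITY.indexOf(current)
    ? incoming
    : current;
}
```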
The Hot Path: Zero-RTT When Healthy
On every decision, the orchestrator calls isCircuitBroken(merchantId) before any holdout or bandit logic. This must be fast. The implementation uses a bounded in-memory cache (10,000 entries max, 30-second TTL) backed by a single Redis GET:
- Cache hit (healthy): 0 Redis round-trips — pure in-memory
- Cache miss: 1 Redis GET → cache result for 30 seconds
- Cache hit (broken): Return true immediately — skip all IPOE logic
The 30-second TTL means a pause propagates to all pods within 30 seconds. For a safety mechanism, this is the right trade-off: fast enough to stop bleeding, without adding latency to every healthy decision.
Clean Reset Semantics
When a merchant resumes IPOE from a circuit breaker pause, the system performs a resetBreaker that deterministically clears all sliding window buckets and the broken flag — without using Redis KEYS() or SCAN(). This prevents the "Resume Loop" where stale bucket data causes an immediate re-trip on resume.
💡 Design principle: Safety mechanisms must be self-healing. A circuit breaker that requires manual Redis cleanup to resume isn't a safety feature — it's a footgun.
Third-Party Integrations
Reclavio syncs events to Klaviyo, Omnisend, or custom webhook endpoints. It doesn't replace your email/SMS stack—it makes it smarter by syncing in-session context.
One-click OAuth for Klaviyo and Omnisend; custom webhooks for everything else.
Supported Integrations
| Platform | Auth Method | Event Types |
|---|---|---|
| Klaviyo | OAuth 2.0 | Abandoned cart, offer displayed, offer accepted, conversion |
| Omnisend | API Key | Abandoned cart, offer displayed, offer accepted, conversion |
| Custom Webhook | HMAC Signature | All events (configurable) |
Event Schema
{
"eventType": "reclavio.offer.accepted",
"timestamp": "2026-02-01T08:15:00Z",
"merchantId": "shop_abc123",
"sessionId": "session_xyz789",
"cartValueCents": 15000,
"discountOfferedPercent": 10,
"discountUsed": true,
"channel": "widget"
}
Security Architecture
- HMAC Signatures: Every webhook signed with your secret key
- SSRF Protection: Multi-layer defense (URL validation, protocol/port enforcement, DNS resolution, IP denylist, redirect blocking) prevents internal network attacks
- Encrypted Storage: OAuth tokens encrypted with AES-256-GCM
- Auto-Disable: Failing endpoints disabled after 5 consecutive failures
- Dead Letter Queue: Failed events queued for retry with exponential backoff
💡 All webhook delivery is idempotent. You can safely retry or replay events without duplicate processing.
LLM Safety (OWASP LLM-aligned)
Controls align with the OWASP Top 10 for LLM Applications. (OWASP LLM Top 10)
This prevents the AI from being tricked into issuing unauthorized discounts or leaking merchant data.
| Control | Implementation |
|---|---|
| Input Sanitization | Strip potential injection characters before LLM call |
| Output Cleansing | HTML stripping, URL allowlist verification |
| State Protection | Non-negotiation routes cryptographically blocked from discounts |
| Rate Limiting | Tiered per-session limits prevent abuse |
Implementation Checklists
For Storefront Widgets
- Handle proxy timeout constraints with deadline budgeting
- Use session headers for continuity (cookies will be stripped); auth via proxy signature
- Use body-level correlation IDs as fallback for stripped headers
- Add a circuit breaker (disable widget after N consecutive errors)
- Prevent duplicate request submission (send locks during latency)
- Provide visible fallback UI for error states
For Offer Issuance Safety
- Make eligibility deterministic and compute it before LLM calls
- Implement idempotent offer generation (one offer per cart/session/rule)
- Enforce discount caps in the decision layer, not just the LLM prompt
- Add a circuit breaker for anomalous discount rates (auto-pause before margin bleeds)
- Log every offer with correlation IDs for audit trails
- Build anti-farming controls (rate limits, deduplication, TTL enforcement)
For LLM Integration in User-Facing Flows
- Sanitize all LLM inputs (strip injection patterns)
- Validate all LLM outputs (JSON schema enforcement)
- Implement hard timeouts with fallback responses
- Never let the LLM decide eligibility—only presentation
- Log LLM calls with latency and token usage for cost management
What's Implemented (February 2026)
Reclavio is pre-launch / pre–Shopify App Store. These are the components implemented and in active testing:
- ✅ Shadow Mode: Evaluate rules on real traffic without affecting shoppers
- ✅ Opportunities (Shadow) Card: Dashboard visibility showing at-risk cart value
- ✅ Offer Ledger: Idempotent issuance (stable offer per cart/session/rule)
- ✅ Dual-lane delivery: Proxy-safe path + streaming path with fallback hierarchy
- ✅ Analytics foundations: Event model + SLO dashboards
- ✅ IPOE + Holdout Comparison Card: Configurable holdout (0–50%), 11-state comparison card, Wilson CI, Bayesian posterior, 7-day minimum runtime gate, SRM detection, export
- ✅ G8 Circuit Breaker: Two-stage anomaly detection fuse (Yellow warning at 80%, Red auto-pause at 95%) with sliding window, pause priority, bounded in-memory cache, and clean reset semantics
- ✅ Integrations module: Event schema + delivery pipeline (Klaviyo/Omnisend/custom webhook)
- ✅ Profit Protection Dashboard: "Why No Offer?" transparency logs with decision pipeline visualization
- ✅ SKU & Collection Exclusions: Fail-closed safeguards for protected products, collections, tags, vendors, product types, and gift cards
Next (planned before App Store submission):
- Smart Upsell Before Discount — "Add $15 more for free shipping" (increases AOV with zero margin cost)
- Time-Decay Offers — "This offer expires in 10 minutes" countdown urgency
- CRO Quick Wins — "Get It By" date calculator, thumb-zone mobile bottom sheet, velocity social proof
- Locale-Aware Routing — detect non-English locales and bypass keyword heuristics
Technical FAQ
How is holdout assignment deterministic?
Deterministic hash of sessionId + date. The same session always lands in the same group within a day. No random coin flips.
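A sketch of that assignment; the hash function and bucket math below are illustrative choices, not necessarily Reclavio's:

```typescript
import { createHash } from "node:crypto";

// Hash sessionId + UTC date into a 0..99 bucket. Same session, same day,
// same group: no random coin flips, no assignment storage needed.
function isHoldout(sessionId: string, dateUtc: string, holdoutPercent: number): boolean {
  const digest = createHash("sha256").update(`${sessionId}:${dateUtc}`).digest();
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < holdoutPercent;
}
```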
What happens when I change the holdout percentage?
Changing the holdout percentage starts a new measurement epoch. A confirmation dialog warns that existing comparison data resets. Historical data isn't deleted—it's a fresh measurement window. Mixing traffic from different holdout ratios would invalidate the causal inference.
Can I export holdout results?
Yes. One-click CSV with holdout %, epoch start date, session counts, conversion rates, lift estimates, confidence intervals, and guardrail metrics.
What's CUPED and why does it matter?
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces variance in lift estimates by adjusting for pre-intervention cart behavior. You get statistically significant results faster, with less traffic.
How does the 7-day minimum runtime gate work?
Even with large sample sizes, the verdict stays at "Underpowered" until 7 days pass. Early data often reflects selection bias (weekday/weekend, promotions, seasonal effects). The gate prevents false positives.
What's the SRM detection?
Sample Ratio Mismatch detection flags when the actual treatment/holdout split deviates from the configured ratio—indicating a bug in assignment rather than a real treatment effect.
📖 Read the product story: The $14 Problem: Why Cart Recovery Tools Destroy Margin
Want early access? Join the waitlist →
References
How I verify claims: Shopify platform constraints are cited to official Shopify documentation (shopify.dev). Industry statistics are cited to Baymard Institute and Shopify publications. Product metrics are labeled as "pre-launch test results" or "targets."