How We Built Profit-Aware Cart Recovery on Shopify [Architecture Deep Dive]
This is the engineering companion to The $14 Problem — our merchant-facing manifesto on profit-aware cart recovery. If you haven't read it, start there for the "why." This post covers the "how."
What follows is the architectural journal of building Reclavio: what we tried, what broke, and the system we shipped. If you're building Shopify storefront infrastructure, LLM-powered commerce tools, or offer management systems, this is for you.
Author: Brodie, Founder @ Reclavio
Status: Private beta (pre–Shopify App Store)
Last updated: February 2026
The Architecture: Observer → Decider → Deliverer
The system is designed around three layers. Each layer has one job. No layer "cheats."
Glossary:
- App Proxy: A Shopify feature that forwards storefront requests to your app
- HMAC: A cryptographic signature that proves Shopify sent the request
- Idempotency: Same input → same outcome (prevents discount code farming)
The Observer → Decider → Deliverer architecture: each layer has one job, no cheating.
Layer 1: The Observer
Understand intent without collecting personal data.
Before any decision is made, we gather context—no PII required, no third-party cookies, no tracking dark patterns.
We observe:
- Cart composition: Total value (in minor units), item count, product categories
- Page context: Product page, cart page, collection, checkout
- Customer signals: Logged-in status (from Shopify's signed logged_in_customer_id), session depth
- Real-time behavior: Time on page, scroll depth, exit intent
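As a rough illustration, the observer's output can be modeled as a single typed context object. Field names here are illustrative assumptions, not Reclavio's actual schema:

```typescript
// Hypothetical shape of the Observer's decision context. Note what is
// absent: no email, no name, no third-party identifiers.
interface DecisionContext {
  cartValueCents: number;       // minor units, never floats
  itemCount: number;
  productCategories: string[];
  pageType: "product" | "cart" | "collection" | "checkout";
  isLoggedIn: boolean;          // derived from Shopify's signed logged_in_customer_id
  sessionDepth: number;         // pages viewed this session
  timeOnPageMs: number;
  scrollDepthPercent: number;
  exitIntent: boolean;
}

// Build a complete context from partial raw signals, with safe defaults.
function buildContext(raw: Partial<DecisionContext>): DecisionContext {
  return {
    cartValueCents: raw.cartValueCents ?? 0,
    itemCount: raw.itemCount ?? 0,
    productCategories: raw.productCategories ?? [],
    pageType: raw.pageType ?? "product",
    isLoggedIn: raw.isLoggedIn ?? false,
    sessionDepth: raw.sessionDepth ?? 1,
    timeOnPageMs: raw.timeOnPageMs ?? 0,
    scrollDepthPercent: raw.scrollDepthPercent ?? 0,
    exitIntent: raw.exitIntent ?? false,
  };
}
```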
This creates a rich decision context without compromising shopper privacy.
Layer 2: The Decider
Policy is code. Conversation is AI.
This is where Reclavio differs fundamentally from "just add an LLM" solutions.
The eligibility engine is deterministic and always authoritative.
When a shopper interacts with Reclavio, the system routes their message through a multi-layer classifier (keyword matching → weighted scoring → ML intent classification):
| Route | Description | Can Mint Discounts? |
|---|---|---|
| NEGOTIATION | Active discount negotiation | ✅ Yes (if eligible) |
| POLICY | Return/shipping questions | ❌ No |
| PRODUCT_DISCOVERY | Product search, recommendations | ❌ No |
| SUPPORT | Human escalation | ❌ No |
| B2B | Wholesale/bulk inquiry | ❌ No |
| MIXED_INTENT | Combined policy + negotiation | ❌ No (answers policy first) |
| SAFE_FALLBACK | Low-confidence or unknown intent | ❌ No |
Only the NEGOTIATION route can issue discounts—and even then, only if:
- The merchant's rules permit it (cart value thresholds, customer type, product eligibility)
- The offer hasn't already been issued (idempotency)
- The offer doesn't exceed configured caps (decision enforcement)
Design principle: AI writes the sentence. Deterministic code writes the policy.
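A minimal sketch of the first classifier layer (keyword matching plus a mint-permission table). The keywords and fall-through behavior are simplified assumptions; the real system layers weighted scoring and an ML intent classifier on top:

```typescript
type Route = "NEGOTIATION" | "POLICY" | "PRODUCT_DISCOVERY" | "SUPPORT"
  | "B2B" | "MIXED_INTENT" | "SAFE_FALLBACK";

// First-pass keyword layer (illustrative keyword sets, not Reclavio's).
const KEYWORDS: Array<[Route, RegExp]> = [
  ["POLICY", /\b(return|refund|shipping policy)\b/i],
  ["NEGOTIATION", /\b(discount|coupon|deal|cheaper)\b/i],
  ["B2B", /\b(wholesale|bulk order)\b/i],
];

function classify(message: string): Route {
  const hits = KEYWORDS.filter(([, re]) => re.test(message)).map(([r]) => r);
  if (hits.length === 0) return "SAFE_FALLBACK"; // low confidence: never mint
  // Policy + negotiation together: answer policy first, no minting.
  if (hits.includes("POLICY") && hits.includes("NEGOTIATION")) return "MIXED_INTENT";
  return hits[0];
}

// Only NEGOTIATION may mint, and even then only after eligibility checks.
const CAN_MINT: Record<Route, boolean> = {
  NEGOTIATION: true, POLICY: false, PRODUCT_DISCOVERY: false,
  SUPPORT: false, B2B: false, MIXED_INTENT: false, SAFE_FALLBACK: false,
};
```

The key property is that the permission table is code, not prompt text: no classifier output can grant a route minting rights it doesn't have.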
Layer 3: The Deliverer
Speed when available. Reliability always.
Getting a response to the shopper sounds simple—until you encounter Shopify's platform constraints.
The Problem: Storefront proxy calls have tight response budgets. App proxies don't support cookies—Shopify strips Cookie from requests and Set-Cookie from responses for security. (Shopify App Proxy Docs)
The Solution: A dual-lane architecture.
Streaming when available, proxy-safe fallback always—shoppers always get an answer.
Lane A (App Proxy): Reliable but constrained. It always works, but tight timeout windows and synchronous nature prevent real-time token streaming.
Lane B (Direct Streaming): Feels like magic—tokens appear as they're generated in a ChatGPT-like experience. Requires a streamGrant JWT, which means an extra bootstrap step that can fail.
The widget tries Lane B first for premium UX. If the grant isn't available or the stream fails, it seamlessly falls back to Lane A. If Lane A times out, it serves a template response based on detected intent.
Designed so shoppers always receive a response, including template fallback under timeout conditions.
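The fallback hierarchy can be sketched as a simple try-chain. The three transports below are stand-ins for the real streaming endpoint, app proxy call, and intent-based template store:

```typescript
type Reply = { text: string; via: "stream" | "proxy" | "template" };

// Lane B first (streaming), then Lane A (app proxy), then a template.
// Every path resolves, so the shopper always gets an answer.
async function getReply(
  laneB: () => Promise<string>,  // direct streaming (needs streamGrant JWT)
  laneA: () => Promise<string>,  // app proxy (reliable, no token streaming)
  template: () => string,        // pre-written response for the detected intent
): Promise<Reply> {
  try {
    return { text: await laneB(), via: "stream" };
  } catch {
    try {
      return { text: await laneA(), via: "proxy" };
    } catch {
      return { text: template(), via: "template" };
    }
  }
}
```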
Why This Doesn't Break in Production
Building a demo is easy. Building a system that handles real-world constraints, with real money on the line—that's different.
1. Deterministic Eligibility as Source of Truth
The most critical architectural decision: LLM timeouts should never change the offer outcome.
In early iterations, I hit a failure mode: if the LLM timed out during message generation, the system would skip the offer—even though the shopper was eligible. Identical shoppers got different outcomes based on network latency.
The fix was architectural: eligibility is computed before the LLM is called. If you're eligible, you get the offer. The LLM shapes the delivery, not the decision.
💡 If you remember one thing: Eligibility must be deterministic before the LLM runs. Timeouts should only affect messaging, never outcomes.
End-to-End Offer Flow
End-to-end offer flow: eligibility is computed before calling the LLM, ensuring deterministic outcomes even on timeout.
sequenceDiagram
participant Widget
participant AppProxy as App Proxy
participant RuleEngine as Rule Engine
participant OfferLedger as Offer Ledger
participant LLM
Widget->>AppProxy: POST /decision (cartValue, sessionId, signature)
AppProxy->>AppProxy: Verify Shopify signature
AppProxy->>RuleEngine: computeEligibility(cart, rules)
RuleEngine-->>AppProxy: {eligible: true, maxDiscount: 10%}
alt Eligible for offer
AppProxy->>OfferLedger: reserveOffer(sessionId, cartToken)
OfferLedger-->>AppProxy: {discountCode: "SAVE10", idempotent: true}
end
AppProxy->>LLM: generateResponse(eligibility, context)
alt LLM responds in time
LLM-->>AppProxy: "Great news! I can offer you 10% off..."
AppProxy-->>Widget: {decision, discountCode, offerCard}
else LLM timeout
AppProxy-->>Widget: {templateResponse, discountCode, offerCard}
Note over Widget: Offer still delivered (deterministic)
end
The key: eligibility and offer reservation happen before the LLM call. Timeouts only affect messaging.
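A condensed sketch of that ordering: eligibility is computed first, and the LLM call is raced against a hard deadline. The rule values, messages, and deadline handling are illustrative assumptions:

```typescript
interface Eligibility { eligible: boolean; maxDiscountPercent: number }

// Deterministic rule check. Runs BEFORE any LLM call (hypothetical rule).
function computeEligibility(cartValueCents: number, minCartCents = 5000): Eligibility {
  return cartValueCents >= minCartCents
    ? { eligible: true, maxDiscountPercent: 10 }
    : { eligible: false, maxDiscountPercent: 0 };
}

// The LLM only shapes delivery; on timeout the outcome is unchanged.
async function decide(
  cartValueCents: number,
  generate: (e: Eligibility) => Promise<string>,
  timeoutMs = 6500,
): Promise<{ eligibility: Eligibility; message: string; fromTemplate: boolean }> {
  const eligibility = computeEligibility(cartValueCents);
  try {
    const message = await Promise.race([
      generate(eligibility),
      new Promise<never>((_, rej) =>
        setTimeout(() => rej(new Error("deadline")), timeoutMs)),
    ]);
    return { eligibility, message, fromTemplate: false };
  } catch {
    // Template fallback: same offer, pre-written wording.
    const message = eligibility.eligible
      ? `I can offer you ${eligibility.maxDiscountPercent}% off.`
      : "Happy to help with any questions about your cart.";
    return { eligibility, message, fromTemplate: true };
  }
}
```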
2. Idempotent Offer Issuance (Anti-Farming Controls)
Without idempotency, a clever shopper could refresh 10 times and get 10 different codes, share the widget link, or build bots to harvest codes at scale.
The Offer Ledger ensures:
- One active offer per cart/session combination per rule
- Discount codes are reserved before they're displayed
- Subsequent requests return the same code, not a new one
- TTL management supports polling and reuse
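An in-memory sketch of these guarantees. The production ledger presumably lives in Redis (e.g. SET NX with a TTL); the key scheme and TTL value here are assumptions:

```typescript
// Stand-in for the Offer Ledger: one stable code per rule/cart/session key.
class OfferLedger {
  private store = new Map<string, { code: string; expiresAt: number }>();

  reserve(
    sessionId: string, cartToken: string, ruleId: string,
    mint: () => string,                 // called at most once per key within TTL
    ttlMs = 30 * 60 * 1000, now = Date.now(),
  ): { code: string; fresh: boolean } {
    const key = `${ruleId}:${cartToken}:${sessionId}`;
    const existing = this.store.get(key);
    if (existing && existing.expiresAt > now) {
      return { code: existing.code, fresh: false }; // idempotent replay
    }
    const code = mint(); // reserve BEFORE the code is ever displayed
    this.store.set(key, { code, expiresAt: now + ttlMs });
    return { code, fresh: true };
  }
}
```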
3. Observability as a Feature
You can't improve what you can't measure.
- Correlation IDs: Every request, from widget to backend to LLM, carries an x-correlation-id header for end-to-end tracing
- Trace Context: Correlation ID headers propagate through API and LLM calls for distributed tracing
Launch SLO Targets:
| Metric | Target | Purpose |
|---|---|---|
| p95 Lane A Latency | <3,000ms | Primary path performance |
| p95 Lane B Latency | <1,000ms | Streaming path performance |
| Timeout Rate | <2% | System overwhelm indicator |
| Fallback Rate | <5% | Template fallback frequency |
| Offer Mismatch Rate | 0% | Eligibility ↔ delivery consistency |
Shopify Platform Constraints (Lessons Learned)
After months of development and staging testing, here's what I learned about building serious Shopify infrastructure.
App Proxy Realities
Shopify's App Proxy is powerful but constrained:
| Constraint | Reality | Solution |
|---|---|---|
| Tight timeouts | Limited response window | Deadline budgeting: 6.5s for LLM, 2s for network, 1.5s buffer |
| No cookie sessions | Cookie/Set-Cookie stripped from responses | Explicit sessionId in request body; cartToken for cart correlation |
| Header stripping | Disallowed headers removed | Body-level correlationId for tracing |
| Signature verification | Shopify signs requests with signature param | Verify HMAC before processing; partition read vs. write endpoints |
💡 Key insight: Always verify Shopify's proxy signature (shop, path_prefix, timestamp, signature). Treat all response headers as potentially stripped. (Shopify App Proxy Auth)
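A sketch of proxy signature verification in Node, assuming the documented canonicalization (remove the signature param, sort the remaining query params, concatenate as key=value with no separator, HMAC-SHA256 hex digest). Consult the App Proxy docs for edge cases such as multi-value parameters:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

function verifyProxySignature(query: Record<string, string>, secret: string): boolean {
  const { signature, ...rest } = query;
  if (!signature) return false;
  // Canonical message: sorted key=value pairs, no separators (assumed form).
  const message = Object.keys(rest).sort().map((k) => `${k}=${rest[k]}`).join("");
  const digest = createHmac("sha256", secret).update(message).digest("hex");
  // Constant-time comparison to avoid timing side channels.
  const a = Buffer.from(digest);
  const b = Buffer.from(signature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```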
Webhook Correctness
Shopify's webhook system considers any 2xx response as success; non-2xx triggers retries—a total of 8 attempts over ~4 hours with exponential backoff (updated Sept 2024). Shopify expects a response in under 5 seconds. (Shopify Webhook Retry Update, Shopify Webhook Troubleshooting)
The "Semantic 200" Pattern
| Scenario | Status Code | Result |
|---|---|---|
| Success | 200 | Shopify stops retrying |
| Auth failure (bad HMAC) | 401/403 | Shopify stops retrying |
| Malformed payload (non-recoverable) | 200 + structured log + alert | Shopify stops retrying |
| Transient error | 500 | Shopify retries (8x over ~4h) |
The counterintuitive insight: if the payload is permanently unrecoverable, acknowledge it to stop retries, log the failure, alert, and backfill via API. Returning 500 for an unrecoverable payload wastes capacity and consumes Shopify's limited retry window. (Shopify Webhook Troubleshooting)
Key webhook patterns:
- Respond within 5 seconds; offload slow work to a queue
- Use webhook ID for idempotency (24-hour TTL deduplication)
- HMAC verification before any processing
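The status-code policy in the table above can be encoded as a pure function; the error taxonomy below is illustrative, and a real handler would map it onto the HTTP response after HMAC verification and queueing:

```typescript
type WebhookOutcome = "ok" | "bad_hmac" | "malformed" | "transient";

// "Semantic 200": acknowledge permanently unrecoverable payloads so Shopify
// stops retrying; reserve 500 for genuinely transient failures.
function webhookStatus(outcome: WebhookOutcome): { status: number; willRetry: boolean } {
  switch (outcome) {
    case "ok":        return { status: 200, willRetry: false };
    case "bad_hmac":  return { status: 401, willRetry: false }; // auth failures are not retried
    case "malformed": return { status: 200, willRetry: false }; // ack + log + alert + backfill via API
    case "transient": return { status: 500, willRetry: true };  // Shopify retries (8x over ~4h)
  }
}
```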
Timeouts and Tail Latency
The 99th percentile will hurt you. Design for the worst case:
- Hard deadline: Backend enforces 6.5s limit on LLM calls
- Template fallbacks: If deadline breaches, serve a pre-written response based on detected intent
- Graceful degradation: If intent is unknown, trigger SAFE_FALLBACK
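Deadline budgeting from the constraints table reduces to simple arithmetic. The 10-second total window is an assumed figure; the 6.5s/2s/1.5s split comes from the table above:

```typescript
const TOTAL_BUDGET_MS = 10_000;   // assumed overall proxy window
const NETWORK_BUDGET_MS = 2_000;  // reserved for network
const BUFFER_MS = 1_500;          // safety buffer

// Derive the LLM deadline from time already spent, so downstream calls
// can never overrun the proxy window. Capped at the 6.5s hard limit.
function llmDeadline(elapsedMs: number): number {
  const remaining = TOTAL_BUDGET_MS - elapsedMs - NETWORK_BUDGET_MS - BUFFER_MS;
  return Math.max(0, Math.min(remaining, 6_500));
}
```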
What I Tried and Rejected
I tested the "obvious" approaches. They all broke in predictable ways.
| Approach | Why It Failed | What Replaced It |
|---|---|---|
| Let the LLM decide eligibility | Non-deterministic outcomes under latency | Deterministic rule engine computes eligibility before LLM runs |
| Session via cookies through App Proxy | Not supported—Shopify strips cookie headers (Shopify App Proxy) | Explicit session identifiers via body params; auth via HMAC |
| Synchronous webhook processing | 5-second timeout violations caused retry storms | Queue-first pattern: ack immediately, process async (Shopify Webhooks) |
| Single delivery lane | Proxy timeouts blocked streaming; streaming failures blocked any response | Dual-lane architecture with graceful fallback hierarchy |
💡 Shopify's platform constraints aren't bugs—they're security features. Design around them, not against them.
IPOE: Incremental Profit Offer Engine
Traditional A/B testing tells you "Variant B converts 5% better." But it doesn't answer the question that actually matters: "Did those extra conversions come from customers who would have bought anyway?"
If you discount someone who would've purchased anyway, you didn't "recover revenue"—you paid margin for nothing.
IPOE uses causal inference to measure incremental impact—the lift that wouldn't have happened without intervention.
- A small % of sessions are holdout (no discount offers)
- The rest are treatment (Reclavio engages + offers when rules allow)
- The difference estimates incremental lift
- Discounts are scored by Expected Incremental Profit (EIP), not raw conversion
The IPOE decision flow: from holdout assignment through guardrail enforcement, EIP scoring, and final action selection.
CUPED Variance Reduction
We use CUPED (Controlled-experiment Using Pre-Experiment Data) to reduce noise in lift estimates. By adjusting for pre-intervention cart behavior, we can detect smaller effects with fewer samples—meaning faster statistical significance with less traffic.
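The adjustment itself is compact: with pre-experiment covariate X and metric Y, compute theta = cov(Y, X) / var(X) and subtract theta * (X - mean(X)) from each observation. The mean is preserved while variance shrinks wherever X correlates with Y. A self-contained sketch:

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// CUPED-adjusted metric: Y' = Y - theta * (X - mean(X)).
function cuped(y: number[], x: number[]): number[] {
  const mx = mean(x), my = mean(y);
  let cov = 0, varX = 0;
  for (let i = 0; i < x.length; i++) {
    cov += (x[i] - mx) * (y[i] - my);
    varX += (x[i] - mx) ** 2;
  }
  const theta = varX === 0 ? 0 : cov / varX; // no covariate signal: no adjustment
  return y.map((yi, i) => yi - theta * (x[i] - mx));
}
```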
The Safety Fuse: Circuit Breaker (G8)
Here's a scenario that kept me up at night: what if a bug in the bandit causes it to recommend discounts on every single session? The holdout group gets nothing, the treatment group gets 100% discount rate, and by the time you notice, you've bled margin for hours.
The answer is G8: Anomaly Detection — an automatic circuit breaker that monitors the treatment-only discount rate across a sliding window and trips a two-stage fuse if something looks wrong.
The Two-Stage Fuse
The circuit breaker evaluates the treatment discount rate — the fraction of non-holdout decisions that result in a discount action — using a sliding window of 5-minute buckets over the last 60 minutes.
| Stage | Threshold | Behavior |
|---|---|---|
| Normal | Rate < 80% | System operates normally |
| Yellow | Rate ≥ 80% | Warning logged, metrics emitted — no action taken (detection mode) |
| Red | Rate ≥ 95% | Automatic pause: IPOE disabled for this merchant, all decisions return action: 'none' |
The Yellow threshold exists for observability — it creates an audit trail and emits metrics before a hard pause. When Red trips, the system:
- Sets a Redis flag (ipoe:broken:{merchantId}) so every pod sees the pause instantly
- Writes pauseReason: 'CIRCUIT_BREAKER' to the merchant's IPOE state (DB persistence)
- Logs a fleet-level deduplication record (one alert per merchant, not per pod)
Why a Sliding Window, Not a Global Counter
A naive "count total discounts" approach suffers from dilution — early healthy decisions mask a sudden spike. The sliding window (twelve 5-minute buckets, a 60-minute trailing window) captures recent behavior while remaining insensitive to normal short-term variance.
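A minimal in-memory version of the bucketed window. The real implementation keys buckets in Redis; reusing slots by modular bucket index is one common approach, assumed here:

```typescript
const BUCKET_MS = 5 * 60 * 1000; // 5-minute buckets
const NUM_BUCKETS = 12;          // 60-minute trailing window

class SlidingRate {
  private buckets = Array.from({ length: NUM_BUCKETS },
    () => ({ epoch: -1, discounts: 0, total: 0 }));

  record(discounted: boolean, now = Date.now()): void {
    const epoch = Math.floor(now / BUCKET_MS);
    const b = this.buckets[epoch % NUM_BUCKETS];
    if (b.epoch !== epoch) { b.epoch = epoch; b.discounts = 0; b.total = 0; } // reuse stale slot
    b.total += 1;
    if (discounted) b.discounts += 1;
  }

  // Treatment discount rate over the trailing 60 minutes only.
  rate(now = Date.now()): number {
    const minEpoch = Math.floor(now / BUCKET_MS) - NUM_BUCKETS + 1;
    let d = 0, t = 0;
    for (const b of this.buckets) {
      if (b.epoch >= minEpoch) { d += b.discounts; t += b.total; }
    }
    return t === 0 ? 0 : d / t;
  }
}
```

Because only buckets inside the window are summed, old healthy traffic can't dilute a fresh spike, which is exactly the dilution problem a global counter has.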
stateDiagram-v2
[*] --> Normal
Normal --> Yellow: treatmentDiscountRate ≥ 80%
Yellow --> Red: treatmentDiscountRate ≥ 95%\n+ debounce (2 consecutive evaluations)
Red --> Normal: Merchant resumes IPOE\n(reset clears all Redis state)
Yellow --> Normal: Rate drops below 80%
state Red {
[*] --> Paused
Paused: IPOE disabled
Paused: All decisions → 'none'
Paused: Dashboard shows safety banner
}
Pause Priority: Not All Pauses Are Equal
A merchant can be paused for multiple reasons — billing overdue, manual admin pause, circuit breaker trip, or the merchant choosing to pause themselves. The circuit breaker respects a strict priority ordering:
BILLING_OVERDUE > ADMIN_PAUSE > CIRCUIT_BREAKER > USER_PAUSE
If a merchant is already paused for billing, the circuit breaker won't overwrite the pause reason. When the circuit breaker trips, it first checks: "Is there already a higher-priority pause in place?" If yes, it records the trip for observability but doesn't touch the database state.
This prevents a subtle bug: if the CB overwrites BILLING_OVERDUE with CIRCUIT_BREAKER, and the merchant resumes from the CB pause, they'd bypass the billing block entirely.
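The priority check reduces to an index comparison over the ordering above (names from the post; persistence and the observability record are out of scope here):

```typescript
// Lower index = higher priority. Ordering taken from the post.
const PAUSE_PRIORITY = ["BILLING_OVERDUE", "ADMIN_PAUSE", "CIRCUIT_BREAKER", "USER_PAUSE"] as const;
type PauseReason = (typeof PAUSE_PRIORITY)[number];

// An incoming pause only overwrites a strictly lower-priority one.
function nextPauseReason(current: PauseReason | null, incoming: PauseReason): PauseReason {
  if (current === null) return incoming;
  return PAUSE_PRIORITY.indexOf(incoming) < PAUSE_PRIORITY.indexOf(current)
    ? incoming
    : current;
}
```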
The Hot Path: Zero-RTT When Healthy
On every decision, the orchestrator calls isCircuitBroken(merchantId) before any holdout or bandit logic. This must be fast. The implementation uses a bounded in-memory cache (10,000 entries max, 30-second TTL) backed by a single Redis GET:
- Cache hit (healthy): 0 Redis round-trips — pure in-memory
- Cache miss: 1 Redis GET → cache result for 30 seconds
- Cache hit (broken): Return true immediately — skip all IPOE logic
The 30-second TTL means a pause propagates to all pods within 30 seconds. For a safety mechanism, this is the right trade-off: fast enough to stop bleeding, without adding latency to every healthy decision.
Clean Reset Semantics
When a merchant resumes IPOE from a circuit breaker pause, the system performs a resetBreaker that deterministically clears all sliding window buckets and the broken flag — without using Redis KEYS() or SCAN(). This prevents the "Resume Loop" where stale bucket data causes an immediate re-trip on resume.
💡 Design principle: Safety mechanisms must be self-healing. A circuit breaker that requires manual Redis cleanup to resume isn't a safety feature — it's a footgun.
Third-Party Integrations
Reclavio syncs events to Klaviyo, Omnisend, or custom webhook endpoints. It doesn't replace your email/SMS stack—it makes it smarter by syncing in-session context.
One-click OAuth for Klaviyo and Omnisend; custom webhooks for everything else.
Supported Integrations
| Platform | Auth Method | Event Types |
|---|---|---|
| Klaviyo | OAuth 2.0 | Abandoned cart, offer displayed, offer accepted, conversion |
| Omnisend | API Key | Abandoned cart, offer displayed, offer accepted, conversion |
| Custom Webhook | HMAC Signature | All events (configurable) |
Event Schema
{
"eventType": "reclavio.offer.accepted",
"timestamp": "2026-02-01T08:15:00Z",
"merchantId": "shop_abc123",
"sessionId": "session_xyz789",
"cartValueCents": 15000,
"discountOfferedPercent": 10,
"discountUsed": true,
"channel": "widget"
}
Security Architecture
- HMAC Signatures: Every webhook signed with your secret key
- SSRF Protection: Multi-layer defense (URL validation, protocol/port enforcement, DNS resolution, IP denylist, redirect blocking) prevents internal network attacks
- Encrypted Storage: OAuth tokens encrypted with AES-256-GCM
- Auto-Disable: Failing endpoints disabled after 5 consecutive failures
- Dead Letter Queue: Failed events queued for retry with exponential backoff
💡 All webhook delivery is idempotent. You can safely retry or replay events without duplicate processing.
LLM Safety (OWASP LLM-aligned)
Controls align with the OWASP Top 10 for LLM Applications. (OWASP LLM Top 10)
This prevents the AI from being tricked into issuing unauthorized discounts or leaking merchant data.
| Control | Implementation |
|---|---|
| Input Sanitization | Strip potential injection characters before LLM call |
| Output Cleansing | HTML stripping, URL allowlist verification |
| State Protection | Non-negotiation routes cryptographically blocked from discounts |
| Rate Limiting | Tiered per-session limits prevent abuse |
Implementation Checklists
For Storefront Widgets
- Handle proxy timeout constraints with deadline budgeting
- Use session headers for continuity (cookies will be stripped); auth via proxy signature
- Use body-level correlation IDs as fallback for stripped headers
- Add a circuit breaker (disable widget after N consecutive errors)
- Prevent duplicate request submission (send locks during latency)
- Provide visible fallback UI for error states
For Offer Issuance Safety
- Make eligibility deterministic and compute it before LLM calls
- Implement idempotent offer generation (one offer per cart/session/rule)
- Enforce discount caps in the decision layer, not just the LLM prompt
- Add a circuit breaker for anomalous discount rates (auto-pause before margin bleeds)
- Log every offer with correlation IDs for audit trails
- Build anti-farming controls (rate limits, deduplication, TTL enforcement)
For LLM Integration in User-Facing Flows
- Sanitize all LLM inputs (strip injection patterns)
- Validate all LLM outputs (JSON schema enforcement)
- Implement hard timeouts with fallback responses
- Never let the LLM decide eligibility—only presentation
- Log LLM calls with latency and token usage for cost management
What's Implemented (February 2026)
Reclavio is pre-launch / pre–Shopify App Store. These are the components implemented and in active testing:
- ✅ Shadow Mode: Evaluate rules on real traffic without affecting shoppers
- ✅ Opportunities (Shadow) Card: Dashboard visibility showing at-risk cart value
- ✅ Offer Ledger: Idempotent issuance (stable offer per cart/session/rule)
- ✅ Dual-lane delivery: Proxy-safe path + streaming path with fallback hierarchy
- ✅ Analytics foundations: Event model + SLO dashboards
- ✅ IPOE + Holdout Comparison Card: Configurable holdout (0–50%), 11-state comparison card, Wilson CI, Bayesian posterior, 7-day minimum runtime gate, SRM detection, export
- ✅ G8 Circuit Breaker: Two-stage anomaly detection fuse (Yellow warning at 80%, Red auto-pause at 95%) with sliding window, pause priority, bounded in-memory cache, and clean reset semantics
- ✅ Integrations module: Event schema + delivery pipeline (Klaviyo/Omnisend/custom webhook)
- ✅ Profit Protection Dashboard: "Why No Offer?" transparency logs with decision pipeline visualization
- ✅ SKU & Collection Exclusions: Fail-closed safeguards for protected products, collections, tags, vendors, product types, and gift cards
Next (planned before App Store submission):
- Smart Upsell Before Discount — "Add $15 more for free shipping" (increases AOV with zero margin cost)
- Time-Decay Offers — "This offer expires in 10 minutes" countdown urgency
- CRO Quick Wins — "Get It By" date calculator, thumb-zone mobile bottom sheet, velocity social proof
- Locale-Aware Routing — detect non-English locales and bypass keyword heuristics
Technical FAQ
How is holdout assignment deterministic?
Deterministic hash of sessionId + date. The same session always lands in the same group within a day. No random coin flips.
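A sketch of that assignment; the hash function and bucket math below are illustrative choices, not necessarily Reclavio's:

```typescript
import { createHash } from "node:crypto";

// Hash sessionId + UTC date into a 0..99 bucket. Same session, same day,
// same group: no random coin flips, no assignment storage needed.
function isHoldout(sessionId: string, dateUtc: string, holdoutPercent: number): boolean {
  const digest = createHash("sha256").update(`${sessionId}:${dateUtc}`).digest();
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < holdoutPercent;
}
```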
What happens when I change the holdout percentage?
Changing the holdout percentage starts a new measurement epoch. A confirmation dialog warns that existing comparison data resets. Historical data isn't deleted—it's a fresh measurement window. Mixing traffic from different holdout ratios would invalidate the causal inference.
Can I export holdout results?
Yes. One-click CSV with holdout %, epoch start date, session counts, conversion rates, lift estimates, confidence intervals, and guardrail metrics.
What's CUPED and why does it matter?
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces variance in lift estimates by adjusting for pre-intervention cart behavior. You get statistically significant results faster, with less traffic.
How does the 7-day minimum runtime gate work?
Even with large sample sizes, the verdict stays at "Underpowered" until 7 days pass. Early data often reflects selection bias (weekday/weekend, promotions, seasonal effects). The gate prevents false positives.
What's the SRM detection?
Sample Ratio Mismatch detection flags when the actual treatment/holdout split deviates from the configured ratio—indicating a bug in assignment rather than a real treatment effect.
📖 Read the product story: The $14 Problem: Why Cart Recovery Tools Destroy Margin
Want early access? Join the waitlist →
References
How I verify claims: Shopify platform constraints are cited to official Shopify documentation (shopify.dev). Industry statistics are cited to Baymard Institute and Shopify publications. Product metrics are labeled as "pre-launch test results" or "targets."