What Happens When Things Go Wrong: Circuit Breakers and Safety Guarantees
Every discount tool works when everything goes right. The question that matters is: what happens when something goes wrong?
A bug in the AI model. A misconfigured rule. A sudden traffic spike that floods the system with discount requests. These aren't hypothetical scenarios—they're Tuesday. And the difference between a system that protects your margin and one that burns it is what happens in the 30 seconds before anyone notices.
Reclavio is built around a simple principle: safety is a feature, not an afterthought. Every layer of the system—from the AI to the discount pool to the delivery mechanism—is designed to fail safely.
TL;DR: Reclavio uses circuit breakers, guardrails, and fail-closed defaults to protect your margin automatically—even when the AI is wrong, the rules are misconfigured, or the infrastructure is degraded.
Author: Brodie, Founder @ Reclavio
Status: Private beta (pre–Shopify App Store)
Last updated: February 2026
Who This Is For
- ✅ Merchants worried about runaway discounts: what if the AI goes haywire at 2 AM?
- ✅ Teams evaluating discount automation: need to explain the risk model to leadership
- ✅ Technical decision-makers: want to understand the engineering behind "safe AI"
- ❌ Not for: Stores that are comfortable with "hope nothing breaks" as a strategy
The Problem Nobody Talks About
Most cart recovery and discount tools operate in one of two modes:
- Dumb rules — fixed discounts, no intelligence, no adaptation
- Black box AI — "trust us, the algorithm knows best"
Neither has an answer for: What if the algorithm is wrong?
With a dumb-rules tool, a misconfigured rule means every cart gets 20% off until someone notices. With a black-box AI tool, a model regression means the same thing—except you can't even audit why.
The failure mode is always the same: your margin evaporates silently.
The real cost of "just discounts"
When a discount tool fails, it doesn't throw errors. It succeeds—in the worst possible way:
- Every discount code is valid ✅
- Every conversion counts ✅
- Your dashboard shows green numbers ✅
- Your actual margin? Underwater ❌
The dashboards look great right up until your accountant doesn't.
Defense in Depth: Seven Guardrails + One Circuit Breaker
Reclavio doesn't rely on a single safety check. Every offer passes through seven independent guardrails before it reaches a shopper. If any guardrail says no, the offer is blocked—and the reason is logged.
The Seven Guardrails
| Guardrail | What It Protects Against |
|---|---|
| G1: Margin Floor | Offer would drop post-discount margin below your configured minimum |
| G2: Budget Pacing | Daily discount budget is exhausted or nearly exhausted |
| G3: One-Offer-Per-Cart | Shopper already has an active offer in this session |
| G4: Customer Cooldown | Customer recently converted with a Reclavio discount |
| G5: Session Frequency | Too many offer attempts in a single session |
| G6: Discount Code Limit | Shopify's per-order discount code limit would be exceeded |
| G7: Combination Policy | Merchant's discount combining preferences would be violated |
Each guardrail operates independently. G1 doesn't care what G4 thinks. If the margin floor is violated, it doesn't matter that the customer hasn't seen an offer in months—the offer is blocked.
Every block is logged and visible in your Activity Log as a "Smart Block" with the specific guardrail that fired.
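The independent-check idea can be sketched in a few lines of Python. This is a minimal illustration, not Reclavio's implementation: `OfferContext`, its fields, the three check functions, and `evaluate_offer` are all hypothetical names, and only three of the seven guardrails are shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OfferContext:
    # Hypothetical inputs; real guardrails would read store and session data.
    margin_after_discount: float  # fraction of price, e.g. 0.22
    margin_floor: float           # merchant-configured minimum
    discount_cost: float          # cost of this offer in store currency
    budget_remaining: float       # what is left of today's budget
    has_active_offer: bool        # shopper already holds an offer


# A guardrail returns None to allow, or a reason string to block.
Guardrail = Callable[[OfferContext], Optional[str]]


def check_margin_floor(ctx: OfferContext) -> Optional[str]:
    if ctx.margin_after_discount < ctx.margin_floor:
        return "G1: Margin Floor"
    return None


def check_budget_pacing(ctx: OfferContext) -> Optional[str]:
    if ctx.discount_cost > ctx.budget_remaining:
        return "G2: Budget Pacing"
    return None


def check_one_offer_per_cart(ctx: OfferContext) -> Optional[str]:
    return "G3: One-Offer-Per-Cart" if ctx.has_active_offer else None


GUARDRAILS: List[Guardrail] = [
    check_margin_floor,
    check_budget_pacing,
    check_one_offer_per_cart,
]


def evaluate_offer(ctx: OfferContext, activity_log: List[str]) -> bool:
    """Run every guardrail independently; any single block stops the offer."""
    reasons = [reason for check in GUARDRAILS if (reason := check(ctx))]
    activity_log.extend(f"Smart Block: {reason}" for reason in reasons)
    return not reasons


log: List[str] = []
ctx = OfferContext(
    margin_after_discount=0.10, margin_floor=0.15,
    discount_cost=5.0, budget_remaining=50.0, has_active_offer=False,
)
allowed = evaluate_offer(ctx, log)
print(allowed, log)
```

Here the offer is blocked by the margin floor alone, even though the budget and one-offer checks pass. Every check runs regardless of the others, so the log records each guardrail that fired rather than only the first.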
The Circuit Breaker: Automated Emergency Stop
Guardrails protect individual transactions. But what if something goes wrong at scale? A model update that makes IPOE too aggressive. A configuration change that loosens rules across the board. A sudden surge of high-intent traffic that happens to qualify for offers.
That's what the Circuit Breaker is for.
How It Works
The Circuit Breaker monitors the treatment discount rate, meaning the percentage of treatment sessions that receive a discount, across a sliding window. If that rate exceeds safe thresholds, the system responds automatically:
| Stage | Condition | What Happens |
|---|---|---|
| Normal | Discount rate < 80% | Business as usual |
| Yellow | Discount rate ≥ 80% | Warning emitted. Metrics logged. No action yet |
| Red | Discount rate ≥ 90% | All offers paused immediately. Dashboard notified |
When the Circuit Breaker trips Red, three things happen within seconds:
- All offers stop. No new discount codes are issued. Existing valid codes continue to work (you can't claw back a code a shopper already has), but no new ones are minted.
- You see a banner. A "Safety Pause Active" banner appears at the top of your Reclavio dashboard with a "Review & Resume" button.
- The reason is logged. The exact metric that triggered the trip—discount rate, window size, timestamp—is recorded for your review.
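The staged behavior above can be sketched as a small state check over a sliding window. The 80% and 90% thresholds come from the table; everything else is an assumption for illustration: the window is count-based rather than time-based, `min_samples` stands in for whatever debounce the real system uses, and the class and method names are hypothetical.

```python
from collections import deque
from enum import Enum


class Stage(Enum):
    NORMAL = "normal"
    YELLOW = "yellow"
    RED = "red"


YELLOW_THRESHOLD = 0.80  # warning stage, from the table above
RED_THRESHOLD = 0.90     # hard pause, from the table above


class CircuitBreaker:
    def __init__(self, window_size: int = 100, min_samples: int = 20):
        # window_size and min_samples are illustrative assumptions.
        self.window = deque(maxlen=window_size)  # True = session got a discount
        self.min_samples = min_samples
        self.paused = False

    def stage(self) -> Stage:
        if len(self.window) < self.min_samples:
            return Stage.NORMAL  # too little data to judge
        rate = sum(self.window) / len(self.window)
        if rate >= RED_THRESHOLD:
            return Stage.RED
        if rate >= YELLOW_THRESHOLD:
            return Stage.YELLOW
        return Stage.NORMAL

    def record(self, discounted: bool) -> Stage:
        """Record one treatment session and trip the breaker on Red."""
        self.window.append(discounted)
        current = self.stage()
        if current is Stage.RED:
            self.paused = True  # stays paused until a merchant resumes
        return current

    def may_issue_offer(self) -> bool:
        return not self.paused


breaker = CircuitBreaker(window_size=10, min_samples=10)
for discounted in [False, False] + [True] * 8:
    stage = breaker.record(discounted)
print(stage, breaker.may_issue_offer())
```

With 8 of the last 10 sessions discounted the breaker reports Yellow and keeps issuing offers. One more discounted session pushes the window to 9 of 10, which is Red, and `may_issue_offer()` turns false.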
Why "Discount Rate" Is the Right Signal
Some systems monitor error rates or latency. Those catch infrastructure problems. But the failure mode we're defending against isn't downtime—it's success. Too many offers, too fast, eroding your margin.
The treatment discount rate is the most direct proxy for margin risk. A rate of 80% or more means at least 4 out of 5 treatment sessions are getting discounts, which triggers a warning. At 90% or more, the system hard-pauses. A rate that high is almost never intentional, and if it is, you can review and resume.
Holdout-Aware Measurement
The Circuit Breaker is smart about holdout groups. If 20% of your traffic is in a holdout, the discount rate is calculated against treatment sessions only—not total sessions. Without this, a large holdout would artificially dilute the discount rate and mask problems.
Treatment discount rate = discounts issued / (total sessions − holdout sessions)
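A quick worked example shows why the denominator matters. The session counts below are invented for illustration, and `treatment_discount_rate` is a hypothetical helper that simply encodes the formula above.

```python
def treatment_discount_rate(
    discounts_issued: int, total_sessions: int, holdout_sessions: int
) -> float:
    """Discounts issued divided by treatment sessions only."""
    treatment_sessions = total_sessions - holdout_sessions
    if treatment_sessions <= 0:
        # Assumption: with no treatment traffic there is nothing to measure.
        return 0.0
    return discounts_issued / treatment_sessions


total, holdout, discounts = 1000, 200, 740  # 20% holdout, made-up numbers

naive_rate = discounts / total
aware_rate = treatment_discount_rate(discounts, total, holdout)
print(f"naive: {naive_rate:.3f}, holdout-aware: {aware_rate:.3f}")
```

Measured against all 1,000 sessions, the rate is 74% and looks Normal. Measured against the 800 treatment sessions, it is 92.5%, which is past the Red threshold.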
Clean Reset: No Re-trip Loops
When you review the situation and click "Resume," the Circuit Breaker resets cleanly. The sliding window is cleared, so stale data from the anomaly doesn't cause an immediate re-trip. You resume with a fresh baseline.
This sounds obvious, but it's a common failure in circuit breaker implementations. A naive reset that doesn't clear the window will re-trip the instant it evaluates the old data—sending merchants into a frustrating pause/resume loop.
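The difference between the two reset styles is easy to see in a toy model. This is a sketch under assumptions: a count-based window of 10 sessions, a minimum of 5 samples before the breaker evaluates, and hypothetical method names `naive_resume` and `clean_resume`.

```python
from collections import deque

RED_THRESHOLD = 0.90


class Breaker:
    def __init__(self, window_size: int = 10, min_samples: int = 5):
        self.window = deque(maxlen=window_size)
        self.min_samples = min_samples  # assumed guard against tiny samples
        self.paused = False

    def record(self, discounted: bool) -> None:
        self.window.append(discounted)
        if len(self.window) < self.min_samples:
            return
        if sum(self.window) / len(self.window) >= RED_THRESHOLD:
            self.paused = True

    def naive_resume(self) -> None:
        self.paused = False  # window still holds the anomaly's data

    def clean_resume(self) -> None:
        self.window.clear()  # start from a fresh baseline
        self.paused = False


def tripped_breaker() -> Breaker:
    b = Breaker()
    for _ in range(10):
        b.record(True)  # simulate the anomaly: every session discounted
    return b


naive = tripped_breaker()
naive.naive_resume()
naive.record(False)  # one healthy session; window is still 9/10 discounted

clean = tripped_breaker()
clean.clean_resume()
clean.record(False)  # same healthy session against an empty window

print(naive.paused, clean.paused)
```

After the naive resume, a single healthy session still leaves the window at 90% and the breaker trips again. After the clean resume, the same session leaves it running.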
Fail-Closed Defaults: When in Doubt, Don't Discount
Beyond guardrails and the circuit breaker, Reclavio's architecture follows a fail-closed philosophy at every layer:
What "Fail-Closed" Means in Practice
| Scenario | What Happens | Why |
|---|---|---|
| AI model timeout | Template response, no offer | Shoppers never see errors; margin is protected |
| Rules fetch fails | Empty ruleset → no offer eligible | Missing rules = no eligibility, not "everything eligible" |
| Exclusion API timeout | Product treated as excluded | Unknown products are excluded, not included |
| Cart verification timeout | Client value trusted (fail-open)* | Availability > strict verification for cart data |
| Redis unavailable | Discount pool returns "unavailable" | No pool access = no codes minted |
| IPOE scoring fails | Offer blocked (negative EIP default) | Broken scoring = not profitable enough to risk |
* Cart verification is the one intentional fail-open point. If we can't verify the cart, we trust the client's reported value and proceed. This prioritizes availability: a shopper shouldn't be blocked from getting help because of a transient API error. The financial risk is mitigated by the other six guardrails that still evaluate independently.
The Design Principle
Every service in Reclavio answers one question: "If this component fails, do we issue more discounts or fewer?"
The answer must always be fewer. If a failure causes more discounts, the architecture is wrong.
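One way to express this principle in code is to pair every dependency call with a fallback that can only reduce discounts. The sketch below is illustrative: `fail_closed`, `decide_offer`, and the callables passed in are hypothetical, and the fallbacks mirror three rows of the table above (empty ruleset, product treated as excluded, negative EIP).

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def fail_closed(fetch: Callable[[], T], safe_default: T) -> T:
    """Return fetch(), or the discount-reducing default if it fails."""
    try:
        return fetch()
    except Exception:
        # A real system would also log the failure and emit a metric here.
        return safe_default


def decide_offer(
    fetch_rules: Callable[[], List[str]],
    is_excluded: Callable[[], bool],
    score_eip: Callable[[], float],
) -> bool:
    rules = fail_closed(fetch_rules, [])       # missing rules: nothing eligible
    excluded = fail_closed(is_excluded, True)  # unknown product: excluded
    eip = fail_closed(score_eip, -1.0)         # broken scoring: negative EIP
    return bool(rules) and not excluded and eip > 0


def timed_out():
    raise TimeoutError("dependency timed out")


healthy = decide_offer(lambda: ["free-shipping"], lambda: False, lambda: 0.4)
rules_down = decide_offer(timed_out, lambda: False, lambda: 0.4)
exclusions_down = decide_offer(lambda: ["free-shipping"], timed_out, lambda: 0.4)
scoring_down = decide_offer(lambda: ["free-shipping"], lambda: False, timed_out)
print(healthy, rules_down, exclusions_down, scoring_down)
```

Only the healthy path yields an offer. Each simulated outage resolves to "no offer," which is the property the design question asks every component to satisfy.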
Safety-Degraded Mode: When the Safety System Itself Fails
What happens when the circuit breaker's data store (Redis) goes down? A naive implementation would either crash or silently stop protecting you.
Reclavio detects this. After 20 consecutive write failures, the system enters safety-degraded mode:
- Offers continue (we don't halt your business for infrastructure issues)
- But the Circuit Breaker's monitoring is flagged as potentially unreliable
- Metrics are emitted for alerting
- When Redis recovers, monitoring resumes automatically
This is the safety system's safety system. It ensures you know when protection might be degraded—instead of assuming everything is fine.
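The detection logic can be sketched as a consecutive-failure counter. The threshold of 20 comes from the text; the class name, the event names, and the choice to enter degraded mode on the 20th failure (rather than the 21st) are assumptions.

```python
from typing import List

FAILURE_THRESHOLD = 20  # consecutive write failures, per the text above


class DegradedModeTracker:
    def __init__(self, threshold: int = FAILURE_THRESHOLD):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.degraded = False
        self.events: List[str] = []  # stand-in for emitted alerting metrics

    def record_write(self, ok: bool) -> None:
        """Track the outcome of one write to the breaker's data store."""
        if ok:
            if self.degraded:
                self.events.append("monitoring_recovered")
            self.consecutive_failures = 0
            self.degraded = False
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.degraded:
            self.degraded = True
            self.events.append("safety_degraded")


tracker = DegradedModeTracker()
for _ in range(19):
    tracker.record_write(ok=False)
before = tracker.degraded   # still below the threshold

tracker.record_write(ok=False)
during = tracker.degraded   # 20th consecutive failure

tracker.record_write(ok=True)
after = tracker.degraded    # store recovered, monitoring resumes
print(before, during, after, tracker.events)
```

A single successful write resets the counter, so intermittent blips never accumulate into a degraded flag. Only a sustained outage does.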
Pause Priority: Who Can Override Whom?
Multiple systems can pause Reclavio's offers. When two pause reasons conflict, a strict priority order determines which one takes precedence:
| Priority | Pause Source | Can Be Overridden By |
|---|---|---|
| 1 (Highest) | Billing issue | Nothing |
| 2 | Admin action (Shopify admin) | Billing only |
| 3 | Circuit Breaker (automated) | Admin or Billing |
| 4 (Lowest) | User pause (dashboard toggle) | Any of the above |
This means the Circuit Breaker can't override a billing pause (you can't "resume" your way past an unpaid invoice). And a user toggling "pause" in the dashboard doesn't interfere with the Circuit Breaker's reset semantics.
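The priority table maps naturally onto an ordered enum. This is an illustrative sketch: `PauseSource`, `effective_pause`, and `overrides` are hypothetical names, and the real SafetyService may resolve conflicts differently.

```python
from enum import IntEnum
from typing import Iterable, Optional


class PauseSource(IntEnum):
    # Lower number = higher priority, matching the table above.
    BILLING = 1
    ADMIN = 2
    CIRCUIT_BREAKER = 3
    USER = 4


def effective_pause(active: Iterable[PauseSource]) -> Optional[PauseSource]:
    """Return the highest-priority active pause, or None if nothing is paused."""
    return min(active, default=None)


def overrides(source: PauseSource, other: PauseSource) -> bool:
    """True when `source` outranks `other` in the priority order."""
    return source < other


winner = effective_pause({PauseSource.USER, PauseSource.CIRCUIT_BREAKER})
print(winner.name)
print(overrides(PauseSource.CIRCUIT_BREAKER, PauseSource.BILLING))
```

When a user pause and a Circuit Breaker pause are both active, the Circuit Breaker is the effective reason. The second check confirms that the Circuit Breaker does not outrank a billing pause.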
📦 Free Download: Rule Templates Pack Get 6 ready-to-import recovery rules—Free Shipping Threshold, First-Time Buyer, Exit Intent, High-AOV Guardrail, Excluded Category, and Shadow Mode Starter—plus a testing checklist. Get the templates (free) →
What You Control
Safety in Reclavio isn't black-box. Here's what you can configure:
| Setting | Where | What It Does |
|---|---|---|
| Margin floor | Settings → Rules | Minimum acceptable margin after discount |
| Daily budget | Settings → IPOE | Maximum discount spend per day |
| Holdout percentage | Settings → IPOE | Fraction of traffic excluded from offers |
| IPOE mode | Settings → IPOE | OFF / SHADOW / LIVE (progressive rollout) |
| Exclusion lists | Settings → Exclusions | Products, collections, tags, vendors, types |
| Rule activation | Settings → Rules | Active/inactive toggle per rule |
| Activity review | Activity → Smart Blocks | Full audit trail of every blocked decision |
You don't need to trust the system. You can verify every decision it makes.
FAQ
Q: Can the AI override the circuit breaker?
A: No. The circuit breaker operates independently of the AI, the rule engine, and IPOE. When it trips, all offer pathways are blocked. The AI can still answer questions (shipping, returns, product info), but no discount codes are issued.
Q: What if I intentionally want to discount most sessions (e.g., a sale)?
A: The circuit breaker monitors the automated discount rate from Reclavio, not your store's overall discount activity. If you're running a store-wide sale through Shopify's native tools, that doesn't affect the circuit breaker. If you want Reclavio itself to be more aggressive, adjust your rules and budget. The circuit breaker thresholds are calibrated for anomaly detection, not normal operation.
Q: Has the circuit breaker ever tripped in production?
A: We're in private beta (pre-App Store), so far we've tested this extensively in staging and under load. The two-stage fuse (Yellow → Red with debounce) was specifically designed to avoid false positives from natural traffic variance while catching genuine anomalies quickly.
Q: How fast does the circuit breaker react?
A: Seconds. The sliding window is evaluated on each decision. When the threshold is crossed, the pause takes effect on the next decision cycle. There's a brief debounce to prevent single-request noise from tripping the breaker.
Q: Can I see historical circuit breaker events?
A: Yes. Every Yellow and Red event is logged with the triggering metrics. You can review these in your Activity Log and correlate them with rule changes or traffic patterns.
The Trust Contract: What the Safety System Promises
| Component | Promise |
|---|---|
| Guardrails | Every offer passes 7 independent checks. Any single check can block |
| Circuit Breaker | Anomalous discount rates trigger automatic pause within seconds |
| Fail-Closed | Infrastructure failures result in fewer discounts, not more |
| Pause Priority | Billing > Admin > Circuit Breaker > User. No override confusion |
| Safety-Degraded | You know when protection might be degraded—never silent failure |
| Auditability | Every block, trip, and pause is logged and reviewable |
Every discount tool promises to increase your revenue. Only one promises to protect your margin when things go wrong. That's the difference between conversion-first and profit-first.
Get the templates (free) → · Want early access instead? Join the waitlist →
For how the full decision pipeline works, read The Four Engines: What Actually Decides Whether You Get a Discount. To try the system risk-free, read Shadow Mode: How to Prove ROI Before You're Ready to Commit. Have questions? Check the Help Center →.
References
How I verify claims: Circuit breaker thresholds, guardrail behaviors, and fail-closed defaults are verified against the codebase as of February 2026. The circuit breaker (G8) implementation includes 18 unit tests and 7 integration tests. Pause priority semantics are enforced by the SafetyService with bidirectional priority resolution.