The decision was not ideological. It was operational. Here is why every one of our 7 production AI agents runs on Claude — and why everyone on the team is trained on Anthropic's stack.
I have been building production systems for over 25 years. CTO at IG Group. Founded DatingUK and PetMeds, both sold. Built AVORA Analytics to €7.4M raised. Co-founded Gravity Data, acquired. Co-founded Streamkap, $3.3M raised. I say this not to list credentials but to make a point: I have no loyalty to any vendor. I use whatever works.
When I started building MarginOps in late 2024, I tested every serious foundation model. GPT-4o, Gemini Pro, Llama 3, Mistral Large, and Claude. The use case was specific: I needed a model that could analyse a £6.4M transformation programme across 119 EBITDA-tracked initiatives, reason about pricing elasticity across 37,800 products, and generate actionable recommendations that a CFO would trust enough to sign off on.
GPT-4o was fast and capable, but its reasoning on multi-step financial analysis was inconsistent. It would get the direction right but miss second-order effects — the kind of thing that turns a margin improvement into a margin erosion when you miss the interaction between a markdown and an active promotion. Gemini was impressive on benchmarks but hallucinated confidently on edge cases in financial data. Open-source models were out of the question for enterprise clients handling sensitive P&L data — the compliance overhead alone would have killed the business case.
Claude was different. Not perfect. But consistently, measurably better at the specific kind of reasoning that margin analysis demands.
The pricing engine is where Claude's reasoning advantage showed up most clearly. The engine monitors 37,800 products every 15 minutes, running 7 automated checks against the live catalogue. Each check requires multi-step reasoning: is this product subject to an active promotion? If so, what is the effective price after the discount code? If the engine recommends a further markdown, does the combined discount breach the margin floor for this category?
This is not pattern matching. This is chain-of-thought reasoning over structured financial data with real constraints. Claude handles it reliably. The discount conflict detection alone — powered by Claude's ability to reason about overlapping promotional mechanics — prevented an estimated £180K in annual margin leakage.
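To make the margin-floor question concrete, here is a minimal sketch of the arithmetic behind the last of those checks. It is an illustration only, not the MarginOps engine: the `Product` fields, the `MARGIN_FLOORS` table, the 30% floor, and the assumption that a promotion and a markdown stack multiplicatively are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    sku: str
    list_price: float
    unit_cost: float
    category: str


# Hypothetical per-category margin floors (fraction of selling price).
MARGIN_FLOORS = {"footwear": 0.30}


def effective_price(list_price: float, promo_discount: float, markdown: float) -> float:
    """Price after an active promotion and a further markdown.

    Assumes the two discounts stack multiplicatively; real promotional
    mechanics may differ.
    """
    for d in (promo_discount, markdown):
        if not 0.0 <= d < 1.0:
            raise ValueError("discounts must be in the range [0, 1)")
    return list_price * (1.0 - promo_discount) * (1.0 - markdown)


def check_markdown(product: Product, promo_discount: float, proposed_markdown: float) -> dict:
    """Report whether the combined discount breaches the category margin floor."""
    price = effective_price(product.list_price, promo_discount, proposed_markdown)
    margin = (price - product.unit_cost) / price
    floor = MARGIN_FLOORS[product.category]
    return {
        "sku": product.sku,
        "effective_price": round(price, 2),
        "margin": round(margin, 4),
        "floor": floor,
        "breaches_floor": margin < floor,
    }


if __name__ == "__main__":
    shoe = Product(sku="SKU-1", list_price=100.0, unit_cost=60.0, category="footwear")
    # A 10% markdown alone stays above the floor...
    print(check_markdown(shoe, promo_discount=0.0, proposed_markdown=0.10))
    # ...but stacked on an active 10% promotion it does not.
    print(check_markdown(shoe, promo_discount=0.10, proposed_markdown=0.10))
```

In this example each discount is safe on its own and only the combination breaches the floor, which is the interaction described above.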
The customer data platform is another example. Claude performs customer segmentation across behavioural, transactional, and engagement dimensions. It identifies segments like "high-value lapsing customers who respond to category-specific promotions but not site-wide discounts." That level of nuance requires genuine reasoning about customer behaviour, not just clustering algorithms.
MarginOps works with PE-backed retailers. Our clients have boards, audit committees, and compliance teams. When I tell a CFO that an AI agent is making pricing decisions across their entire product catalogue, the first question is never "how accurate is it?" The first question is always "how do I know it won't do something catastrophic?"
Anthropic's approach to AI safety is not a marketing differentiator for me. It is a commercial requirement. Claude's Constitutional AI framework, its refusal to fabricate data when uncertain, and its transparent reasoning process mean I can give clients audit trails they can actually review. When our pricing agent recommends a markdown, it explains why. When it flags a conflict, it shows the reasoning chain. This is not a nice-to-have. For board-level clients managing £6.4M transformation programmes with 119 EBITDA-tracked initiatives, it is table stakes.
The responsible AI angle also matters for the customer experience agent. When an AI is handling first-line support for thousands of customers, you need a model that knows what it does not know and escalates gracefully. The Claude-powered CX agent contributed to lifting CSAT from 59% to 80%.
The gap between "impressive demo" and "production system" is where most AI projects die. I have seen it repeatedly: at AVORA, at Gravity Data, across dozens of client engagements. The model works in a notebook. It falls apart when you need it to run autonomously, handle edge cases, recover from failures, and coordinate with other systems.
Claude's agentic capabilities — particularly since Opus 4 — closed that gap. All 7 of our production agents run on Claude: the pricing agent, the CDP agent, the CX agent, the warehouse agent, the marketplace agent, the DevOps agent, and the analytics agent. Each one runs in production, autonomously, processing real transactions and making real decisions.
Claude Code transformed the development workflow. I build and deploy these agents directly from the terminal. The iteration speed is absurd compared to what I experienced building AVORA or Streamkap. A pricing model that would have taken a month of iteration now converges in a week. An agent that would have needed a team of three engineers to build and maintain runs on Claude Code with a single operator.
The results speak for themselves. +77% revenue from AI pricing. 60% cloud cost reduction. CSAT from 59% to 80%. All powered by Claude. All in production. All generating measurable EBITDA impact.
MarginOps is proof of what Claude can do when it is deployed by forward-deployed engineers, not theorists. We do not build demos. We build production AI agents that move P&L lines. Every one of our 119 initiatives is tracked to EBITDA. Every agent has measurable outcomes. Every recommendation has an audit trail. Everyone on the team is trained on Anthropic's tools (Agent SDK, MCP, Claude Code) and uses them daily.
If you are evaluating AI foundations for enterprise operations — pricing, supply chain, customer experience, cloud infrastructure — I would encourage you to test Claude against your actual workloads, not benchmarks. Benchmarks measure capability. Production measures reliability. Claude delivers both.
That is why MarginOps runs on Claude. Not because it is the most hyped. Because it is the most reliable when the stakes are real.