AI Agents for Business Operations

Why Most AI Agent Projects Fail

Everyone is talking about AI agents. Most of what you see is demos — a chatbot that can book a meeting, a prototype that summarises emails. That is not what we do.

Production AI agents are fundamentally different. They run on schedules or triggers, process real business data, make decisions within guardrails, and directly affect your P&L. They need monitoring, error handling, and escalation paths — just like any other critical system. And they need someone who stays to run them after launch day.

That is our job. We are forward deployed engineers. We build these agents into your business, and then we stay and run them. Below are the nine agents we have built and operate across our engagements.

What Makes These Production-Grade

01

Live Data Access

Every agent connects directly to production systems — databases, APIs, SaaS platforms. Not a copy or export. Real-time data, real permissions, real consequences.

02

Decision Logic

Rules, thresholds, and ML models that determine what action to take. A pricing agent does not just report that a product is not selling — it calculates the optimal price reduction based on elasticity modelling and applies it.

03

Guardrails

Every automated decision has boundaries. Margin floors, rate limits, human review triggers, anomaly detection. The agent acts autonomously within bounds, and escalates when something does not fit.

04

Us, Watching

Agents log every decision and track outcomes. Our team reviews performance daily, tunes logic weekly, and handles the edge cases that no automation can predict. That is the forward deployed difference.
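The guardrail pattern described above (act within bounds, escalate otherwise) could be sketched roughly as follows. This is a minimal illustration, not our production logic: the `Decision` type, the `guard_price_cut` function, and the 20% margin floor and 30% maximum cut are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str   # "apply" or "escalate"
    price: float  # price that should stand after this decision
    reason: str   # logged for the audit trail


def guard_price_cut(cost: float, current_price: float, proposed_price: float,
                    margin_floor: float = 0.20, max_cut: float = 0.30) -> Decision:
    """Approve a proposed price only if it stays inside both bounds.

    margin_floor and max_cut are illustrative defaults, not real thresholds.
    """
    if current_price <= 0 or proposed_price <= 0:
        return Decision("escalate", current_price, "invalid price")
    margin = (proposed_price - cost) / proposed_price
    cut = (current_price - proposed_price) / current_price
    if margin < margin_floor:
        return Decision("escalate", current_price, "below margin floor")
    if cut > max_cut:
        return Decision("escalate", current_price, "cut exceeds limit")
    return Decision("apply", proposed_price, "within guardrails")
```

An escalated decision keeps the current price and carries a reason, so a human reviewer sees why the agent declined to act.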

Deployed & Running

Nine Agents in Production

All built on Claude / Anthropic · All actively operated by our team

Agent 01 · Live

Pricing Agent

~15K warehouse SKUs / wk · ~1.6K consignment / day · +77% revenue

What it replaced: Manual flash-sale pricing across ~15,000 warehouse products and ~1,600 live consignment lines, refreshed by hand as sales and stock moved. Slow, inconsistent, and miles behind what the data actually said.

What it does now: Two engines plus a monitor plus a Slack assistant. The warehouse engine runs weekly (Monday) with elasticity, sell-through targets and lifecycle margin floors; it detects active and upcoming discount codes and dampens cuts so promos and markdowns don't stack. The consignment engine runs daily and prices in both directions: cuts for zero/low sellers (demand signal × urgency), increases for fast movers; urgency eases when a voucher is running. Both support preview (dry-run) and a stratified A/B split (treatment vs control) so impact is measured, not assumed. The 7-check monitor runs daily and Slacks the team when a recommendation goes stale or a run fails to produce the output it should. The Slack assistant (PydanticAI + Claude, small model triages / large model reasons) answers pricing questions in plain language. It is read-only via its own restricted role, with guardrails that strip credentials and SQL before replying. Renders to Word/Excel.
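One way a stratified treatment/control split can be done is sketched below. The function name `stratified_split`, the `stratum_of` callback and the fixed seed are assumptions made for illustration; the strata the engines actually use are not described here.

```python
import random
from collections import defaultdict


def stratified_split(skus, stratum_of, seed=42):
    """Assign each SKU to 'treatment' or 'control', balanced within each stratum.

    stratum_of is a hypothetical callback mapping a SKU to its stratum
    (for example a category or price band).
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_stratum = defaultdict(list)
    for sku in skus:
        by_stratum[stratum_of(sku)].append(sku)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)
        # Alternate after shuffling so each stratum is split as evenly as possible.
        for i, sku in enumerate(members):
            assignment[sku] = "treatment" if i % 2 == 0 else "control"
    return assignment
```

Splitting inside each stratum keeps the two groups comparable, so a revenue difference is less likely to be an artefact of one group holding more of a particular category.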

How it runs: Kubernetes (Helm). Engines, exports, daily reviews and the monitor run as scheduled jobs with health pings and structured logging; the assistant runs always-on. Recommendations and run metadata land in the warehouse; the buying team also gets Excel exports (warehouse: 9 sheets; consignment: 5). It recommends — it does not silently reprice the live site.

Measured impact: +77% revenue, +80% units sold, +72% gross profit at deployment.

Agent 05 · Live

CX Agent

600–1,450 tickets/day · 13-strategy monitor · Zendesk + Snowflake + Trustpilot

What it replaced: A half-day manual exercise to consolidate Zendesk (tickets, CSAT, agents, queue, response time), Snowflake (orders, returns, late dispatch, CDP value tiers) and Trustpilot (reputation, themes) into a CX read. Trend reads were slow and structural problems — CSAT trajectory, queue accumulation, agent regression, refund-cycle latency, VIP exposure — only surfaced when someone had time to look.

Full periodic analysis: A comprehensive scorecard built from WoW deltas, never a snapshot. CSAT, volume, queue, Ada handoff %, WISMO %, agent rankings and movements, channel and fulfilment-source mix — all this-period vs. prior with direction. Categories are recomputed from the live tag taxonomy each run because the taxonomy drifts. Lands as a Markdown report in cx/data/.

13-strategy auto-detection monitor: Day-of-week and hour-of-day aware baselines, Slack alerts with severity cooldowns. Every cycle: category spike, total volume spike, Ada handoff rate, emerging issues (keyword clusters in untagged tickets), fulfilment-source spike, step change (sustained elevation over consecutive windows). ~Hourly: CSAT threshold, CSAT WoW drop, response-time SLA. ~2-hourly: agent CSAT floor (anyone below 55% good/(good+bad), n≥20), agent workload (anyone > 2× team average daily). ~Daily: queue depth, late-delivery rate.
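Two of the thresholds above are concrete enough to sketch: the agent CSAT floor (below 55% good/(good+bad) with at least 20 ratings) and the workload check (more than 2× the team's average daily volume). The function names and input shapes are hypothetical, and the team average here includes the agent being tested, which is an assumption.

```python
def agents_below_csat_floor(ratings, floor=0.55, min_n=20):
    """ratings: {agent: (good, bad)}. Flags agents under the floor with n >= min_n."""
    flagged = []
    for agent, (good, bad) in ratings.items():
        n = good + bad
        if n >= min_n and good / n < floor:
            flagged.append(agent)
    return sorted(flagged)


def agents_overloaded(daily_tickets, multiple=2.0):
    """daily_tickets: {agent: tickets handled today}.

    Flags anyone above `multiple` times the team average (assumed to
    include the agent's own volume).
    """
    if not daily_tickets:
        return []
    average = sum(daily_tickets.values()) / len(daily_tickets)
    return sorted(a for a, n in daily_tickets.items() if n > multiple * average)
```

The minimum sample size matters: without `min_n`, an agent with three ratings and one bad score would trip the floor on noise alone.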

Incremental data pipeline: Cursor-based and append-only — second run is fast, tenth is fast. Zendesk tickets cursor on tickets_end_time, CSAT on highest rating_id seen, bad-CSAT enrichment skips already-done IDs. Trustpilot uses a Playwright (Chromium) scraper that solves the AWS WAF JS challenge transparently — raw curl / urllib get 403'd. Cold-start ~3.5 min (30-day backfill); subsequent runs typically < 1 min.
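The cursor-based, append-only idea can be illustrated with a small sketch. Everything named here (`incremental_pull`, `fetch_since`, the JSON cursor file) is hypothetical and stands in for the real Zendesk cursors described above.

```python
import json
from pathlib import Path


def incremental_pull(fetch_since, cursor_path: Path, out_path: Path) -> int:
    """Fetch only records newer than the saved cursor and append them to out_path.

    fetch_since(cursor) is a hypothetical callable returning
    (records, new_cursor); it stands in for a real API client.
    """
    cursor = 0
    if cursor_path.exists():
        cursor = json.loads(cursor_path.read_text())["cursor"]

    records, new_cursor = fetch_since(cursor)

    # Append-only: earlier rows are never rewritten.
    with out_path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    # Persist the cursor only after the append succeeded.
    cursor_path.write_text(json.dumps({"cursor": new_cursor}))
    return len(records)
```

Because the cursor advances only after a successful write, a failed run repeats the same window next time instead of skipping it.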

Degraded mode: If a source API fails, the run continues with the others and the report carries an explicit banner about which surface is stale. Never silently substitutes. Recommends only: it does not change tickets, route work, or touch any system.

Agent 02 · Live

Marketplace Agent (Mirakl offboarding)

Fair four-step escalation · Weekly batch · Parent-level only

What it replaced: Mirakl has no native "remove it if it never sells" rule, so dead listings accumulated on the marketplace — products sitting in stock for weeks with no sales, cluttering the catalogue and the customer's results. Nobody had time to clear them fairly.

What it does now: Identifies zero-selling products at parent level (only when every size and colour has had no sales), then escalates fairly. Step 1 (2 weeks, 0 sales): validate the listing is genuinely live and shoppable, not a visibility issue. Step 2 (4 weeks): email the seller with actionable feedback — visibility vs. listing/pricing. Step 3 (8 weeks, no improvement): remove the listing via the Mirakl API and record the selling price for any future relist. Step 4: manual re-activation for seasonal returns or brand resurgence. Counts gross sales (a refund still proves the product was wanted), the in-stock clock pauses when an item is OOS, and there are manual overrides at seller, brand, category and product level.
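The escalation thresholds above (2, 4 and 8 weeks), the pausing in-stock clock and the manual overrides can be sketched as below. The function names, the weekly tuple format and the returned labels are illustrative assumptions; the input is taken to be already aggregated to parent level across all sizes and colours.

```python
def in_stock_weeks_without_sale(weeks):
    """weeks: oldest-to-newest list of (in_stock, gross_units) for one parent product.

    Counts the trailing in-stock weeks with zero gross sales. Out-of-stock
    weeks are skipped, which pauses the clock rather than resetting it.
    """
    count = 0
    for in_stock, gross_units in reversed(weeks):
        if gross_units > 0:  # gross sale, even if later refunded
            break
        if in_stock:
            count += 1
    return count


def escalation_step(weeks, overridden=False):
    """Map the zero-sales clock to a step label (hypothetical names)."""
    if overridden:  # seller, brand, category or product override
        return "skip"
    n = in_stock_weeks_without_sale(weeks)
    if n >= 8:
        return "remove"
    if n >= 4:
        return "email_seller"
    if n >= 2:
        return "validate_listing"
    return "none"
```

Step 4, manual re-activation, is deliberately absent: it is a human decision, not something the clock triggers.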

How it runs: Weekly batch on Monday, so a trigger that fires mid-week is consolidated into one send. Excludes BrandAlley's own and other internal/feed shops, plus suspended shops. Every action is logged for audit.

Measured impact: A consistent, fair, weekly clear of dead marketplace listings that used to drag on the catalogue for months. Sellers get warning + feedback before any removal — the process the team always wanted but couldn't run by hand.

Agent 03 · Live

Sale Launch Checker

20-point QA · Daily 03:30 · Catches sale failures before launch

What it replaced: Manual pre-launch checks on every flash sale — a team running through a checklist at the worst hour of the day, missing the one item that breaks a launch.

What it does now: Runs a 20-point automated QA every day at 03:30 ahead of the launch window. Validates pricing rules are wired, discount codes don't conflict, hero imagery is in place, landing pages resolve, inventory is allocated, integrations are responding, and the launch state is internally consistent. Slacks the team an at-a-glance pass/fail with the failing checks named.
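A pass/fail summary with the failing checks named could be produced along these lines. The `run_checks` function and the message format are hypothetical; the real 20 checks and the Slack posting are not shown.

```python
def run_checks(checks):
    """checks: list of (name, callable returning truthy on success).

    A check that raises is treated as a failure, so one broken
    integration cannot stop the rest of the run.
    """
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)

    status = "PASS" if not failed else "FAIL"
    summary = f"{status}: {len(checks) - len(failed)}/{len(checks)} checks passed"
    if failed:
        summary += " | failing: " + ", ".join(failed)
    return summary
```

Treating an exception as a failed check, rather than letting it abort the run, means the summary always covers every check.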

Measured impact: Checks that previously needed a human awake at 4am now run automatically, and failures are surfaced before launch. Failure modes that used to be discovered by customers are discovered before the sale goes live.

Agent 04 · Live

Paid Ads Agent

5 channels · £1.14M / 90d spend · ~£248K/yr commission leakage recovered

What it replaced: A half-day Monday review across Google Ads, Meta, CJ Affiliates, GA4 and Attentive — five APIs, five sets of quirks, with deep-dives that slipped and edge cases (commission leakage on marketplace orders) that sat unnoticed for weeks.

What it does now — weekly review: One ex-brand-first scorecard across all five channels with WoW comparison, flags and recommendations. Ex-brand leads because brand search at 70x+ ROAS is captured demand and would flatter the acquisition number; flag thresholds (ROAS < 12x warning, < 8x critical, revenue WoW < -10% red) all fire on ex-brand.
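The ex-brand flag thresholds quoted above translate into a few comparisons. The function name, argument names and flag labels below are hypothetical; the inputs are assumed to be ex-brand figures already.

```python
def exbrand_flags(spend, revenue, prior_revenue):
    """Return flag labels for one week of ex-brand numbers.

    Thresholds follow the text: ROAS < 12x warning, < 8x critical,
    revenue week-on-week below -10% red.
    """
    flags = []
    if spend > 0:
        roas = revenue / spend
        if roas < 8:
            flags.append("roas_critical")
        elif roas < 12:
            flags.append("roas_warning")
    if prior_revenue > 0:
        wow = (revenue - prior_revenue) / prior_revenue
        if wow < -0.10:
            flags.append("revenue_wow_red")
    return flags
```

For example, £1,000 of spend returning £10,000 against £12,000 the week before yields a ROAS warning and a red week-on-week flag.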

Five auto-triggered deep-dives: Google Ads movers (spend +10% with revenue down, or ROAS drop > 20%) · PMax search terms (any PMax < 10x ROAS with > £500 spend) · Budget & impression share (ROAS drop > 15% or < 12x) · CJ poaching (rate > 15%, orders converting in < 2 minutes from click) · Meta creative audit (any campaign < 8x ROAS with > £1K spend). Fires only when conditions hit — no noise on a quiet week.

CJ × Mirakl reconciliation: Joins CJ commission records to Magento orders by orderId and splits each order's items into BA-fulfilled vs. Mirakl marketplace. Surfaced ~£61K of estimated overpayment in 90 days (~£248K annualised), concentrated in CSS and cashback publishers — with sample orders for claim-back evidence and the data the affiliate team needs to renegotiate or fix the conversion pixel.
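A rough sketch of the reconciliation join is shown below. The text does not state how overpayment is estimated, so the proration used here (commission multiplied by the marketplace share of order value) is purely an assumption, as are the function names and data shapes.

```python
def marketplace_commission_estimate(commissions, order_items):
    """Estimate commission attributable to marketplace-fulfilled items.

    commissions: {order_id: commission_paid}
    order_items: {order_id: [(fulfilment, line_value), ...]} where
                 fulfilment is "BA" or "MIRAKL".
    ASSUMPTION: commission is prorated by the marketplace share of order value.
    """
    total = 0.0
    for order_id, paid in commissions.items():
        items = order_items.get(order_id, [])
        order_value = sum(value for _, value in items)
        if order_value <= 0:
            continue  # no matching order lines; leave for manual review
        mirakl_value = sum(value for f, value in items if f == "MIRAKL")
        total += paid * mirakl_value / order_value
    return total


def annualise(amount, days):
    """Scale an amount observed over `days` to a 365-day year."""
    return amount * 365 / days
```

With this scaling, £61,000 over 90 days comes to roughly £247K a year, in line with the ~£248K figure given that £61K is itself rounded.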

How it runs: Local cron at 07:00 UTC every Monday (with a backup remote trigger). Posts to #paid-ads-agent with severity-coded flags and action-coded recommendations. Degraded mode: if a channel API fails, it's excluded from the blended numbers with an explicit banner — never silently substituting a smaller picture as the full one.

Agent 06 · Live

Product Change Monitor

Catalogue drift detector · Slack-first

What it replaced: No way to spot a silent change in the live catalogue — a SKU's price jumping, a stock value flipping, a title or image swapped without a ticket — until a customer or a report surfaced it.

What it does now: Tracks the live catalogue and alerts on material changes — unexpected price moves, out-of-band stock swings, attribute edits on high-velocity SKUs. Posts the diff to Slack so the merchandising and pricing teams see catalogue drift the moment it happens, not in next week's report.
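Detecting material changes between two catalogue snapshots might look like the sketch below. The snapshot shape, the `catalogue_diff` name and the 20% price-move threshold are illustrative assumptions rather than the monitor's real rules.

```python
def catalogue_diff(previous, current, price_move_threshold=0.20):
    """Compare two snapshots shaped {sku: {"price": float, "title": str, ...}}.

    Returns (sku, field, before, after) tuples. Price moves smaller than
    the threshold are ignored; any other field change is reported.
    """
    changes = []
    for sku in sorted(previous.keys() & current.keys()):
        old, new = previous[sku], current[sku]
        for field in sorted(old.keys() | new.keys()):
            before, after = old.get(field), new.get(field)
            if before == after:
                continue
            if field == "price" and before and after is not None:
                if abs(after - before) / before < price_move_threshold:
                    continue  # small, expected price movement
            changes.append((sku, field, before, after))
    return changes
```

Each tuple carries the before and after values, which is what a Slack diff message needs to be readable without opening the catalogue.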

Agent 07 · Live

CDP / Data AI

16.4M customers · ~£8.3M/yr margin protection · Daily refresh on ClickHouse

What it replaced: Email and SMS to a very large list with no view of who each customer is — same frequency, same discounts, same generic catalogue for everyone. That over-sent to people who'd stopped engaging (hurting deliverability) and handed discounts to people who'd have paid full price (giving away margin).

What it builds — every customer, every day: A unified per-customer profile with RFM (recency / frequency / monetary 1–5), engagement score (0–100), churn risk (probability + Low/Medium/High/Critical band), value tier (VIP / High / Standard / Low from predicted 12-month value), discount strategy (every customer tagged SUPPRESS, LIGHT, STANDARD or DEEP), brand and category affinities, and membership across 40+ fixed segments. Flattens to one wide export table (100+ attributes per customer) that activation tools read directly.

How it runs: Built on Snowflake (dev), amended and put live by the data team on ClickHouse, where it runs today. Packaged as a container, run as a Kubernetes CronJob in the dbt-ch namespace daily at 02:00 UTC before the sync window. Seven idempotent jobs in order; the whole refresh completes in roughly an hour. Reverse-ETL sync into Attentive for email and SMS targeting.

What it is worth (the CDP's own analysis): ~£8.3M/yr margin protection from holding discounts back from the ~90K full-price buyers (the SUPPRESS group). ~£9.8M of at-risk revenue across ~302K customers recoverable through win-back. 77% of revenue concentrated in the top ~3% (VIP + High value) — the case for VIP treatment.

No AI model in the pipeline. It's rule-based, statistical data engineering, not an LLM. It does not send anything to customers; it prepares segments and attributes and marketing decides.

Agent 08 · Live

FreshService AI Agent

Claude-based IT triage · 5-min cycle, business hours · BookStack + Qdrant memory

What it replaced: IT engineers triaging every inbound ticket by hand — reading, assessing, asking for missing detail, looking up the runbook, writing a guidance note, setting the priority.

What it does now: Picks up open tickets (capped per run, default 20) and asks Claude whether there's enough to go on. If yes: posts a private troubleshooting note with step-by-step guidance, sets priority via the ITIL impact/urgency matrix, tags as AI-processed, and stores the case in memory. If no: posts a polite public reply asking for the specific missing detail. After two unanswered follow-ups it tags for human review and moves to the escalation group.
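The branching described above can be sketched without any API calls. The priority matrix below is an illustrative impact/urgency mapping, not the exact matrix the agent uses, and the function names and action labels are hypothetical.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}


def priority(impact, urgency):
    """Illustrative impact/urgency matrix (ASSUMPTION: not the real mapping)."""
    score = LEVELS[impact] + LEVELS[urgency]
    if score >= 6:
        return "urgent"
    if score == 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def next_action(enough_detail, unanswered_followups):
    """Decide what to do with one open ticket.

    enough_detail would come from the model's assessment; here it is
    simply passed in as a boolean.
    """
    if enough_detail:
        return "post_private_note"
    if unanswered_followups >= 2:
        return "escalate_to_human"
    return "ask_for_detail"
```

Keeping the routing in plain code and leaving only the "is there enough to go on" judgement to the model makes the escalation behaviour easy to test.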

What grounds the answers: The BookStack knowledge base (a designated chapter loaded each run, falls back to general IT knowledge if BookStack is down) plus a Qdrant vector memory of resolved tickets so it retrieves genuinely similar, already-solved cases and cites them.

How it runs: Native service on the Hetzner automation-01 host (systemd long-running API + 5-minute timer + Cloudflare Tunnel for endpoints), secrets injected from 1Password. Runs every 5 minutes Monday–Friday, 07:00–18:00 UTC; exits quietly outside that window. Dry-run mode for safe validation; British English throughout; escalates to a human when it can't help.

Agent 09 · Live

Compliance Worker (SOC2 enforcement)

Drata + Microsoft Conditional Access · 14→7→3→1→blocked · Auto-restore on compliance

What it replaced: Security manually chasing staff to accept policies, complete training and install the device agent — and then deciding when to revoke access. SOC2 evidence by goodwill and follow-up.

What it does now: Each working day, reads Drata for three checks (policies accepted, security training done, device agent installed) and walks anyone who fails through a fixed escalation ladder: Day 0 14-day notice → after 7 days 7-day urgent → after 11 days 3-day final warning → after 13 days 1-day last chance → after 14 days blocked from Microsoft cloud services by adding them to a specific Conditional Access policy. Manager copied throughout. Restore is automatic the moment they're compliant again.
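The ladder above maps directly onto a lookup by days of non-compliance. The names `LADDER` and `ladder_stage` are hypothetical, and whether the day counts are calendar or working days is not stated, so the sketch simply takes a day count as input.

```python
# (minimum days non-compliant, stage), checked from the top down.
LADDER = [
    (14, "blocked"),
    (13, "1-day last chance"),
    (11, "3-day final warning"),
    (7, "7-day urgent"),
    (0, "14-day notice"),
]


def ladder_stage(days_non_compliant, compliant=False):
    """Return the current stage for one user.

    A compliant user is always restored, whatever stage they had reached.
    """
    if compliant:
        return "restore"
    for threshold, stage in LADDER:
        if days_non_compliant >= threshold:
            return stage
    raise ValueError("days_non_compliant must be >= 0")
```

Because the stage is derived from the day count alone, the only state that has to survive a restart is the date each user first failed a check.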

Safety nets: Dry-run and test modes for safe changes. Exemption group plus a per-user "Excluded" tag for legitimate exceptions. FreshService ticket raised on errors. Device-agent failures raise a ticket instead of an email chase. Keeps state in a small file so the ladder survives restarts.

No AI model. Straightforward rule-based automation in support of SOC2.

We Build Them, Then We Run Them

Most AI projects fail after handover. The consultancy ships a demo, hands over documentation, and leaves. Six months later the model has drifted, nobody knows how to tune it, and the whole thing gets quietly switched off.

We do not do handovers. We are forward deployed engineers. We build the agents in your environment, then stay embedded to run them. We monitor performance daily, tune logic weekly, and handle the edge cases no automation can predict. When something breaks at 2am, we fix it — not your team.

Each agent recommends or acts within tight guardrails. The Pricing Agent does not silently reprice the live site. The CX Agent does not change tickets. The Marketplace Agent records the price before any removal, so a relist is one click away. We build for trust first. That is why our pricing is a base retainer plus a performance fee tied to results. We only win when the agents are working.

Agent Infrastructure We Have Built

Every agent here connects to real production systems — Snowflake, ClickHouse, Magento, Mirakl, Zendesk, Trustpilot, Google Ads, Meta, CJ, GA4, Attentive, Klaviyo, Drata, Microsoft Graph, FreshService — through Anthropic's Claude with MCP and our own tool layers. Across the wider stack we run 215+ MCP marketing tools and a 56-tool ops agent connecting catalogue, billing, customer and reporting systems.

This is what production agent infrastructure looks like: not one model doing one thing, but a coordination layer that gives the agent hands across every system in the business — with degraded-mode handling, dry-run modes, severity-coded alerts, and audit logs on every action.

Frequently Asked Questions

What is an AI agent vs a chatbot?

A chatbot responds to user messages in conversation. An AI agent operates autonomously: it runs on schedules, monitors data streams, makes decisions, and takes actions without human prompting. A chatbot might tell you a product's price. Our pricing agent analyses 37,800 products, calculates optimal prices using elasticity models, and issues its recommendations every week, automatically.

How long does it take to deploy an AI agent?

A single agent typically takes 2-4 weeks from data access to production deployment. The first week is data integration and pipeline setup. Week 2 is logic development and testing. Weeks 3-4 are deployment, monitoring setup, and guardrail tuning. We stay and run them after that — that is the forward deployed model.

What happens when an agent makes a wrong decision?

Every agent has guardrails — margin floors, rate limits, anomaly thresholds. When a decision falls outside bounds, the agent escalates to human review rather than acting. All decisions are logged with full audit trails. Our team reviews agent performance daily, so suboptimal patterns get caught and corrected fast.

Do agents replace all human roles?

No. Agents replace specific manual tasks within roles. A pricing agent replaces the spreadsheet work of reviewing 15,000 products weekly — but the merchandising team still sets strategy, manages vendor relationships, and handles exceptions. The net effect is usually 2-4 FTE equivalent freed per agent, so humans can do higher-value work.

Who runs the agents after deployment?

We do. That is the whole point. We are not consultants who build something and leave. Our team monitors agent performance, tunes logic, handles edge cases, and continuously improves the systems. You get a dedicated engineer who knows your business and your agents. Our pricing model reflects this — we share in the upside because we stay to create it.

We build agents. We run agents. That is the job.
