Cloud Migration Without Downtime: How We Cut Hosting Costs by 60%

The AWS Default

AWS is the default choice for cloud infrastructure. It is also, for a large category of workloads, dramatically overpriced. This is not a controversial opinion among people who actually manage infrastructure costs. It is only controversial among people who have never compared an AWS bill to the cost of running the same workload on alternative providers.

The client — a major UK fashion e-commerce retailer processing approximately 2 million requests per day — was running their entire infrastructure on AWS. Three Aurora MySQL databases (465GB total), commerce application servers, Varnish caching layers, worker queues, cron jobs, staging environments, and a full CI/CD pipeline. The monthly AWS bill was £38,000. And for most of these workloads, AWS was providing commodity compute at premium prices.

The question was not whether the bill was too high. It was obviously too high. The question was how to reduce it without any service disruption to a platform processing 2 million requests per day, operating a large curated marketplace, and running real-time pricing engines that update every 15 minutes.

The Audit: What Actually Needs AWS

The first step was a workload-by-workload audit. Not every service running on AWS needed to be on AWS. The goal was to identify which workloads genuinely benefited from AWS-specific features and which were simply running on AWS because that was where they were first deployed.

We categorised every workload into three tiers.

Tier 1: AWS-native. Services that use AWS-specific features with no direct equivalent elsewhere. For this client, that was limited to: Aurora MySQL (with its automated failover and replication features), certain Lambda functions that triggered on AWS-specific events, and the integration layer with third-party services that expected AWS endpoints. These stayed on AWS.

Tier 2: Cloud-agnostic. Services that run on any Linux server with Docker. This was the majority of the stack: the commerce application servers, Varnish caching, worker queues, cron jobs, staging environments, and internal tools. These workloads were using AWS EC2 instances as expensive Linux servers. They did not use auto-scaling groups (the traffic pattern was predictable), did not use AWS-specific APIs, and did not benefit from AWS's managed services. They were candidates for migration.

Tier 3: Over-provisioned. Services that were running on instance types far larger than their actual resource consumption required. We found staging environments running on production-sized instances, cron jobs running on dedicated servers that were idle 95% of the time, and monitoring infrastructure that had been provisioned for a peak load that never materialized. These needed right-sizing regardless of which cloud they ran on.
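
The three-tier split can be written down as a simple decision rule. The sketch below is illustrative only: the `Workload` fields, the 30% utilisation cut-off, and the example workloads are assumptions made for the example, not the tooling or thresholds used in the actual audit.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    AWS_NATIVE = "Tier 1: AWS-native"
    CLOUD_AGNOSTIC = "Tier 2: Cloud-agnostic"
    OVER_PROVISIONED = "Tier 3: Over-provisioned"


@dataclass
class Workload:
    name: str
    uses_aws_specific_features: bool  # e.g. Aurora failover, Lambda event triggers
    avg_utilisation: float            # hypothetical metric: average resource use, 0.0 to 1.0


# Hypothetical cut-off: below this average utilisation the instance is treated as oversized.
OVER_PROVISIONED_BELOW = 0.30


def classify(workload: Workload) -> Tier:
    """Assign a workload to one of the three audit tiers."""
    if workload.uses_aws_specific_features:
        return Tier.AWS_NATIVE
    if workload.avg_utilisation < OVER_PROVISIONED_BELOW:
        return Tier.OVER_PROVISIONED
    return Tier.CLOUD_AGNOSTIC


if __name__ == "__main__":
    for w in [
        Workload("aurora-primary", True, 0.60),
        Workload("commerce-app", False, 0.55),
        Workload("nightly-cron", False, 0.05),  # idle 95% of the time
    ]:
        print(f"{w.name}: {classify(w).value}")
```

In practice the tiers overlap (an AWS-native service can also be oversized), so a real audit would likely record right-sizing as a separate flag rather than a third exclusive bucket.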

The Destination: Multi-Cloud Architecture

The migration target was not a single alternative provider. It was a multi-cloud architecture that placed each workload on the most cost-effective platform for its requirements.

Production databases remained on AWS Aurora. The automated failover, point-in-time recovery, and read replica scaling are genuinely valuable for a database serving 2 million daily requests. The cost premium over self-managed MySQL is justified by the operational risk reduction. We did, however, right-size the instances and optimize the reserved instance coverage, saving approximately 30% on the database tier alone.

Application servers and caching moved to Hetzner dedicated servers. For predictable workloads that do not need auto-scaling, a Hetzner dedicated server at £80/month delivers the same spec as an EC2 instance costing £500+/month. That is not a marginal saving: at those prices it is a reduction of about 84% per server. The commerce application runs in Docker containers, making the migration a configuration change rather than a code change. Varnish caching, which was already running as a standalone process, migrated without any modification.

Worker queues and cron jobs consolidated onto smaller Hetzner instances. Jobs that previously ran on dedicated EC2 instances were containerised and scheduled on shared infrastructure, reducing the total instance count from 8 to 3 for background processing.

Staging and development environments moved entirely off AWS. These environments do not need high availability, managed databases, or premium networking. They need affordable compute that developers can spin up and tear down quickly. The cost reduction on staging environments alone was 75%.

CDN and edge caching moved to Cloudflare, which the client was already using for DNS and WAF. By enabling Cloudflare's caching and edge compute features that were already included in their plan, we eliminated a separate CDN cost on AWS CloudFront.

Zero-Downtime Migration: The CDC Approach

The hardest part of any migration is the cutover. The client could not afford any downtime. Every minute of downtime has a direct revenue cost (calculable from average revenue per minute), plus a longer-term cost from abandoned carts, broken marketplace seller feeds, and customer trust erosion.

We used a Change Data Capture (CDC) approach to ensure zero-downtime migration for stateful services.

Phase 1: Parallel operation. The new infrastructure was stood up alongside the existing AWS infrastructure. Application servers on the new provider were configured identically to the AWS instances and connected to the same Aurora databases via secure VPN tunnels. For two weeks, we ran both environments in parallel, routing a percentage of traffic to the new infrastructure via weighted DNS records.

Phase 2: Data replication. For services with local state (caching layers, session stores, queue backlogs), we set up CDC pipelines that replicated changes in real-time from the AWS instances to the new infrastructure. Debezium captured changes from MySQL binlogs and streamed them to the destination. Redis replication handled session state. The queue backlog was drained naturally by running consumers on both sides.

Phase 3: Traffic shifting. Over a 72-hour window, we shifted traffic from 10% on new infrastructure to 50%, then 90%, then 100%. At each step, we monitored response times, error rates, conversion rates, and checkout completion rates. The monitoring was automated: if any metric deviated by more than one standard deviation from its trailing average, traffic would automatically route back to AWS.

Phase 4: Decommission. After one week of stable operation on the new infrastructure with no fallback triggers, we decommissioned the AWS application servers, caching layers, and worker instances. The AWS account retained only the Aurora databases, the Lambda functions, and the integration endpoints that required AWS-specific features.
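
The automated guard described in Phase 3 can be sketched roughly as follows. This is a minimal illustration, not the monitoring system used on the project: the function names, the metric names in the usage example, and the way trailing samples are supplied are all assumptions. Only the traffic steps (10%, 50%, 90%, 100%) and the one-standard-deviation rule come from the description above.

```python
from statistics import mean, stdev

# Share of traffic (percent) on the new infrastructure at each step.
TRAFFIC_STEPS = [10, 50, 90, 100]


def deviates(trailing: list[float], current: float, max_sigmas: float = 1.0) -> bool:
    """True if `current` is more than `max_sigmas` standard deviations from the trailing mean."""
    if len(trailing) < 2:
        raise ValueError("need at least two trailing samples")
    mu = mean(trailing)
    sigma = stdev(trailing)
    return abs(current - mu) > max_sigmas * sigma


def next_traffic_share(
    step_index: int,
    trailing_by_metric: dict[str, list[float]],
    current_by_metric: dict[str, float],
) -> int:
    """Return the next traffic share, or 0 (everything back to AWS) if any metric deviates."""
    for metric, trailing in trailing_by_metric.items():
        if deviates(trailing, current_by_metric[metric]):
            return 0
    return TRAFFIC_STEPS[min(step_index + 1, len(TRAFFIC_STEPS) - 1)]


if __name__ == "__main__":
    # Hypothetical trailing samples for two of the monitored metrics.
    trailing = {
        "response_ms": [200, 210, 190, 205, 195],
        "error_rate": [0.010, 0.012, 0.009, 0.011, 0.010],
    }
    print(next_traffic_share(0, trailing, {"response_ms": 203, "error_rate": 0.0105}))
    print(next_traffic_share(1, trailing, {"response_ms": 203, "error_rate": 0.05}))
```

The check is two-sided, matching the wording "deviated by more than one standard deviation", so an unusually good reading would also trigger a fallback in this sketch.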

Total downtime during migration: zero seconds. The traffic shifting was invisible to customers. The marketplace seller feeds continued without interruption. The pricing engine's 15-minute monitoring cycle ran continuously throughout.

Database Optimization: The Hidden Savings

The cloud migration delivered £276K in annualized savings — from £38K/month to £15K/month within three months. But the infrastructure audit also uncovered significant database optimization opportunities that compounded the value.
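
For readers who want to check the headline figures, the arithmetic is short. The snippet below uses only the monthly bills quoted above; the variable names are ours.

```python
monthly_before_gbp = 38_000  # AWS bill before the migration
monthly_after_gbp = 15_000   # combined bill three months later

monthly_saving_gbp = monthly_before_gbp - monthly_after_gbp
annual_saving_gbp = monthly_saving_gbp * 12
reduction_pct = 100 * monthly_saving_gbp / monthly_before_gbp

print(f"Monthly saving: £{monthly_saving_gbp:,}")
print(f"Annualised saving: £{annual_saving_gbp:,}")
print(f"Reduction: {reduction_pct:.1f}%")
```

This gives £23,000 per month, £276,000 per year, and a reduction of roughly 60%, which is where the figure in the title comes from.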

Across five production databases totalling 465GB, we identified 257GB of recoverable space. This was a mix of orphaned tables from decommissioned features, over-indexed columns, uncompressed historical data, and log tables that had been growing unbounded for years. Recovering this space allowed us to downsize two of the Aurora instances to smaller tiers, saving an additional £18K annually.
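
One way to start a space audit like this is to rank tables by size and flag obvious candidates. The sketch below assumes rows fetched from MySQL's `information_schema.TABLES` view (the column names are real; `DATA_FREE` is only an approximation for InnoDB). The thresholds, function name, and sample rows are hypothetical, and the approach will not find orphaned tables, which still need someone to check what the application actually uses.

```python
GIB = 1024 ** 3

# Query whose result rows the function below expects, in this column order.
TABLE_SIZE_QUERY = """
SELECT TABLE_NAME, DATA_LENGTH, INDEX_LENGTH, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = %s
"""


def audit_candidates(rows, min_free_gib=1.0, max_index_ratio=1.0):
    """Flag tables worth a manual look. Thresholds are illustrative assumptions."""
    findings = []
    for name, data_len, index_len, data_free in rows:
        reasons = []
        if data_free / GIB >= min_free_gib:
            reasons.append("reclaimable free space")
        if data_len and index_len / data_len > max_index_ratio:
            reasons.append("indexes larger than data")
        if reasons:
            total_gib = round((data_len + index_len) / GIB, 1)
            findings.append((name, total_gib, reasons))
    # Largest tables first, since they dominate the storage footprint.
    return sorted(findings, key=lambda f: f[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample rows: (table, data bytes, index bytes, free bytes).
    sample = [
        ("orders", 40 * GIB, 10 * GIB, 0),
        ("audit_log", 60 * GIB, 5 * GIB, 12 * GIB),
        ("product_attrs", 8 * GIB, 20 * GIB, 0),
    ]
    for name, size_gib, reasons in audit_candidates(sample):
        print(f"{name}: {size_gib} GiB ({', '.join(reasons)})")
```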

More impactful than the space recovery was the index optimization. Six index fixes on the primary database delivered query speedups ranging from 1,200x to 59,000x, saving 12.2 CPU days of processing per month. The worst offender was a product catalogue query that ran during every category page load. It was performing a full table scan on a 14-million-row table because the column order of the existing composite index did not match the columns in the query's WHERE clause, so the optimizer could not use it. Adding the correct index reduced the query from 8.4 seconds to 0.14 milliseconds.
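
The effect of index column order is easy to reproduce on a small scale. The sketch below uses Python's built-in `sqlite3` module rather than Aurora MySQL, purely so that it runs anywhere; the table, column, and index names are invented, and we are assuming the mismatch looked like a composite index whose leading column the query did not filter on. SQLite's planner differs from MySQL's in detail, but the leftmost-column principle is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, category_id INT, status TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO products (category_id, status, name) VALUES (?, ?, ?)",
    [(i % 50, "active" if i % 3 else "draft", f"item-{i}") for i in range(5000)],
)

QUERY = "SELECT name FROM products WHERE category_id = ?"


def plan(sql: str) -> str:
    """Return the planner's description of how it will execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (7,)).fetchall()
    return " | ".join(row[3] for row in rows)


# Existing composite index leads with `status`, which the query does not filter on.
conn.execute("CREATE INDEX idx_status_category ON products (status, category_id)")
before = plan(QUERY)

# New index whose leading column matches the query's filter.
conn.execute("CREATE INDEX idx_category ON products (category_id)")
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

The first plan should report a scan of the whole table and the second a search using `idx_category`; the exact wording varies between SQLite versions. On MySQL, `EXPLAIN` plays the same role.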

These optimizations did not require a cloud migration. They required someone to actually look at the database. The migration audit created the forcing function, but the savings would have been available regardless. If you have not audited your database performance in the last 12 months, there are almost certainly significant optimizations waiting to be found. We cover the full methodology in our 59,000x query speedup deep-dive.

When AWS Makes Sense

This article could be read as "AWS is too expensive, leave AWS." That is not the message. AWS is too expensive for workloads that do not use its differentiating features. For workloads that do, it remains the right choice.

AWS makes sense when: You need managed database failover and recovery (Aurora). You need serverless compute for event-driven workloads (Lambda). You need deep integration with AWS-specific services (SQS, SNS, Kinesis). You need auto-scaling for genuinely unpredictable traffic patterns. You are running machine learning training jobs on GPU instances with spot pricing.

AWS does not make sense when: You are running predictable workloads on fixed-size instances. You are using EC2 as an expensive Linux server. Your staging environments run 24/7 on production-sized instances. You are paying for managed services you could run yourself with Docker and a competent DevOps setup. Your traffic patterns are predictable enough that auto-scaling never triggers.

Most businesses fall into the second category for 60-80% of their workloads. The AWS bill is padded not by services that genuinely need AWS, but by services that were deployed there because it was convenient and never reassessed.

The Numbers

The tech cost reduction workstream of the transformation programme includes 13 initiatives worth £601K in annual value. The cloud migration contributed £276K of that — the single largest initiative in the workstream.

Combined with the database optimizations (£18K in downsized instance costs plus the query performance improvements described above), the vendor replacements (£120K in eliminated SaaS costs), and the broader tool consolidation, the infrastructure became dramatically leaner without any reduction in capability, reliability, or performance.

The platform still processes 2 million requests per day. Response times improved by 12% on average due to the database optimizations. The marketplace still operates at full scale. The pricing engine still monitors every 15 minutes. The only thing that changed was the bill.

If your AWS bill has not been audited workload-by-workload in the last 12 months, you are almost certainly overpaying. The question is not whether savings exist. It is how large they are and how quickly you can capture them.

