
Migrating a Console Nobody Fully Understood

Software Engineer (de facto Program Lead), AWS Migration Hub · December 15, 2025

Key outcome: $300K/year infrastructure savings, 50% reduction in on-call incidents

When I joined the AWS Migration Hub team, the codebase was running on infrastructure that predated all of the engineers maintaining it. The original authors were long gone. On-call rotations surfaced issue after issue that puzzled everyone.

The migration was tracked all the way up to AWS leadership. Which meant: no fallback, no quiet failure, no “we’ll get to it next quarter.”

The System

Migration Hub helps enterprises move their infrastructure to AWS. Customers like Tesla, BMW, and Deloitte used the console to track migration progress across hundreds of servers and applications. The frontend was built on a legacy Java-based stack with layers of libraries and conventions that no one on the team had chosen.

The pain was real but diffuse. Not a single catastrophic problem but a steady tax on everything. Debugging took longer than it should. On-call issues were unpredictable. Every sprint lost time to friction managing this system.

The migration would move the entire frontend to native AWS services—modern React, CDK-managed infrastructure, a new design system. On paper, a straightforward modernization. In practice, an 18-month program touching design, security, product, and platform engineering.

The Approach Nobody Chose

The first approach we evaluated was a gradual migration: split the backend in two, keep the legacy Java system running alongside the new PaaS infrastructure, and port features over incrementally. Lower risk because the existing system stays live while you build the new one in parallel.

But the scope would have been significantly larger. Maintaining two systems in parallel means double the operational burden, double the on-call surface, and a long tail of integration work to keep them in sync. We’d be reducing risk by stretching the timeline but trading one kind of pain for another.

I pushed for a single migration in one coordinated switch. Higher risk but a clean break from the legacy system and a tighter timeline. That meant the program needed to be managed carefully, because there was no fallback.

The Stakeholders

The migration included more than 15 stakeholders, each with different priorities:

Design saw an opportunity to adopt a new design system that would bring consistency across the console and enable new frontend features.

Application Security saw every architecture change as a potential vulnerability. Their default was caution, and their reviews could delay launches by weeks.

Product had customer commitments and a roadmap to protect. Every sprint spent on migration was a sprint not spent on features. They needed reassurance that migration would accelerate delivery, not stall it.

Platform engineering owned the infrastructure we were migrating to. They had their own priorities and timelines. Our migration depended on their services being ready, which was a dependency we didn’t fully control.

Nobody was wrong. Four groups but zero shared plan.

Owning the Program

I wasn’t hired as a program manager; I was a software engineer. But the migration needed someone tracking dependencies across teams and projects, maintaining a program tracker with more than 100 items, running weekly syncs, and translating between groups that each had their own track within the same project.

So I stepped into that role: I ran the weekly status syncs, built the go/no-go criteria for launch, and facilitated three launch readiness reviews with senior leadership.
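The go/no-go gate itself was mechanically simple: a list of named criteria, each with an owner and a status, and launch only happens when every one is green. As a sketch (the criterion names, owners, and statuses below are invented for illustration, not the program’s actual checklist), the logic looks like this:

```python
# Hypothetical go/no-go evaluation: launch proceeds only if every
# criterion is green. Names and owners here are illustrative.
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    owner: str
    status: str  # "green", "yellow", or "red"


def go_no_go(criteria: list[Criterion]) -> tuple[bool, list[str]]:
    """Return (go decision, list of blocking criterion names)."""
    blockers = [c.name for c in criteria if c.status != "green"]
    return (not blockers, blockers)


criteria = [
    Criterion("AppSec review signed off", "Security", "green"),
    Criterion("Rollback plan rehearsed", "Platform", "green"),
    Criterion("Design system parity audit", "Design", "yellow"),
]
go, blockers = go_no_go(criteria)
# go is False; blockers names the one yellow criterion
```

The value wasn’t the code, of course; it was forcing each group to attach its name to a criterion and defend its status in the readiness reviews.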

Running the program while doing full rewrites in the same sprint was brutal.

Working with AppSec

The standard process for AppSec reviews is heavy: engineering builds, security reviews, security finds problems, timelines slip.

I built a scorecard showing AppSec engineers what we were eliminating: legacy systems with known vulnerabilities, outdated dependencies, infrastructure that hadn’t been patched to current standards. The migration was a security win.

The security lead could show their leadership that this migration was reducing risk. Their incentive flipped from “slow this down” to “help this succeed.”

The result: they fast-tracked our reviews. We launched with zero security-related blockers.

Fixing Unforeseen Blockers

Six weeks before our planned launch, I was reviewing the technical specification and noticed something that didn’t fit. The new architecture required the frontend to call services that were only accessible from internal networks. Nobody had flagged this because that system hadn’t been touched in years.

We were heading for weeks of delay that nobody would discover until launch day.

Together with the team that owned the dependency, I mapped it, quantified the risk, and brought a mitigation plan to our TPM partner. We negotiated a two-week timeline adjustment with the service owners and launched on the revised date.

The Outcome

The migration shipped on the revised timeline. The numbers:

  • $300K/year in infrastructure cost savings through architecture simplification
  • 50% reduction in on-call incidents through a more modern infrastructure, better runbooks, and cross-team knowledge sharing
  • Zero security-related launch blockers
  • New design system adopted with zero customer-facing downtime through a phased rollout

But the outcome that mattered most doesn’t have a number: the team could move faster. New features that used to require fighting the legacy architecture could now be built with a clean starting point. The React components built during the migration would be reused six months later when the team pivoted to GenAI integration with AWS Transform.

What I’d Do Differently

I’d push harder for customer connection from the start.

Throughout the migration, I advocated for customer interviews, for a dashboard showing business metrics alongside operational metrics, for getting closer to how customers actually used the product. The culture didn’t support it. Migration Hub was operated as an infrastructure tool, not a product with users to understand.

This gap caught up with us. We built a network visualization for migration progress. Usage was disappointing. When I finally got on a call with an affected customer, the reality was worse than metrics suggested—the entire workflow was unusable for complex migrations. Customers didn’t want to explore their data. They wanted answers.

That insight resurfaced when the team later integrated GenAI: the conversational interface we built got right what the dashboard got wrong. But we could have learned that lesson a year earlier if we’d been talking to customers from the beginning.

The migration was a technical success. But a migration that also improved the product would have been worth significantly more.

The Lesson

TPM work is essential for ambitious projects like this. Without it, you can’t create the conditions where 15+ stakeholders, each with competing priorities, build a shared plan and execute it.

That meant:

  • Catching the network access gap because I was the only person reading both the frontend and backend specs — credibility came from knowing the system, not from a title
  • Building a scorecard that turned AppSec from a launch gate into an ally with skin in the game
  • Maintaining a 100+ item tracker while shipping React components in the same sprint
