Part 1 of 7

The Monolith Whisperer

Why we built an AI agent team for cloud modernization — and why the first agent just listens

ClaraYet Team · 8 min read

Every legacy modernization starts with the same lie: "We know what we have."

You don't. The architect who built the system left years ago. The documentation describes a version that hasn't existed since 2019. The team knows their corner — the billing module, the auth layer, the batch jobs — but nobody holds the full picture.

Traditional assessments rely on institutional knowledge that has already evaporated. A consulting team interviews stakeholders, reads whatever docs exist, and produces a deck built on incomplete information. We wanted a different approach: an AI system that reads the actual code — every file, every dependency, every configuration — and produces an evidence-based modernization blueprint.

So we built one. A Claude Code skill that orchestrates a team of specialized AI agents, each focused on one phase of the modernization analysis.

Throughout this series, we'll follow three applications through the full analysis pipeline: a ColdFusion CMS (~1,800 files, custom ORM, 15 years of organic growth), a Java e-commerce platform (Spring Boot 2.3, 45K files, Oracle database, tightly coupled modules), and a .NET enterprise portal (ASP.NET MVC on .NET Framework 4.8, SQL Server, Windows-only dependencies). You'll see what each agent actually produces — not hypotheticals, but real outputs from real codebases. Each app surfaces different challenges, and watching all three unfold shows how the agents adapt their analysis to what they find.

Every legacy application has its own story. These three are ours — but the approach and tools are designed to work against any stack, and we'll show you how to tailor them for yours.

A note on cloud provider: this series uses AWS throughout — the service mappings, the MCP integrations, the CDK constructs, the pricing API. That's our default because it's what most of our engagements target. But the analysis framework — the agent team, the wave structure, the evidence-based approach — is provider-agnostic. The scanner, decomposer, and migration strategist don't care whether the target is AWS, Azure, Google Cloud, or Oracle Cloud. The solution architect and cost analyzer are where provider-specific knowledge lives, and those agents can be swapped or extended for any CSP. If your destination is Azure, the bounded contexts are the same — but the architect recommends AKS instead of ECS, Cosmos DB instead of DynamoDB, and Azure Front Door instead of CloudFront.

The agentic approach

A good consulting engagement doesn't send one generalist. It assembles a team: a discovery analyst, a domain architect, a solutions architect, a security specialist, a cost modeler, a migration planner. Each has deep expertise in their lane. They work in sequence and in parallel, building on each other's outputs.

Our system mirrors this. Seven specialized agents, orchestrated in dependency waves:

Wave 1: Codebase Scanner (reads everything, produces inventory)
Wave 2: Domain Decomposer (identifies bounded contexts, designs extraction plan)
Wave 3: Solution Architect + Security Architect (parallel)
Wave 4: Observability Architect + Cost Analyzer (parallel)
Wave 5: Migration Strategist (synthesizes everything into roadmap)
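The wave structure above can be sketched in a few lines of Python. This is an illustration, not the skill's actual orchestration (which is plain markdown that Claude Code follows); the agent names come from the list above, and `run_agent` is a placeholder for invoking one specialist:

```python
from concurrent.futures import ThreadPoolExecutor

# Dependency waves: agents in the same wave have no dependency
# on each other and can run in parallel.
WAVES = [
    ["codebase-scanner"],
    ["domain-decomposer"],
    ["solution-architect", "security-architect"],
    ["observability-architect", "cost-analyzer"],
    ["migration-strategist"],
]

def run_agent(name: str) -> str:
    """Placeholder for invoking one specialized agent."""
    return f"{name}: done"

def run_pipeline(waves=WAVES):
    results = []
    for wave in waves:
        # A wave starts only after the previous wave has finished;
        # within a wave, agents run concurrently.
        with ThreadPoolExecutor(max_workers=len(wave)) as pool:
            results.extend(pool.map(run_agent, wave))
    return results
```

The point of the sketch: parallelism lives between agents in the same wave, while ordering lives between waves.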

Agent Wave Orchestration — five waves of specialized agents, with parallel execution in Waves 3 and 4

Each agent is a markdown file — a structured document describing the agent's expertise, tools, inputs, and output format. No Python framework, no orchestration engine. Just prose that Claude Code follows.

The orchestrator (SKILL.md) doesn't do analysis. It creates the team, manages dependencies, and assembles the final report. Project manager, not practitioner.

Why agents instead of one big prompt?

We tried the single-prompt approach first. "Analyze this codebase and produce a modernization plan." The results were surface-level — generic recommendations that could apply to any application. "Consider using microservices." "Evaluate serverless options." Not useful.

The problem is context window economics. A 100K-line codebase doesn't fit in a single prompt. Even if it did, asking one agent to be simultaneously expert in code archaeology, DDD, AWS architecture, security auditing, cost modeling, and migration planning produces mediocre results across the board.

Specialization works for the same reason it works in human teams. A security architect who spends all day thinking about IAM policies and encryption catches things a generalist misses. A cost analyst who queries real pricing APIs produces numbers a generalist estimates.

The agent-per-phase approach also gives you parallelism — the solution architect and security architect work simultaneously — and replayability: you can re-run just the security agent if it produces a questionable report. And every recommendation is traceable back through the document chain to a specific finding in the codebase.

Agent 1: The codebase scanner

The first agent is the foundation everything else builds on. If it misses something, every downstream agent inherits the blind spot. Its job: read the codebase and tell us what's there.

  • Technology stack — not just "it's Java" but "Java 8 on Spring Boot 2.3 with Hibernate 5, built with Maven, running on Tomcat." Not just "it's ColdFusion" but "ColdFusion on Lucee with a custom ORM called FourQ, supporting MSSQL, MySQL, and H2 through hand-rolled gateway classes."
  • Architecture analysis — entry points, module boundaries, communication protocols. Monolith? Modular monolith? Distributed monolith? The agent traces the actual dependency graph, not the intended one.
  • Database audit — connections, ORM mappings, stored procedures, distributed transaction anti-patterns. If there's an XA DataSource or JTA transaction manager, the agent flags it — those add operational complexity and latency in cloud-native environments and are typically replaced with the Saga pattern.
  • State management — sticky sessions, in-process caches, shared filesystems. Every one is a blocker that must be addressed before the application can scale horizontally.
  • Integration inventory — REST, SOAP, message queues, file transfers, batch scheduling. The agent maps every external touchpoint because each one becomes a migration dependency.
  • 7 R's classification — Retire, Retain, Rehost, Relocate, Repurchase, Replatform, or Refactor for each module, with evidence-based justification.
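One way to picture the scanner's evidence-based output is as structured findings, each classification backed by concrete file references. A minimal sketch — the field names and example values are assumptions for illustration, not the skill's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Disposition(Enum):
    """The 7 R's of cloud migration."""
    RETIRE = "Retire"
    RETAIN = "Retain"
    REHOST = "Rehost"
    RELOCATE = "Relocate"
    REPURCHASE = "Repurchase"
    REPLATFORM = "Replatform"
    REFACTOR = "Refactor"

@dataclass
class Finding:
    module: str
    disposition: Disposition
    evidence: list[str]  # file paths / line refs backing the call

    def summary(self) -> str:
        return f"{self.module}: {self.disposition.value} ({len(self.evidence)} evidence items)"

# Hypothetical example: a classification justified by specific files.
finding = Finding(
    module="billing",
    disposition=Disposition.REFACTOR,
    evidence=["src/billing/InvoiceDao.cfc:112", "src/billing/gateway.cfc:48"],
)
```

A classification with no evidence list attached is exactly the kind of "consider microservices" hand-waving the single-prompt approach produced.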

The output is a structured markdown report — typically 300-500 lines — with tables, file paths, line numbers, and counts. "727 direct SQL queries across 50 files" is useful. "The application uses SQL" is not.

The scanner is read-only: its agent definition sets disallowedTools: Write, Edit, so it cannot modify the codebase. This isn't just safety; it's trust. When scanning a production codebase, the team needs absolute confidence the tool won't change anything. It also forces the agent to be an observer, not a fixer, keeping the discovery phase clean and objective.

Making it language-agnostic

The scanner needs to work against any legacy stack. The trick is giving the agent concrete patterns to search for:

  • Java: pom.xml, @RestController, @Entity, jdbc:, @JmsListener
  • ColdFusion: Application.cfc, cfquery, cfhttp, cfmail, cfschedule
  • .NET: web.config, [HttpGet], DbContext, IMemoryCache
  • COBOL: EXEC CICS, EXEC SQL, COMP-3, VSAM file definitions
  • Node: package.json, express, sequelize, mongoose

The agent greps for these patterns, then reasons about what it finds. @Transactional + @Entity = Spring Boot with JPA. cfquery + Application.cfc = ColdFusion. EXEC CICS + COMP-3 = mainframe — switch to mainframe-specific analysis.

Pattern matching is the foundation, not a replacement for reasoning. "Find all integration points" produces vague results. "Grep for @JmsListener, cfhttp, EXEC CICS" produces an exact inventory — but the list alone isn't useful. The LLM reasons about what it finds: which integrations are critical path (called on every request), which are deprecated (referenced but unreachable), and which represent migration dependencies (external systems that need coordination). Finding @JmsListener produces a list of message consumers; understanding which are mission-critical requires tracing the event flow and identifying downstream effects.

Scaling to large codebases

The scanner adapts its depth based on codebase size. These thresholds are calibrated to context window limits and analysis quality — beyond ~10K files, full file-by-file analysis produces diminishing returns and risks overwhelming the agent's working memory:

| File count | Mode | Approach |
| --- | --- | --- |
| < 10,000 (~50K–500K LOC) | Standard | Full analysis |
| 10,000–50,000 (~500K–2M LOC) | Sampling | Largest files per language, highest-signal patterns |
| > 50,000 (2M+ LOC) | Enterprise | Structural analysis; recommend per-module scans |
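As a sketch, mode selection is a simple threshold lookup — thresholds as in the table above, function name assumed for illustration:

```python
def select_mode(file_count: int) -> str:
    """Pick analysis depth from codebase size (thresholds from the table)."""
    if file_count < 10_000:
        return "standard"    # full file-by-file analysis
    if file_count <= 50_000:
        return "sampling"    # largest files per language, high-signal patterns
    return "enterprise"      # structural analysis; recommend per-module scans
```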

A ColdFusion CMS with 1,800 files gets the full treatment. A Java enterprise app with 45,000 files gets sampled analysis. For truly massive codebases, the agent recommends breaking into per-module runs.

Document-based agent communication

Agents don't call each other. They communicate through markdown files:

01-discovery.md      ← Scanner writes, everyone reads
02-decomposition.md  ← Decomposer writes, architects read
03-architecture.md   ← Solution architect writes
04-security.md       ← Security architect writes
05-cost.md           ← Cost analyzer writes
06-migration.md      ← Strategist reads ALL, writes final synthesis

This is the simplest possible integration pattern, with three properties that matter: human-readable at every stage, independently re-runnable (re-run just the security agent if needed), and auditable (trace any recommendation back through the document chain).
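Independent re-runs fall out of this pattern almost for free: an agent can be re-run in isolation whenever its upstream documents already exist on disk. A sketch — the file names come from the listing above, but the exact dependency edges (beyond "the strategist reads all") are assumptions:

```python
from pathlib import Path

# Which upstream documents each agent reads before writing its own.
# (Assumed dependency edges, for illustration.)
DEPENDS_ON = {
    "02-decomposition.md": ["01-discovery.md"],
    "03-architecture.md": ["01-discovery.md", "02-decomposition.md"],
    "04-security.md": ["01-discovery.md", "02-decomposition.md"],
    "05-cost.md": ["01-discovery.md", "03-architecture.md"],
    "06-migration.md": ["01-discovery.md", "02-decomposition.md",
                        "03-architecture.md", "04-security.md", "05-cost.md"],
}

def can_rerun(output_doc: str, workdir: Path) -> bool:
    """An agent can be re-run in isolation if all its inputs exist."""
    return all((workdir / dep).exists() for dep in DEPENDS_ON.get(output_doc, []))
```

So a questionable security report means re-running one agent against two existing markdown files, not re-running the whole pipeline.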

What the scanner found: two codebases, two stories

When we pointed the scanner at our ColdFusion CMS, the discovery report came back at 487 lines. It found a custom ORM called FourQ with runtime schema generation, 727 direct SQL queries across 50 files, sessions stored in application scope, and a hand-rolled S3 client with hardcoded access keys. The 7 R's classification: Retire 2 modules (legacy admin widgets), Replatform the container, Refactor 6 bounded contexts.

That was the easy one. The ColdFusion app was messy but legible — one codebase, one deployment, clear boundaries waiting to be drawn.

The Java e-commerce platform told a different story: 312 Spring Boot services that looked like microservices but shared a single Oracle database with 40+ cross-service joins. The scanner classified it as a distributed monolith — the worst of both worlds. Its discovery report recommended consolidating into 8 true bounded contexts before extracting to cloud-native services.

The .NET portal was a different beast — not chaotic, but locked to its platform.

The .NET enterprise portal surfaced a third pattern entirely: a well-structured ASP.NET MVC application with clean separation of concerns, but hard dependencies on Windows-specific features. NTLM authentication, Windows Task Scheduler for batch jobs, COM interop for PDF generation, and a SQL Server database using features like MERGE and CLR stored procedures with no PostgreSQL equivalent. The scanner's 7 R's classification was mostly Replatform — the architecture was sound, but the platform dependencies needed systematic replacement.

Three apps, three stories — same agent, same analysis framework, completely different outputs because the evidence was different. That's the point.

Patterns across codebases

After running this against ColdFusion monoliths, Java N-tier applications, and .NET systems, patterns emerge:

  • Session management is often a problem, particularly in monoliths without external session stores. In most legacy monoliths we've analyzed, sessions are stored in-process — blocking horizontal scaling from day one.
  • Test coverage is typically worse than expected. We've seen 5 test files for 122K lines of code. Even codebases with test directories often have low coverage or abandoned test suites.
  • Custom data access layers are common, particularly in ColdFusion and older Java applications. These hand-rolled ORMs encode business logic that isn't documented anywhere else — making them the hardest thing to migrate.
  • The 7 R's classification is rarely uniform. A single application typically has modules that should be Retired, Replatformed, and Refactored. The wave plan that emerges is the backbone of the migration.

Next: Part 2 — Drawing Lines in the Mud — where the domain decomposer applies DDD event storming to legacy code and identifies the bounded contexts that become your microservice boundaries. The evidence chains from the scanner — every file path, every dependency, every anti-pattern — become the raw material the decomposer validates against.
