Development

Technical Deep Dive: Modernizing Java Pet Store With CoreStory

We ran CoreStory’s six-phase Code Modernization Playbook against Sun Microsystems’ Java Pet Store 1.3.2 — a J2EE e-commerce application with ~276 classes, EJB 2.x entity beans, JMS messaging, a Swing admin client, and zero test coverage — and produced a fully working Spring Boot 3.4.1 system: four deployable services (sharing a common domain library), eight modules, 117 behavioral tests with zero failures, and live validation against 22 REST endpoints.

This post walks through how we got there, what decisions the playbook drove, and where CoreStory’s code intelligence made a material difference versus where a competent architect would have reached the same conclusions.

Why Pet Store?

Java Pet Store is a reference application, not an enterprise system. We chose it because its J2EE patterns are real — EJBs, message-driven beans, JMS queues and topics, JNDI lookups, XML deployment descriptors — and because engineers can inspect and verify every claim in this post against the actual source code.

The Pet Store’s clean boundaries are partly an artifact of its nature as a reference app. Real enterprise systems rarely decompose this cleanly, and the assessment phase becomes correspondingly more valuable when the codebase is messier. We’re demonstrating the playbook’s patterns on a codebase of comprehensible size, not claiming that a 276-class migration proves anything about 10,000-class systems.

With that framing in mind, here’s what the playbook actually does.

The six-phase playbook

The Code Modernization Playbook runs in six phases: Assessment, Business Rule Extraction, Architecture Decision, Decomposition, Execution, and Verification. The first three are where CoreStory’s code intelligence does the heavy lifting. The last three are where engineering takes over, informed by what the first three produced.

Phase 1: Assessment

CoreStory ingests the codebase, builds a persistent intelligence model, and returns a structured assessment: components, technologies, coupling levels, communication patterns, and a readiness score.

The Pet Store received a readiness score of 2.2 out of 5 — low, driven by zero test coverage, hardcoded credentials, plain-text passwords, and a God class (PetstoreComponentManager). But three structural findings underneath that headline number were more important than the score itself.

First, no circular dependencies. The dependency graph was cleanly layered. Second, no shared database tables — each EJB exclusively owned its tables. This is the single biggest factor in whether you can extract services without a painful database decomposition phase. Third, OPC and Supplier already communicated with the Storefront via JMS. Purchase orders flowed over queues (point-to-point). Invoices flowed over topics (pub/sub, multiple consumers). This wasn’t a theoretical service boundary; it was one the original architects had drawn.

That third finding is the sophisticated one for an AI agent. Confirming that a boundary exists is straightforward: you find the JMS queue declarations. Confirming that the boundary is clean — no shared tables, no direct EJB calls bypassing the messaging layer, no backdoor coupling — requires checking every integration point. You have to verify the absence of coupling, which is harder than finding the presence of it. For 276 classes, a senior architect could do this in a couple of days by reading the code. For a system ten times that size, this is where persistent code intelligence goes from a timesaver to a critical layer of infrastructure.
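The "verify the absence of coupling" check can be reduced to a simple graph property: every integration edge that crosses the candidate service boundary must use the sanctioned mechanism (here, JMS). The sketch below is illustrative only — the component names and the Edge type are assumptions for the example, not CoreStory's actual model:

```java
import java.util.List;
import java.util.Set;

// Toy illustration of a boundary-cleanliness check: given integration edges
// labeled by mechanism (JMS, EJB, shared-table, ...), confirm that nothing
// except JMS crosses the line between the core and the extracted services.
class BoundaryCheck {
    // Record of one integration point; names here are invented for the sketch.
    record Edge(String from, String to, String mechanism) {}

    static boolean boundaryIsClean(Set<String> core, Set<String> extracted, List<Edge> edges) {
        for (Edge e : edges) {
            boolean crosses = (core.contains(e.from()) && extracted.contains(e.to()))
                           || (extracted.contains(e.from()) && core.contains(e.to()));
            // Any crossing edge that is not JMS (a direct EJB call, a shared
            // table, a backdoor lookup) means the boundary is not clean.
            if (crosses && !"JMS".equals(e.mechanism())) {
                return false;
            }
        }
        return true;
    }
}
```

The hard part in a real codebase is enumerating the edges exhaustively — which is exactly what the persistent intelligence model is for; the check itself is trivial once the edge list is complete.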

Phase 2: Business rule extraction

We queried CoreStory across four domains — order processing, customer management, catalog/cart/inventory, and admin/cross-cutting — and extracted 95 business rules with exact class and method locations. Each rule got a BR- ID (e.g., BR-ORD-008) that would later appear in work package acceptance criteria and test names.

Ten rules were flagged as critical modernization risks. The two most architecturally significant:

Locale-specific auto-approval thresholds. Orders below $500 (US) or ¥50,000 (Japanese locale) are auto-approved. Above those thresholds, orders require admin approval. These values were hardcoded in PurchaseOrderMDB.canIApprove(). CoreStory identified the exact method, the exact thresholds, and the fact that locale determines which threshold applies.

The order lifecycle state machine. PENDING → APPROVED → SHIPPED_PART → COMPLETED (or PENDING → DENIED). This state machine spans six message-driven beans across two modules: PurchaseOrderMDB hands off to OrderApprovalMDB, which hands off to SupplierOrderMDB, which hands off to InvoiceMDB. CoreStory traced this by following message flow across MDBs — understanding the architecture as a connected system rather than individual files.

The extraction was done per-domain, which intentionally creates overlap. “Check inventory before shipping” appears as both an order processing rule and an inventory constraint. This duplication is a feature: it ensures each team’s migration preserves their view of shared behavior, and cross-domain test coverage confirms both sides of the JMS boundary agree.

Phase 3: Architecture decision

We evaluated three options:

Option A: Modular Monolith. Replatform everything to Spring Boot as a single deployable. 18–25 developer-weeks. Lowest risk, simplest, but limited future flexibility.

Option B: Full Microservices. Decompose into 6+ independent services. 35–50 developer-weeks. Most impressive on paper. Highest risk given zero test coverage.

Option C: Hybrid. Replatform the tightly coupled core (Storefront, Catalog, Customer, Cart, Auth) as a Spring Boot monolith. Extract OPC and Supplier as independent services along the JMS boundary identified in Phase 1. 25–32 developer-weeks.

Selected: Option C. The rationale was specific to the code findings, not general principles.

The core Storefront components — ShoppingController, ShoppingClientFacade, ShoppingCart, Customer/Account — share session state and orchestration logic. Forcing these into separate microservices would mean rearchitecting the coupling, not just migrating it. With zero test coverage and a readiness score of 2.2, that’s high-risk rework with no safety net.

But OPC and Supplier were different. They communicated with the Storefront only via JMS. No shared tables. No direct EJB calls. The boundary existed. Extracting them as independent Spring Boot services with @JmsListener replacing the MDBs followed the natural grain of the architecture.

The most valuable thing an architecture analysis can produce isn’t the most impressive recommendation. It’s the specific finding in the code that makes the right option clear. Here, that finding was the JMS boundary combined with the absence of shared tables — two facts that pointed directly to Option C.

The OPC state machine: the hardest piece

Work Package 7 replaced six legacy MDBs plus ProcessManagerEJB and OPCAdminFacadeEJB with five @JmsListener components and a unified OrderWorkflowService. This is where the migration got interesting.

The state machine:

PENDING ── auto-approve (<$500 US / <¥50K JP) ──→ APPROVED
   │                                                   │
   ├── admin denies ──→ DENIED                         │
                                                       ↓
APPROVED ── supplier fulfills ──→ SHIPPED_PART ──→ COMPLETED
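The transitions in the diagram can be enforced with a small allowed-transitions table. This is a minimal sketch — the OrderStatus values match the diagram, but the class and method names are assumptions for illustration, not the project's actual code:

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative guard for the order lifecycle state machine shown above.
class OrderLifecycle {
    enum OrderStatus { PENDING, APPROVED, DENIED, SHIPPED_PART, COMPLETED }

    // Each status maps to the set of statuses it may legally transition to;
    // DENIED and COMPLETED are terminal.
    private static final Map<OrderStatus, Set<OrderStatus>> ALLOWED = new EnumMap<>(OrderStatus.class);
    static {
        ALLOWED.put(OrderStatus.PENDING, EnumSet.of(OrderStatus.APPROVED, OrderStatus.DENIED));
        ALLOWED.put(OrderStatus.APPROVED, EnumSet.of(OrderStatus.SHIPPED_PART));
        ALLOWED.put(OrderStatus.SHIPPED_PART, EnumSet.of(OrderStatus.COMPLETED));
        ALLOWED.put(OrderStatus.DENIED, EnumSet.noneOf(OrderStatus.class));
        ALLOWED.put(OrderStatus.COMPLETED, EnumSet.noneOf(OrderStatus.class));
    }

    static boolean canTransition(OrderStatus from, OrderStatus to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(OrderStatus.class)).contains(to);
    }
}
```

Centralizing the table is what makes a unified OrderWorkflowService possible: in the legacy system the same rules were implicit, scattered across six MDBs.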

The approval thresholds that were hardcoded in the MDB are now externalized:

opc:
  approval:
    us-threshold: 500.00
    jp-threshold: 50000.00

This is a concrete modernization improvement — configuration that was buried in a Java method is now in application.yml, externally configurable without recompilation.
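The consuming logic is a locale dispatch over the two configured thresholds. The sketch below hardcodes the values to stay self-contained (in the actual system they would be bound from application.yml); the class and method names are illustrative assumptions, while the thresholds and the below-means-approve rule come from the legacy PurchaseOrderMDB.canIApprove():

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hedged sketch of the locale-dependent auto-approval check.
class ApprovalPolicy {
    // In the modernized system these values come from application.yml
    // (opc.approval.us-threshold / jp-threshold); hardcoded here only to
    // keep the sketch runnable on its own.
    private final BigDecimal usThreshold = new BigDecimal("500.00");
    private final BigDecimal jpThreshold = new BigDecimal("50000.00");

    // Orders strictly below the locale's threshold are auto-approved;
    // at or above it, they require admin approval.
    boolean canAutoApprove(Locale locale, BigDecimal total) {
        BigDecimal threshold = Locale.JAPAN.equals(locale) ? jpThreshold : usThreshold;
        return total.compareTo(threshold) < 0;
    }
}
```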

The JMS topology was preserved: purchase orders use queues (point-to-point), invoices use topics (pub/sub). This required separate JMS container factories for queue vs. topic listeners — a detail that’s easy to get wrong and that would cause silent message loss if misconfigured.

The approval flow is async, matching the legacy MDB-to-MDB handoff pattern. Auto-approved orders send to the orderApproval queue, and OrderApprovalListener processes the transition. This matters because a synchronous implementation would have changed the system’s concurrency behavior — orders that previously processed in parallel would serialize.

One bug was caught during adversarial review: processInvoice() originally lacked a status guard. Only orders in APPROVED or SHIPPED_PART status should accept invoices, but the initial implementation would process invoices for any order. The fix was a status check, verified by a dedicated test. This is the kind of edge case that slips through when you’re focused on the happy path and the legacy code’s error handling is implicit in MDB message selector behavior rather than explicit in application logic.
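The fix amounts to a one-line guard at the top of invoice processing. A minimal sketch, assuming illustrative names (the actual method signatures in the migrated service may differ):

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative version of the status guard added to invoice processing
// after adversarial review: only APPROVED or SHIPPED_PART orders may
// accept an invoice.
class InvoiceGuard {
    enum OrderStatus { PENDING, APPROVED, DENIED, SHIPPED_PART, COMPLETED }

    private static final Set<OrderStatus> INVOICEABLE =
            EnumSet.of(OrderStatus.APPROVED, OrderStatus.SHIPPED_PART);

    // Callers reject (or dead-letter) the invoice message when this is false,
    // instead of silently mutating an order in the wrong state.
    static boolean canAcceptInvoice(OrderStatus status) {
        return INVOICEABLE.contains(status);
    }
}
```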

The traceability chain

The BR-ID system created a verifiable chain from extraction through to testing. Here’s how it works for a specific rule:

BR-ORD-008 in the business rules inventory identifies the auto-approval threshold. That BR-ID appears in the acceptance criteria for Work Package 7. In the test suite, BrOrd008_AutoApproveBelowThreshold asserts the specific behavior:

@Test
@DisplayName("Auto-approve below US threshold sends to approval queue")
void autoApproveBelowUsThreshold() {
    PurchaseOrder order = createOrder("en_US", new BigDecimal("58.50"));
    workflowService.processNewOrder(order);
    assertThat(order.getStatus()).isEqualTo(OrderStatus.PENDING);
    verify(jmsTemplate).convertAndSend(
        eq("petstore.opc.orderApproval"), any(Map.class));
}

The $58.50 value comes from the Tailless Manx cat in the seed data — a specific item used in live endpoint validation to confirm the modernized system matches the legacy data. The $500 threshold comes from PurchaseOrderMDB.canIApprove() in the legacy code.

This isn’t naming convention; each link carries specific, verifiable content. A stakeholder can trace from “is the approval threshold preserved?” to the exact test that asserts the exact value. For regulated environments where auditors need to confirm specific business rules survived a migration, this chain is the deliverable.

Coverage breakdown

Of the 95 business rules:

  • 62 have dedicated test methods verifying preserved or improved behavior.
  • 8 are verified through cross-domain overlap (same behavior tested under a different domain’s BR-ID).
  • 7 are framework-implicit (J2EE patterns like JNDI and ServiceLocator that Spring replaces architecturally).
  • 14 are not applicable (XML schemas superseded by JSON, Swing components replaced by Chart.js, event dispatch patterns replaced by direct calls).
  • 3 are legacy behaviors intentionally improved in the modernized system.
  • 1 is a genuine gap — reporting backend aggregation, deprioritized because it doesn’t affect order processing correctness.

The 117 tests break down by module: Catalog (16), Customer/Auth (31), Shopping Cart (14), Web Tier (15 — 7 functional + 8 security), OPC Service (22), Supplier Service (11), Admin UI (8).

Where CoreStory made the difference

Not every phase of this project benefited equally from code intelligence. Being specific about where the value was strongest — and where a competent team could have reached similar conclusions without it — matters for anyone evaluating this approach.

The JMS boundary finding was the highest-value contribution. The specific discovery that OPC and Supplier communicate with Storefront only via JMS — combined with the confirmed absence of shared tables and direct EJB calls — is what made the hybrid architecture the clear choice. The positive finding (JMS exists) is easy enough to spot manually. The negative finding (nothing else couples these components) requires checking every integration point systematically. That’s where code intelligence tools earn their keep, especially at scale.

The business rule traceability chain delivers real process value. The BR-ID system works as demonstrated — from inventory to acceptance criteria to test names to assertions. Its value is strongest in environments where stakeholders or auditors need to confirm specific rules are preserved. For a team that doesn’t face that kind of scrutiny, the traceability is nice to have but not essential.

CoreStory’s business rule extraction is valuable for its depth, not just its breadth. CoreStory didn’t just say “there’s approval logic in the order module.” It identified the exact method, the exact thresholds, and the locale dependency. A human reading the code would find these eventually. CoreStory found them by querying a structured model of the entire codebase. For 276 classes, that’s faster. For thousands of classes, it’s the difference between a complete inventory and a best-effort list.

Work package sequencing is a valuable but undifferentiated contribution. “Build the foundation first, then the lowest-coupling module, then work up the dependency graph” is what any competent architect would recommend. CoreStory’s dependency data informed the specifics, but the pattern is standard practice.

Known limitations

A few things to note about what this demo doesn’t prove:

  • The legacy J2EE application requires the Sun J2EE Reference Implementation server from 2003, which is effectively extinct. We can’t run a true side-by-side comparison. Parity is established through seed data from the legacy system, behavioral tests against the extracted thresholds and defaults, and live endpoint validation — not runtime comparison.
  • MailNotificationListener logs notifications rather than sending email — a conscious scope limitation. The conditional notification logic is fully implemented and tested, but there’s no SMTP integration.
  • JMS producer destinations are hardcoded strings while consumers use property placeholders. Production hardening would externalize both sides. SecurityConfig uses permitAll as the default, and the Admin UI uses hardcoded credentials. Both are standard demo choices, not architectural decisions.
  • Where the legacy system had gaps — buggy credit card validation, no server-side category constraint enforcement, no role-based access control, unvalidated admin status updates — the modernized system implements correct behavior rather than reproducing the legacy bugs. These are documented as intentional improvements, not claimed as preserved legacy rules.
  • CoreStory did not write the migration code — a separate AI coding tool handled implementation. CoreStory did not make the architecture decision — it presented evidence and the team made the call. CoreStory did not replace engineering judgment. The choice to use embedded Artemis for standalone testing, BCrypt instead of legacy plain-text passwords, separate queue and topic JMS factories — these were engineering decisions informed by CoreStory’s data but not made by it.

Patterns that transfer

This project illustrates the patterns for using CoreStory in a modernization effort — and the patterns, not the Pet Store specifics, are what matter for real-world application.

Evidence-based architecture selection works at any scale. The specific mechanism — mapping communication patterns, verifying data ownership, confirming the absence of hidden coupling — scales better with code intelligence than with manual analysis. The larger and messier the codebase, the more valuable the negative findings become.

Business rule extraction with cross-domain coverage catches rules that single-perspective analysis misses. The intentional overlap — capturing “check inventory before shipping” as both an order processing rule and an inventory constraint — ensures each team’s migration preserves their view of shared behavior.

BR-ID traceability provides an audit trail from legacy code to modernized tests. The chain is mechanical to establish once the extraction is done, and its value compounds with the number of stakeholders who need confidence that the migration is correct.

Strangler fig decomposition along existing boundaries — cutting where the architecture already has seams rather than imposing new ones — reduces risk by preserving proven communication patterns. The JMS boundary in the Pet Store is one example. In enterprise systems, these boundaries might be message queues, API gateways, file-based integrations, or database replication channels. The principle is the same: find the boundary that already exists, verify it’s clean, and cut there.

The playbook compresses modernization planning from a multi-week manual effort into structured, queryable analysis. The code is open for inspection here. The claims are specific. If you work with legacy systems and want to evaluate CoreStory’s approach, this is a concrete starting point.

John Bender is the Director of Product Marketing at CoreStory.