Case Study

Agentic Supply Chain Intelligence Platform

Most recent greenfield build: an agentic supply chain intelligence platform combining contract performance with utilization analytics across client networks. In production. The shape of the work matters more than the marketing, so this page walks through architecture, production patterns, and one architectural call I made late in the build that I would happily talk through in any interview.

Hire Me

What It Does

What the Platform Does

The platform ingests purchasing data and contract documents from client organizations, surfaces variance against contracted pricing, scores compliance against committed spend and standardization targets, and flags risk before it becomes a write-off.

Inputs

PO line-item data via CSV upload. Contract documents via PDF upload.

Outputs

Ranked variance alerts, contract-level compliance scoring, portfolio-level risk visibility, and vendor performance tracking.

Stack

Architecture

Python FastAPI service with PostgreSQL as the system of record (Liquibase migrations). vLLM and llama.cpp sit behind a Model Manager that loads, unloads, and switches between open-weight models at runtime. Qwen3.6-27B is the primary model, with lightweight alternates available for cheaper inference where the task allows.

Document ingestion uses Apache PDFBox for digital PDFs. When a contract turns out to be a scanned image (and a lot of them are), the platform falls back to AWS Textract automatically. SHA-256 hash-based dedup prevents the same contract from being processed twice.
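A minimal sketch of the dedup-then-fallback flow described above, not the platform's actual code. The class name ContractIngestor, the injected extract_digital and extract_ocr callables (stand-ins for the PDFBox and Textract paths), the in-memory store, and the min_chars heuristic for deciding that a PDF is a scan are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, Optional


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


class ContractIngestor:
    """Hypothetical ingestion wrapper: dedup by content hash, then extract text."""

    def __init__(
        self,
        extract_digital: Callable[[bytes], str],  # stand-in for the digital-PDF path
        extract_ocr: Callable[[bytes], str],      # stand-in for the OCR fallback path
        min_chars: int = 50,                      # assumed heuristic, not from the source
    ) -> None:
        self._extract_digital = extract_digital
        self._extract_ocr = extract_ocr
        self._min_chars = min_chars
        # digest -> extracted text; a real system would keep this in the database
        self._seen: Dict[str, str] = {}

    def ingest(self, data: bytes) -> Optional[str]:
        """Return extracted text, or None if this exact file was already processed."""
        digest = sha256_of(data)
        if digest in self._seen:
            return None  # same bytes seen before: skip reprocessing
        text = self._extract_digital(data)
        # Assumption: almost no text from the digital path means a scanned image.
        if len(text.strip()) < self._min_chars:
            text = self._extract_ocr(data)
        self._seen[digest] = text
        return text
```

Hashing the raw bytes means a re-upload of the identical file is caught before any extraction work runs; a re-scanned copy of the same contract would hash differently and would not be caught by this check.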

The Architectural Call

The Late-Stage Architectural Consolidation

The platform started as a hybrid: Java Spring Boot for the business service layer (REST controllers, async processing, batch variance, contract upload), Python FastAPI for the AI service (inference, contract extraction, variance analysis). The call chain was UI to Java to Python to vLLM.

Late in the build I made the call to remove the Java layer entirely. The Java service was not earning its keep - it was adding a network hop, a serialization step, and a separate deployment lifecycle for what amounted to a thin RPC wrapper around Python. A consolidated Python-only stack was both faster and simpler to operate.

The team converted the platform to Python-only over a few weeks and demoed the consolidated version for release.

Two things worth calling out about that decision. First: it was a production system. Re-architecting a production system is not a default move and I took the call with care. Second: the right architectural decision in week one is not always the right architectural decision in week twenty. Earning the right to revise an earlier decision when the evidence is in front of you is part of the work.

In Production

Production Patterns

Two-Pass Contract Extraction with Confidence Scoring

Contracts get extracted in two passes. Pass one pulls metadata: vendor, dates, spend commitments, termination terms, renewal terms. Pass two pulls pricing line items, page by page.

Every extracted field gets a confidence score from 0 to 100. Fields scoring below 70 get auto-flagged for human review. This is the production pattern that lets AI-driven extraction work in a regulated workload - the platform is honest about what it is uncertain about, and a human checks the uncertain ones.
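The review rule above can be sketched in a few lines. The threshold of 70 comes from the text; the ExtractedField type, its field names, and the review_queue helper are hypothetical, and sorting the queue least-confident-first is my own choice for the example.

```python
from dataclasses import dataclass
from typing import List

REVIEW_THRESHOLD = 70  # from the text: fields scoring below 70 go to human review


@dataclass
class ExtractedField:
    """Hypothetical shape for one extracted contract field."""
    name: str
    value: str
    confidence: int  # 0 to 100

    @property
    def needs_review(self) -> bool:
        return self.confidence < REVIEW_THRESHOLD


def review_queue(fields: List[ExtractedField]) -> List[ExtractedField]:
    """Return only the flagged fields, least confident first."""
    flagged = (f for f in fields if f.needs_review)
    return sorted(flagged, key=lambda f: f.confidence)
```

A field scoring exactly 70 is not flagged in this sketch, since the rule is stated as "below 70".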

Compliance Scoring Engine

For every active contract, the platform calculates four inputs: actual spend to date against committed annual spend (from live PO data), a full-year projection based on the current run rate, standardization compliance (the vendor's share of total category spend), and penalty exposure from the termination clause.

Those inputs feed a risk classifier that bins each contract as Critical, High, Medium, or Low. Every PO CSV upload triggers an automatic re-score of all active contracts. Same-day scores replace rather than duplicate.
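A rough sketch of the run-rate projection, the standardization share, and the same-day replace behaviour. The ContractSnapshot type and every function name are hypothetical, and the cut-offs in classify_risk (0.50, 0.75, 0.95 of committed spend) are placeholder assumptions, not the platform's real thresholds. Penalty exposure is left out, and this classifier looks only at projected attainment, to keep the example short.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass
class ContractSnapshot:
    """Hypothetical per-contract inputs derived from PO data."""
    committed_annual_spend: float
    spend_to_date: float            # this vendor, this contract year
    category_spend_to_date: float   # all vendors in the same category
    contract_year_start: date
    as_of: date


def projected_annual_spend(s: ContractSnapshot) -> float:
    """Linear run-rate projection: average daily spend so far times 365."""
    days_elapsed = (s.as_of - s.contract_year_start).days + 1
    if days_elapsed <= 0:
        raise ValueError("as_of is before the contract year start")
    return s.spend_to_date / days_elapsed * 365


def standardization_share(s: ContractSnapshot) -> float:
    """Vendor's share of total category spend, 0.0 when there is no spend yet."""
    if s.category_spend_to_date <= 0:
        return 0.0
    return s.spend_to_date / s.category_spend_to_date


def classify_risk(s: ContractSnapshot) -> str:
    """Bin by projected attainment of committed spend (assumed cut-offs)."""
    if s.committed_annual_spend <= 0:
        raise ValueError("committed_annual_spend must be positive")
    attainment = projected_annual_spend(s) / s.committed_annual_spend
    if attainment < 0.50:
        return "Critical"
    if attainment < 0.75:
        return "High"
    if attainment < 0.95:
        return "Medium"
    return "Low"


def record_score(store: Dict[Tuple[str, date], str],
                 contract_id: str, as_of: date, risk: str) -> None:
    """Key on (contract, day) so a same-day re-score replaces the earlier one."""
    store[(contract_id, as_of)] = risk
```

Keying the stored score on contract and date is one simple way to get "replace rather than duplicate"; in PostgreSQL the equivalent would typically be an upsert against a unique constraint on those two columns.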

Vendor Matching

Vendor names rarely match cleanly across PO data and contract data. The platform uses fuzzy text matching (Levenshtein distance) to resolve obvious variations and AI-assisted semantic matching for the ambiguous cases. Confirmed matches get cached in an aliases table so the same fuzzy resolution does not run twice.
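A self-contained sketch of the fuzzy tier and the alias cache. The Levenshtein function is the standard dynamic-programming edit distance; the normalize step, the max_distance cut-off of 2, and the VendorMatcher class are assumptions for illustration. For brevity the sketch caches any fuzzy match immediately, whereas the platform caches confirmed matches, and a None result stands in for "hand this to the semantic matcher".

```python
from typing import Dict, Iterable, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Lowercase, drop commas and periods, collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


class VendorMatcher:
    """Hypothetical matcher: alias cache first, then nearest name by edit distance."""

    def __init__(self, contract_vendors: Iterable[str], max_distance: int = 2) -> None:
        self._vendors = {normalize(v): v for v in contract_vendors}
        self._max_distance = max_distance  # assumed cut-off for an "obvious" variation
        self.aliases: Dict[str, str] = {}  # stand-in for the aliases table

    def match(self, po_vendor_name: str) -> Optional[str]:
        key = normalize(po_vendor_name)
        if key in self.aliases:
            return self.aliases[key]  # resolved before: skip the fuzzy pass
        best = min(self._vendors, key=lambda v: levenshtein(key, v), default=None)
        if best is None or levenshtein(key, best) > self._max_distance:
            return None  # ambiguous: a semantic matcher would take over here
        self.aliases[key] = self._vendors[best]
        return self.aliases[key]
```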

Unit-of-Measure Conversion

Four-tier fallback: item-specific learned conversions, item-specific manual overrides, organization-wide defaults, system-wide defaults. The system also detects suspicious multipliers - boxes and cases recorded with the same price ratio, for example - and flags them as likely data entry problems.
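The four tiers can be sketched as an ordered lookup that returns the first hit along with the tier that supplied it. The tier order follows the list above; UomResolver, its dictionaries, and the key shapes are hypothetical. The looks_suspicious check is only one possible reading of the "same price ratio" rule (two units that should differ by a multiplier but carry nearly the same unit price), with an assumed 5 percent tolerance.

```python
from typing import Dict, Optional, Tuple

ItemKey = Tuple[str, str, str]  # (item_id, from_unit, to_unit)
OrgKey = Tuple[str, str, str]   # (org_id, from_unit, to_unit)
UnitKey = Tuple[str, str]       # (from_unit, to_unit)


class UomResolver:
    """Hypothetical resolver: walk the tiers in order, return the first factor found."""

    def __init__(self) -> None:
        self.item_learned: Dict[ItemKey, float] = {}
        self.item_manual: Dict[ItemKey, float] = {}
        self.org_defaults: Dict[OrgKey, float] = {}
        self.system_defaults: Dict[UnitKey, float] = {}

    def resolve(self, org_id: str, item_id: str,
                from_unit: str, to_unit: str) -> Optional[Tuple[float, str]]:
        """Return (factor, tier_name), or None when no tier has a conversion."""
        tiers = (
            ("item_learned", self.item_learned.get((item_id, from_unit, to_unit))),
            ("item_manual", self.item_manual.get((item_id, from_unit, to_unit))),
            ("org_default", self.org_defaults.get((org_id, from_unit, to_unit))),
            ("system_default", self.system_defaults.get((from_unit, to_unit))),
        )
        for tier, factor in tiers:
            if factor is not None:
                return factor, tier
        return None


def looks_suspicious(price_small_unit: float, price_large_unit: float,
                     factor: float, tolerance: float = 0.05) -> bool:
    """Flag units that differ by `factor` but are priced almost identically."""
    if factor <= 1 or price_small_unit <= 0:
        return False
    return abs(price_large_unit / price_small_unit - 1.0) <= tolerance
```

Returning the tier name alongside the factor makes it easy to show a reviewer where a conversion came from when a variance alert depends on it.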

Targets

Performance Discipline

Targets enforced in CI and monitored in production:

Contract upload to acknowledgment: under 10 seconds.
Single-contract compliance scoring: under 3 seconds.
Portfolio-level aggregation: under 2 seconds.
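One way to hold targets like these in a test suite is a small timing guard. The three budgets below are the ones listed above; the budget keys, the within_budget context manager, and the BudgetExceeded exception are hypothetical names, and the injectable clock exists only so the guard itself can be tested without sleeping.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator

# Budgets from the targets above, in seconds; the key names are made up.
BUDGETS_SECONDS = {
    "contract_upload_ack": 10.0,
    "single_contract_scoring": 3.0,
    "portfolio_aggregation": 2.0,
}


class BudgetExceeded(AssertionError):
    """Raised when a timed block meets or exceeds its budget."""


@contextmanager
def within_budget(name: str,
                  clock: Callable[[], float] = time.perf_counter) -> Iterator[None]:
    """Time the wrapped block and fail if it is not strictly under budget."""
    budget = BUDGETS_SECONDS[name]
    start = clock()
    yield
    elapsed = clock() - start
    if elapsed >= budget:
        raise BudgetExceeded(
            f"{name} took {elapsed:.2f}s, budget is under {budget:.0f}s"
        )
```

In a CI test the guarded call would sit inside `with within_budget("single_contract_scoring"):`, so a regression shows up as a failing test rather than as a slow page.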

Leadership

What Leadership on This Build Looked Like

I owned the architecture (including the consolidation call), the production-grade design patterns (two-pass extraction with confidence scoring, OCR fallback, compliance scoring engine), the performance discipline, and the FY26 gap analysis that produced the three-phase roadmap for what the platform becomes next.

Player-coach in practice on this build meant PR review on the architecture-sensitive merges, design review on the consolidation, partnership with the CPO on roadmap, and the conversations with the team that converted a heavyweight call (rip out Java) into work the engineers wanted to do.
