Evolved a production chatbot from naive RAG to autonomous AI agent over 14 months — each architecture shift doubling answer quality.
New recruits must master a feature-rich HR & logistics portal. Sparse, outdated documentation generated many support tickets every month and slowed onboarding.
I designed and led delivery of a micro-service chat assistant that evolved through five architectural phases:
| Service | Purpose | Key Tech |
|---|---|---|
| Agent Core | ReAct agent loop — reasons about queries, autonomously retrieves data across multiple sources, synthesizes answers | FastAPI · Azure OpenAI (GPT-4o) |
| Retrieval | Vector search, structured action data, and database access — exposed as tools the agent invokes on demand | Azure AI Search · MCP |
| Ingestion | Parses → chunks → embeds content on each release | Azure AI Search · langchain |
| Evaluation | LLM-based quality checks on every PR | RAGAS · pytest |
| Web UI | Responsive chat + feedback panel | React · Tailwind |
Architecture evolution: GPT-3.5 with chunks in system prompt (Q1 2024) → RAG pipeline with HyDE, reranking and retrieval tricks (mid 2024) → GPT-4 upgrade, biggest single quality jump (Q3 2024) → Lazy Graph RAG experiment (late 2025) → ReAct agent architecture replacing the fixed pipeline (early 2026).
Security: Private VNets, sealed storage; passed Swiss MoD pentest and audit. Azure OpenAI hosted on Swiss servers per client requirement.
Architected the end-to-end system across four major architecture evolutions; built the chat service, ingestion pipeline, evaluation framework, and ReAct agent core; evaluated and discarded multiple RAG strategies (HyDE, reranking, Lazy Graph RAG) based on measured impact; mentored junior engineers on RAG and agent patterns; measured success metrics continuously via RAGAS
| Challenge | Mitigation & Result |
|---|---|
| Sparse & outdated docs | Curated 30 high-impact UI walkthroughs + generated 715 structured actions via UI-tree pipeline → +40 pp accuracy uplift |
| GPT-3.5 hallucinations & small context | Feature-toggle architecture → zero-downtime model upgrades as Azure released GPT-4 and GPT-4o |
| Diminishing returns from RAG tricks | Replaced fixed retrieval pipeline with ReAct agent loop → agent reasons about its own search strategy, adapting in real time |
| Measuring answer quality | RAGAS suite in CI → PR feedback < 5 min; evaluation survived every architecture change |
