2024-2026
KARL
In production at Orange Business, for product and sales teams
RAG cloud-intelligence chatbot in production at Orange Business. Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) via vLLM on H100 NVL and L40S GPUs, LangChain + ChromaDB orchestration. Built for auditable answers, not for the demo.
KARL is the RAG cloud-intelligence chatbot I built end to end inside Orange Business's Cloud Avenue product team. It runs a local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM on H100 NVL and L40S GPUs, with LangChain orchestration and a ChromaDB vector store. The goal was never a demo that impresses: it was answers grounded in internal sources, auditable and reliable, for product and sales teams.
Challenges
- Serving multiple LLMs locally on GPUs (H100 NVL, L40S) via vLLM
- Getting auditable, reliable answers rather than ones that just look good in a demo
- Orchestrating ChromaDB vector search with LangChain over internal cloud sources
Solutions
- Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM
- LangChain + ChromaDB RAG pipeline to ground answers in internal sources
- Evaluating outputs against real cases instead of trusting vibes
Results
- Deployed in production for Orange Business product and sales teams
- Local multi-LLM inference on H100 NVL and L40S GPUs via vLLM
- Grounded, auditable answers via RAG (LangChain + ChromaDB)
Technologies
LangChain · ChromaDB · vLLM · H100 NVL · Llama 3.3 70B · DeepSeek R1 · RAG · Python