2024-2026

KARL

In production at Orange Business, for product and sales teams

RAG cloud-intelligence chatbot in production at Orange Business. Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) via vLLM on H100 NVL and L40S GPUs, LangChain + ChromaDB orchestration. Built for auditable answers, not for the demo.

KARL is the RAG cloud-intelligence chatbot I built end to end inside Orange Business's Cloud Avenue product team. It runs a local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM on H100 NVL and L40S GPUs, with LangChain orchestration and a ChromaDB vector store. The goal was never a demo that impresses: it was answers grounded in internal sources, auditable and reliable, for product and sales teams.

Challenges

Serving multiple LLMs locally on GPUs (H100 NVL, L40S) via vLLM
Getting auditable, reliable answers rather than ones that just look good in a demo
Orchestrating ChromaDB vector search with LangChain over internal cloud sources

Solutions

Local multi-LLM integration (Llama 3.3 70B, DeepSeek R1, QwQ 32B) served via vLLM
LangChain + ChromaDB RAG pipeline to ground answers in internal sources
Evaluating outputs against real cases instead of trusting vibes

Results

Deployed in production for Orange Business product and sales teams
Local multi-LLM inference on H100 NVL and L40S GPUs via vLLM
Grounded, auditable answers via RAG (LangChain + ChromaDB)

Technologies

LangChain · ChromaDB · vLLM · H100 NVL · Llama 3.3 70B · DeepSeek R1 · RAG · Python