Agentic AI · GenAI · MLOps, treated as systems engineering.

Vavi Labs builds AI and ML workflows with the parts that matter in production: problem framing, architecture, evals, guardrails, observability, deployment paths, and team training. This site is a working portfolio of the products, tools, models, and artifacts behind that approach.

The portfolio.

Products

  • Arbiter — build-time governance making every agent-assisted architecture call visible before it compounds.
  • BatSwing — vision-based batting analysis for cricket academies.
  • Creative Collab OS — AI-assisted creativity with better context, taste, and judgment.
Explore products →

Consulting

  • Case study — Agentic AI for Shipment Exception Management, a logistics & supply-chain control tower.
  • Reference implementation — 6-agent architecture, 7-layer guardrail stack, and eval-first engineering, built on LangGraph and FastAPI and deployed on AWS.
Explore consulting →
Read the whitepaper →

Plugins & Skills

  • Agentic AI Engineering — full lifecycle plugin: system design, eval harness, production deployment. /agentic-ai-engineering
  • Production MLOps — end-to-end workflows: experiment tracking, deployment pipelines, monitoring & alerting. /production-mlops
Browse dev-tools →

Courses & Training

View all courses →
Browse training decks →

Fine-tuning

See training results →

Explainers

Illustrated explainers and deep dives with interactive visualisations.

Illustrated Coding Agent · Illustrated LLM Inference · Statistics for MLOps · Illustrated RLHF Guide

Read the series →

Shipped products.

In active use, in beta, or on the waitlist.
[Image: BatSwing cover drive readiness report]

BatSwing

One phone, one side-on capture, one branded report: cricket academies send families a clear picture of the player's batting in English and Hindi. One academy already generates dozens of swing-analysis reports a week.

View BatSwing →
[Image: Creative Collab OS song-writing cover art, The Stubborn Star]

Creative Collab OS

AI-assisted creativity with better context, taste, and judgment — five distinct crafts (comic writing, song writing, and more), each mapping a few real creative angles before you commit to one. In beta with creatives & content creators.

View Creative Collab OS →
[Image: Arbiter trace view of a single architecture decision, end to end]

Arbiter

Build-time governance making every agent-assisted architecture call visible before it compounds — a review queue, decision detail, and outcome loop, with a trace-oriented view from coding agent to review queue. Currently in beta.

View Arbiter →

Consulting

[Image gallery: the business problem, 6-agent architecture, Bronze → Silver → Gold data pipeline, 7-layer guardrail stack and HITL design, four-pillar eval framework, LangGraph wiring, Terraform deployment and AWS cost model]

Case Study: Agentic AI for Shipment Exception Management

A production-grade reference implementation built on LangGraph, FastAPI, and LiteLLM — from the business problem through a runnable deployment on AWS.

  • The Problem — Four structural failure modes of manual exception handling at scale
  • Agentic Control Tower — 6-agent architecture with exception lifecycle state machine
  • Data Layer — Bronze → Silver → Gold pipeline with circuit breaker patterns
  • Trust & Safety — 7-layer guardrail stack, HITL design, four "never" rules
  • Eval-First Engineering — Four-pillar eval framework, pass@k vs pass^k
  • Implementation & Deployment — LangGraph wiring, Terraform, AWS cost model
View engagement →
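
The pass@k vs pass^k distinction named in the eval bullet above can be made concrete with a short sketch. It assumes each scenario is run n times with c passing runs and uses the standard combinatorial estimators: pass@k is the chance that at least one of k sampled runs passes, while pass^k is the chance that all k sampled runs pass. The function names and the example counts are illustrative assumptions, not taken from the case study.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k sampled runs passes), given c passes in n runs."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # 1 minus the chance that all k sampled runs come from the n - c failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k sampled runs pass), given c passes in n runs."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Chance that all k sampled runs come from the c passes.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 runs of one scenario, 8 of them passing.
    n, c, k = 10, 8, 3
    print(round(pass_at_k(n, c, k), 3))   # 1.0
    print(round(pass_hat_k(n, c, k), 3))  # 0.467
```

With the same 8-of-10 pass rate, pass@3 looks perfect while pass^3 falls below one half, which is why a reliability-sensitive workflow would typically track the second number as well.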

AI/ML Interview Prep Platform

11 free chapters · 18 practice scenarios · scored on 4 rubric dimensions.
[Image gallery: course chapter reader, practice question panel with production-scale scenario, rubric-scored result with expert answer, progress dashboard with domain breakdown and score trend, interview simulator concept (coming soon)]

Tech Abstractions Interview Prep

A structured interview preparation platform covering the technical abstractions that matter most in modern AI engineering roles — from distributed systems to LLM internals. Grounded in the same material as the Illustrated Explainer series.

  • Read course chapters — 11 free chapters covering production systems, not toy examples
  • Attempt practice questions — 18 scenarios with a 3-rung follow-up ladder from mid-level to staff difficulty
  • Get scored — Automated scoring across 4 rubric dimensions; expert answers unlock after submission
  • Progress dashboard — Domain breakdown, score trend, and weak-area callouts
  • Interview simulator (coming soon) — Live mock interview session with an AI interviewer and real-time rubric updates
Open platform →
View interview prep page →

Plugins & Skills for Coding Agents

2 plugins.
Agentic AI Engineering

A complete skill plugin covering the full agentic AI engineering lifecycle — from system design to evaluation harness to production deployment.

  • Agent architecture — Decision records for every design call
  • Harness engineering — Loop design and tool permission modeling
  • Context engineering — Patterns for grounding agent behavior
  • Evaluation harness — Scaffolding built before agent code
  • Production readiness — Checklist before rollout
View plugin →
Production MLOps plugin

End-to-end MLOps workflows for teams running ML in production — from experiment tracking to deployment pipelines to monitoring and alerting.

  • Experiment tracking — Versioning across training runs
  • Model registry — Promotion workflows to production
  • Deployment pipelines — Scaffolding for shipping models
  • Feature store — Design patterns for shared features
  • Monitoring — Drift detection and incident playbooks
View plugin →
Browse all dev-tools →

Corporate Training

5 course series · 79 chapters total.
[Image: Agentic AI for Leaders sample slide]
Leadership · Strategy

Agentic AI for Leaders

A series for senior executives and engineers on what agentic AI actually is, how to make the right architectural choices, and how to move from pilot to production. Covers strategic framing, build-vs-buy decisions, product strategy, pricing, and governance.

  • Scope — 2 modules · 7 chapters
  • Audience — Executive / Senior
Book a training session →
[Image: Engineering AI Agents sample slide]
Engineering · Technical

Engineering AI Agents

The definitive technical course for engineers and architects building production-grade agentic AI systems. Covers cognitive architecture, memory systems, tool use, agentic design patterns, multi-agent coordination, evaluation frameworks, guardrails, observability, security, and cost/latency optimization.

  • Scope — 6 modules · 22 chapters
  • Audience — Engineer / Architect
Book a training session →
[Image: MLOps Production Guide sample slide]
MLOps · Production

MLOps Production Guide

A workbook-style course for ML engineers who need to ship, operate, and scale ML systems in production. Covers problem framing through monitoring and incident response — the operational decisions that determine whether a model creates real business value.

  • Scope — 8 modules · 22 chapters
  • Audience — ML Engineer
Book a training session →
[Image: Harness Engineering sample slide]
Systems · Engineering

Harness Engineering

A systems course on the non-model layer that makes coding agents reliable: loop mechanics, state management, permissions modeling, verification pipelines, and human-in-the-loop workflow design. The engineering discipline behind trustworthy agents.

  • Scope — 4 modules · 14 chapters
  • Audience — Senior Engineer
Book a training session →
[Image: LLM Inference Engineering sample slide]
Infrastructure · Research

LLM Inference Engineering

A practical series on how large language model inference works in production: tokens, decode loops, KV caches, attention mechanics, continuous batching, memory management, quantization, and the economics of serving at scale.

  • Scope — 4 modules · 14 chapters
  • Audience — Engineer / Researcher
Book a training session →

Fine-tuning SLMs & dataset creation for domain-specific tasks

AI Sitcom Scriptwriter

Reinforcement fine-tuning (RFT)
[Images: AI Sitcom Scriptwriter RFT reward training curve; AI Sitcom Scriptwriter score distribution boxplot]

Teaching an open-source LLM to write The Office: reasoning-first screenplay generation with on-brand humor, character voice, and multi-step setups. The pipeline runs SFT on reasoning traces and screenplay pairs, then reinforcement fine-tuning (RFT) with PPO, scored by an LLM-as-judge across 8 weighted metrics.

View case study →

AI Feynman Kannada Tutor

Fine-tuned tutor model
[Images: AI Feynman Kannada Tutor training pipeline; AI Feynman Kannada Tutor score distribution boxplot]

A multi-stage fine-tuning pipeline that produces a reasoning-first physics tutor in Kannada, combining SFT and RAG for intuitive, grounded explanations. SFT runs in stages (language → domain → grounding) across a 4-model progression, evaluated with an LLM-as-judge on a 0–5 scale.

View case study →
Build with Vavi Labs
Building an AI system where correctness, impact, and adoption matter?
Discuss the system →