Sumit Gupta

A series of experiments testing whether specialized AI agents on local models can match cloud API quality for personal task management.

Building a Production Eval System for AI Agents Apr 7, 2026
What we learned building a quality measurement system for a multi-agent AI, drawing on practitioner wisdom from Hamel Husain, Eugene Yan, Braintrust, and applied-llms.org.