
As AI agents move from simple experiments into real business processes, we are learning how to make them reliable at scale. This talk shares our journey of building the capabilities needed to support agents in production, highlighting both the successes and the unexpected challenges along the way. We’ll discuss why current agent tooling often requires significant adaptation for real-world use cases, where the biggest design and operational challenges emerge, and how to improve latency, quality, consistency, and resource utilization. Most importantly, we’ll show why testing and evaluation are not optional extras but core ingredients in building trustworthy agents that deliver meaningful business outcomes.



