DBNL on TWIML

Written by

Nick Payton

Recently, Sam Charrington interviewed Scott Clark on the This Week in Machine Learning and AI podcast (TWIML AI). Their discussion focused on analytics in the AI observability stack: where it fits, how it works, how to do it, and concrete examples of how these analytics can improve production agents. The discussion also included weather analogies and a throughline from machine learning objective functions to LLM eval solicitation, covering both the similarities and the differences. It was a fun conversation, and we recommend you give it a watch or listen!

Here are a few of my favorite segments from the interview:

3:18: Agent analytics as the process of converting insights from production agent traces into real improvements or fixes to your production agent, aligning its behavior over time

6:33: How evals, objective functions, and metric solicitation are among the hardest things to get right in AI and ML, and how this goes beyond fitting to a benchmark

7:15: Throughline from SigOpt, where we helped AI/ML teams optimize models, to Distributional, where we help AI/ML teams understand agents

7:45: The real value of analytics for agents lies in finding antipatterns in production traces

8:15: Specific examples of antipatterns that you can find with analytics but that are hard to find with evals and monitoring alone, with a focus on deep tool-related hallucinations

10:45: Hierarchy of observability and a walkthrough of why logging and monitoring are useful functionality in their own right, independent of analytics

13:15: How Distributional automates the analysis of traces, including the use of comparisons to bootstrap pattern discovery without assumptions about whether the patterns are good or bad

17:05: How agent analytics trades off timeliness for richness and therefore directly complements monitoring (rather than replacing it)

18:45: Agent analytics are built for teams that have agents in production, are scaling, and now care about improving these production agents over time

20:45: How the data flywheel needs to be driven by analytics to cut through the noise and find signals to drive production agent improvements

22:30: Process of going from traces to insights, including discussion of the problems you encounter along this path and how you solve these problems with an analytics pipeline

26:15: Discussion of the Clio paper and methods to do topic modeling as a pathway to understand user intent

27:10: Connection between evals and analytics, and use of analytics as a source for better evals that reflect desired online agent behavior (as this shifts and evolves over time)

32:35: Roadmap for how to approach AI observability: instrument with OpenTelemetry and the GenAI or OpenInference semantic conventions, use this to get a basic understanding trace by trace, and, as usage scales up, introduce analytics for broader analysis across many traces

35:00: Explanation of how analytics drives the signal discovery process that powers a Karpathy-like auto flywheel for an LLM

45:55: How analytics uncovers unknown unknowns so they can be encoded into the monitoring system, and how automating this analysis is critical to scaling it

46:15: How to use Distributional for this for free, with installable and SaaS options

47:30: Signals that the analytics process is helping your workflow, including speed to discovery and utility of these insights in an agent improvement process

51:00: How Distributional’s free service fits into the broader market, and how it fits into the broader AI observability stack
