
At re:Invent in December 2025, NVIDIA selected us as an AI partner to join them in their booth. It was a privilege, and it gave us the opportunity to showcase how AI product teams can use Distributional with NVIDIA software as a fully open and free production stack for AI agents. Ultimately, we built a demo to show you how to use this stack with your own agents.
Here is Ian on our team summarizing this demo:
If you’d prefer to explore the demo more concretely yourself, you can access the same data produced by this agent and analyzed by Distributional through our read-only SaaS experience, using an open, free username and password. Find the credentials here: https://docs.dbnl.com/get-started/quickstart
And here is a quick run-down of my takeaways if you want to skip to the most relevant sections:
0:30 – Ian explains why understanding agent behavior is complex, and why, without that understanding, it is hard to support, improve, fix, and scale agents in production. Without regularly analyzing production AI traces, examining notable traces, clusters, correlations, and usage patterns, and making decisions informed by those insights, it is hard to know what to fix, or even when something is broken in your agent, and equally hard to see where there is an opportunity to improve it. This open stack is designed to help AI product teams complete the feedback cycle and continuously improve their agents in production.
1:30 – This example relies on an open, free stack that includes NVIDIA and Distributional components. Distributional provides the analytics layer and is designed to integrate seamlessly regardless of the agent framework, model, or optimizer your team uses. In this example, we rely on the NVIDIA NeMo Agent Toolkit (NAT) v1.3.0 as our extensible agent framework, the NVIDIA NIM Operator on Kubernetes to efficiently run the LLMs for both the agent and the analytics tasks, and the NeMo Agent Toolkit Optimizer for offline hyperparameter optimization to improve the agent guided by insights from Distributional. We used AWS S3 for storing traces from the agent and AWS EC2 for compute. As it runs, NAT publishes roughly 7k traces per day, which Distributional (DBNL) analyzes. DBNL enriches these traces with LLM-as-a-judge and standard metrics, then analyzes those metrics to uncover behavioral signals hidden in the logs. DBNL uses gpt-oss-20B as its LLM-as-a-judge model, and scales the judge efficiently by serving it as a NIM on a p5 instance. As DBNL discovers new signals, we use functionality in NAT, such as its HPO feature, to make improvements or fixes to the agent guided by those signals.
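If you want a feel for what the enrichment step looks like in code, here is a minimal sketch of scoring a single trace with an LLM-as-a-judge served from a NIM, which exposes an OpenAI-compatible API. The endpoint URL, model id, judge prompt, and trace field names are illustrative assumptions, not the exact pipeline DBNL runs in the demo:

```python
"""Minimal sketch: score one agent trace with an LLM-as-a-judge behind an
OpenAI-compatible NIM endpoint. Endpoint, model id, and trace fields are
assumptions for illustration only."""
import json
from openai import OpenAI

# Assumed: a gpt-oss-20b NIM exposing the OpenAI-compatible API on port 8000.
judge = OpenAI(base_url="http://nim-judge.internal:8000/v1", api_key="not-used")

JUDGE_PROMPT = """You are grading an AI travel agent.
Given the user query and the agent response, reply with a JSON object:
{{"score": <1-5>, "reason": "<one sentence>"}}.

User query: {query}
Agent response: {response}
"""

def judge_trace(trace: dict) -> dict:
    """Attach an LLM-as-judge quality score to a single trace record."""
    completion = judge.chat.completions.create(
        model="gpt-oss-20b",  # assumed model id served by the NIM
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                query=trace["input"], response=trace["output"]
            ),
        }],
        temperature=0.0,
    )
    # A production pipeline would validate the judge output; this sketch assumes clean JSON.
    verdict = json.loads(completion.choices[0].message.content)
    return {**trace, "judge_score": verdict["score"], "judge_reason": verdict["reason"]}

if __name__ == "__main__":
    example = {"input": "Find me a steakhouse near the venue",
               "output": "Try Delmonico at the Venetian."}
    print(judge_trace(example))
```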
2:10 – Distributional helps you understand agent behavior and interesting usage patterns over time with metrics, statistics, topics, tool call patterns, and downstream KPIs that paint a picture of how the agent is being used and where it is performing well or poorly. This includes standard information on cost, quality, and speed, such as log count, tool success rate, average user feedback, and average duration, along with time series for each of these metrics. It also includes topics that Distributional has learned from the underlying inputs and then used to classify new queries, so you can see trends on those topics over time, as well as tool calls and error rates over time. Together, these help you understand usage patterns in the context of agent behavior. Distributional also automatically creates a visualization of tool call sequences, and correlating attributes against the user query helps build intuition about how the agent behaves when results are good (positive user feedback or high LLM-as-judge scores) or poor.
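To make those dashboard metrics concrete, here is a rough sketch of the kind of daily rollups they represent, computed from a flat table of traces. Distributional computes these (and the topic classification) for you; the file path and column names below are assumptions for illustration:

```python
"""Sketch of daily summary metrics of the sort shown on the dashboard,
computed from a flat table of traces. Column names (timestamp, tool_success,
user_feedback, duration_s, topic) are assumed for illustration."""
import pandas as pd

# Hypothetical export of traces; could also be an s3:// path from your trace store.
traces = pd.read_parquet("agent_traces.parquet")
traces["timestamp"] = pd.to_datetime(traces["timestamp"])

daily = traces.resample("1D", on="timestamp")
summary = pd.DataFrame({
    "log_count": daily.size(),
    "tool_success_rate": daily["tool_success"].mean(),
    "avg_user_feedback": daily["user_feedback"].mean(),
    "avg_duration_s": daily["duration_s"].mean(),
})

# Topic trend: daily counts per topic learned from the underlying inputs.
topic_trends = (
    traces.groupby([pd.Grouper(key="timestamp", freq="1D"), "topic"])
          .size()
          .unstack(fill_value=0)
)

print(summary.tail(), topic_trends.tail(), sep="\n\n")
```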
3:30 – Distributional analyzes all production AI logs daily. This analysis keeps the data fresh in the dashboard we just walked through, and it also powers a collection of daily insights that Distributional automatically surfaces to guide the fixes, improvements, or changes the AI product team should prioritize. In this case, we notice a high-severity data accuracy issue that affects quality: responses from the agent are missing the links that make it easy for a user to book a reservation.
4:15 – Next, we create a custom classifier metric to confirm the issue and track any change. You can navigate quickly from an issue to a custom metric based on that issue using ready-made templates in Distributional. This reduces the time it takes to create new metrics and gives you a standard, central way to track them in the same dashboard over time. The classifier is then used to identify the traces relevant to the issue and download that dataset for use in offline evaluation and optimization.
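For intuition on what a classifier metric like this checks, here is a hand-rolled sketch (not Distributional’s template) of a rule that flags responses missing a booking link and exports the matching traces as an offline eval set. The regex and field names are assumptions:

```python
"""Illustrative sketch of a 'booking link present' classifier and the export
of flagged traces to a JSONL eval set. In the demo this is done with
Distributional's metric templates; the rule below is an assumption."""
import json
import re

BOOKING_LINK = re.compile(r"https?://\S*(opentable|resy|booking)\S*", re.IGNORECASE)

def has_booking_link(response: str) -> bool:
    """Classifier: does the agent response include a link the user can book with?"""
    return bool(BOOKING_LINK.search(response))

def build_eval_set(traces: list[dict], path: str = "missing_link_eval.jsonl") -> int:
    """Write traces whose responses are missing a booking link to a JSONL eval set."""
    flagged = [t for t in traces if not has_booking_link(t["output"])]
    with open(path, "w") as f:
        for t in flagged:
            f.write(json.dumps({"question": t["input"], "answer": t["output"]}) + "\n")
    return len(flagged)
```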
6:20 – We take the relevant traces from Distributional’s classified and labeled dataset, then use the NVIDIA NeMo Agent Toolkit Optimizer to select a new prompt against this updated eval dataset. This is how online analytics completes the feedback cycle back into offline evaluation and optimization workflows.
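The NeMo Agent Toolkit Optimizer handles this search for you; the loop below is only a generic stand-in showing the shape of the step: score each candidate system prompt against the eval set exported in the previous sketch and keep the best one. The candidate prompts are made up, and `run_agent` is a hypothetical hook into your agent:

```python
"""Generic stand-in for the prompt-selection step (the demo uses the NeMo
Agent Toolkit Optimizer). Scores each candidate system prompt against the
eval set exported earlier and keeps the best performer."""
import json
import re

# Same idea as the classifier sketch above.
BOOKING_LINK = re.compile(r"https?://\S*(opentable|resy|booking)\S*", re.IGNORECASE)

CANDIDATE_PROMPTS = [
    "You are a helpful travel concierge. Always include a reservation link in your answer.",
    "You are a helpful travel concierge. Cite sources and include a booking link for every venue.",
]

def run_agent(system_prompt: str, question: str) -> str:
    """Hypothetical hook: replace with a call into your agent (e.g. via NAT)."""
    return "Try Delmonico at the Venetian: https://www.opentable.com/delmonico"

def score(eval_path: str, system_prompt: str) -> float:
    """Fraction of eval questions whose new answer contains a booking link."""
    rows = [json.loads(line) for line in open(eval_path)]
    hits = sum(bool(BOOKING_LINK.search(run_agent(system_prompt, r["question"]))) for r in rows)
    return hits / len(rows)

if __name__ == "__main__":
    best = max(CANDIDATE_PROMPTS, key=lambda p: score("missing_link_eval.jsonl", p))
    print("Selected prompt:", best)
```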
6:50 – After the change, we see the feedback score go up in Distributional, the link correctness metric improve over time, average daily feedback jump across all topics, and the user frustration metric trend down.
The easiest way to get started is to use a free SaaS demo account to review this example and others that we’ve pre-loaded in Distributional. Next, you can install our sandbox locally on your laptop in ten minutes and run through a tutorial that shows you how to use Distributional on a toy example. Once you’re more familiar with the functionality, you can install the full service for free using a Terraform module or Helm chart. We are happy to help at any step of this process, so reach out at support@distributional.com with any questions.

