“We are flying blind,” said the data platform leader at a large reinsurance provider. “We log traces, but lack a reliable way to analyze them—so they just sit there. I’m under pressure to deliver returns from this massive AI investment, and it starts with a better understanding of which AI use cases are most valuable. This insight is hidden in these logs somewhere.”
A year ago, her team couldn’t get governance approvals to use LLMs. Once they overcame this hurdle, she released a chat experience internally. Then she added RAG to this solution, expanding the types of questions her teams could ask and improving the relevance of the answers they got back. More recently, she added an agentic router. Now she has thousands of users generating tens of thousands of requests per day. But her AI product is a black box, and she is missing the tools to navigate it.
She is not alone. I’ve met with dozens of companies that are asking similar questions, and finding it harder than expected to get answers. Since this conversation, we’ve been working with the data platform leader and folks like her at other companies to solve this problem.
If you have any of these problems, or if you struggle with your own unique variant, you may be inclined to pick up a tried-and-true DevOps solution, like traditional monitoring or product analytics tools. But AI applications come with their own unique attributes that break these products.
Unstructured or semi-structured data makes it hard to statistically analyze usage of these applications. When you are handed piles of this data, it is non-trivial to unpack what matters and what you can ignore. The high degree of non-determinism that makes these models so powerful also makes them hard to evaluate; issues, insights, and information can all hide inside a single summary metric. Both problems are compounded by the complexity of the typical AI application: context sources, retrieval mechanisms, re-rankers, tools, tasks, and routers are all components outside the AI model itself, and all of them are constantly changing. The need to assess and interpret whole traces rather than individual data points makes the problem harder still.
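A toy illustration of how a summary metric can hide behavior (the scores below are hypothetical, not real product data): two weeks of per-request quality scores share the same mean, yet week 2 has split into a cluster of improved answers and a cluster of badly regressed ones.

```python
# Illustrative only: two weeks of hypothetical per-request quality scores.
# The means match, but the week-2 distribution has split: most answers got
# better while a subset regressed badly. A dashboard showing only the mean
# would report "no change."
import statistics

week1 = [0.70, 0.72, 0.68, 0.71, 0.69, 0.70, 0.73, 0.67]
week2 = [0.92, 0.93, 0.91, 0.94, 0.95, 0.31, 0.32, 0.32]

for label, scores in (("week1", week1), ("week2", week2)):
    print(label,
          f"mean={statistics.mean(scores):.2f}",
          f"stdev={statistics.stdev(scores):.2f}",
          f"min={min(scores):.2f}")
```

Both weeks print a mean of 0.70; only the spread and minimum reveal the regression, which is why distribution-level analysis matters.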
Unstructured data, non-determinism, and multi-component complexity have each been addressed in some form before, but the way they collide in AI applications calls for a purpose-built AI product analytics solution.
The key to peering into the production AI black box is understanding AI behavior—the interplay and correlations between users, context, tools, models, and metrics.
Distributional’s product is designed to help you understand your product as it evolves. Our platform automatically learns the graph of user experiences with your product, classifies these experiences, and then serves daily insights on them. As you take action on these insights, Distributional tailors future insights to these preferences. It gives you the initial toehold, and the more you use it the higher you climb the mountain.
Distributional enables this experience with three components: pipeline, workflow, and platform.
Learn more: https://docs.dbnl.com
Our goal is to make it as easy as possible to get this pipeline and workflow running in your environment. The platform is architected to fit your existing stack. You can install our full service for free. It runs on your Kubernetes cluster, and no data ever leaves your system. We have built an initial set of data and model connectors, and are happy to extend these primitives to make the connection seamless. The goal is for our product to fit your stack, not for you to redesign your stack to fit our product.
Learn more: https://docs.dbnl.com/platform/deployment
Once installed in your environment, Distributional automatically processes your production AI logs to make sense of behavior. This happens in three steps. First, our product enriches your logs with metrics, statistics, attributes, evals, and LLM-as-judge metrics to capture the status of AI product behavior and help you understand canonical usage patterns. Second, Distributional analyzes these enriched production logs to uncover behavioral signals with unsupervised clustering, topic modeling, anomaly detection, and data drift assessment. This analysis generates a complete graph of typical usage, organizes usage through classification by highest-propensity topics, and produces a set of insights on compelling recent usage patterns or issues. Finally, our product publishes these behavioral signals to a dashboard, your daily report, and session analysis with notifications on significant deviations from historical behavior. As you engage with these published results, our pipeline reinforces these preferences in future analysis.
Learn more: https://docs.dbnl.com/configuration/data-pipeline
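To make the unsupervised-analysis step concrete, here is a minimal, hedged sketch of the general idea (not Distributional's actual pipeline): represent each production log message as a bag-of-words vector and greedily group similar messages, so recurring usage patterns surface without any labeled data. The log messages and threshold are invented for illustration.

```python
# Minimal sketch of unsupervised grouping of production log messages.
# This is a generic illustration, not Distributional's implementation.
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a single log message."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(messages, threshold=0.5):
    """Greedy single-pass clustering: attach each message to the first
    cluster whose representative is similar enough, else start a new one."""
    clusters = []  # list of (representative_vector, member_messages)
    for msg in messages:
        vec = vectorize(msg)
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((vec, [msg]))
    return [members for _, members in clusters]

# Hypothetical production requests from an insurance assistant.
logs = [
    "summarize claim report for policy 123",
    "summarize claim report for policy 987",
    "what is our flood coverage limit",
    "what is our earthquake coverage limit",
    "summarize claim report for policy 555",
]
for group in cluster(logs):
    print(len(group), "->", group[0])
```

Even this crude approach separates the “summarize claim report” traffic from the “coverage limit” questions; real systems would use embeddings and more robust clustering, but the payoff is the same: usage patterns emerge from the logs themselves.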
This pipeline feeds a workflow designed to help you understand your AI products and complete your AI product feedback loop. First, Distributional produces fresh daily insights that update baseline understanding of usage patterns and alert you to any significant changes, shifts, or deviations. Second, our product links these daily insights to evidence—specific logs, sessions, and charts—so you can perform rapid root cause analysis. As you make changes to your product, adjust thresholds for existing metrics, or add new metrics, Distributional becomes your source of record for these changes so you can track how the behavior of your AI product evolves over time.
Learn more: https://docs.dbnl.com/workflow/adaptive-analytics-workflow
This approach takes some of the challenges of AI applications and turns them into an advantage—insights that your team can use to improve these applications over time.
It also implies that you should spend as little time upfront on evals as possible, and instead get your AI application into production as soon as you feel comfortable. Then perform robust analysis of production data to guide how you develop your AI application. Unsupervised techniques are well suited to giving you these insights “for free” on production data.
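One such “for free” technique is drift detection on simple per-request scalars. The sketch below (metric, data, and thresholds are all hypothetical, not from Distributional's product) flags requests whose response length deviates strongly from a baseline window, using a robust z-score so a few outliers in the baseline don't mask real drift.

```python
# Hypothetical drift-detection sketch on one scalar per request
# (here, response length in tokens). Illustrative only.
import statistics

def robust_z(value, baseline):
    """Z-score built from median and MAD, which resist outliers
    better than mean and standard deviation."""
    med = statistics.median(baseline)
    mad = statistics.median(abs(x - med) for x in baseline)
    if mad == 0:
        return 0.0
    return (value - med) / (1.4826 * mad)  # 1.4826 scales MAD to stdev

def flag_drift(current, baseline, threshold=3.0):
    """Return current observations that deviate strongly from baseline."""
    return [x for x in current if abs(robust_z(x, baseline)) > threshold]

baseline = [110, 120, 115, 118, 112, 117, 119, 114]  # last week's lengths
current = [116, 113, 420, 118, 5, 115]  # today's; two anomalous responses
print(flag_drift(current, baseline))  # -> [420, 5]
```

The runaway 420-token response and the truncated 5-token one are flagged with no labels or evals required; in practice you would run checks like this across many metrics at once.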
Ultimately, this approach brings a few benefits. You’ll ship AI applications faster, avoid degradations as you scale, evolve your applications with your users over time, and be more confident in AI performance by aligning its behavior to your goals.
Ready to go? Install our product free and get started today. And I’m always happy to chat about it with you, so find me on LinkedIn or email nick-dbnl@distributional.com.