For teams scaling AI applications in production, adaptive testing is critical for consistency and reliability. But testing these applications is not trivial. This is especially true for LLM-based applications: because of the nature of the embedding process and the subsequent recovery of text from embedding space, these apps require statistical analysis and testing.
Additionally, once an app is running in production, automated testing becomes necessary because teams can no longer manually test every possible usage or analyze every possible response. Even setting aside the infeasibility of searching the space of all possible text, these teams are paid to ship things, not solely to test them.
Distributional provides an adaptive testing platform uniquely designed to address these needs. Its testing strategy is based on analyzing recent production usage to test for consistency of app behavior as a whole, while also alerting users to behavioral deviations and providing interpretable evidence so they can understand what has occurred.
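To make the idea of distribution-level consistency testing concrete, the sketch below shows one generic way such a check could look: it reduces two windows of production responses to a simple behavioral statistic (word count) and applies a two-sample Kolmogorov-Smirnov test to flag a deviation. The function names, the choice of statistic, and the use of a KS test are illustrative assumptions for this sketch, not Distributional's actual implementation, which tracks many more properties and richer tests.

```python
"""Minimal sketch of a distribution-level consistency check.

This is NOT Distributional's implementation; it only illustrates the
general idea of comparing a behavioral statistic (here, response word
count) between a baseline production window and a recent one.
"""

import random
from scipy.stats import ks_2samp


def word_counts(responses):
    """Reduce raw app responses to a scalar behavioral statistic."""
    return [len(r.split()) for r in responses]


def check_consistency(baseline_responses, recent_responses, alpha=0.05):
    """Flag a behavioral deviation if the two windows look distributionally different.

    A two-sample Kolmogorov-Smirnov test is one simple choice of test;
    a production platform would monitor many statistics, not just one.
    """
    stat, p_value = ks_2samp(word_counts(baseline_responses),
                             word_counts(recent_responses))
    return {"ks_statistic": stat, "p_value": p_value, "deviated": p_value < alpha}


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical production windows: responses grew noticeably longer
    # after, say, a prompt or model change.
    baseline = ["word " * random.randint(20, 40) for _ in range(200)]
    recent = ["word " * random.randint(60, 90) for _ in range(200)]
    print(check_consistency(baseline, recent))
```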
In this paper, we describe Distributional's testing strategy in more detail and show how it can be applied to both live and synthetic production testing.