Building AI Systems You Can Trust
May 23, 2025
As AI systems grow more complex, the methods used to evaluate them must evolve accordingly. Erin LeDell, Chief Scientist at Distributional, discusses the transition from deterministic ML evaluation to the complexities of generative AI assessment: why accuracy-based metrics fall short, how new frameworks address coherence, consistency, and bias, and why reproducibility is hard to achieve in probabilistic AI systems. She also explains how to test AI models in real-world situations, identify where they might go wrong, and make sure they are safe and reliable and work as expected.