Model Evaluation: From ML to GenAI

As AI systems grow more complex, the methods used to evaluate them must evolve as well. Erin LeDell, Chief Scientist at Distributional, discusses the transition from deterministic ML evaluation to the complexities of generative AI assessment: why accuracy-based metrics fall short, how new frameworks address coherence, consistency, and bias, and why reproducibility is hard in probabilistic AI systems. She also covers how to test AI models in real-world situations, identify where they might go wrong, and make sure they are safe, reliable, and work as expected.

Scott Clark on Taking Stock with Trinity Chavez

June 16, 2025

Building AI Systems You Can Trust

May 23, 2025

Distributional Platform: Architectural Overview

May 20, 2025