Model Evaluation: From ML to GenAI

As AI systems grow more complex, the methodologies used to evaluate them must evolve accordingly. Erin LeDell, Chief Scientist at Distributional, discusses the transition from deterministic ML evaluation to the complexities of generative AI assessment: why accuracy-based metrics fall short, how new frameworks address coherence, consistency, and bias, and the challenges of reproducibility in probabilistic AI systems. She also covers how to test AI models in real-world situations, identify where they might go wrong, and make sure they are safe, reliable, and work as expected.
