Human Evaluations No Longer the Gold Standard for NLG, Says Washington U & Allen AI Study | Synced

By Sonic Mustang · March 16, 2026 · 1 min read

ai
machine learning & data science
research
artificial intelligence

Source: Synced | AI Technology & Industry Review

Researchers from the University of Washington and the Allen Institute for Artificial Intelligence say human evaluation is no longer the gold standard for assessing natural language generation (NLG) models. They argue that evaluators’ focus on surface-level text qualities degrades their ability to accurately judge the overall capabilities of current NLG models.