HELMET: Raising the Bar for Long-Context Language Model Evaluation

The rapid advancement of long-context language models (LCLMs) is transforming what AI can do, from digesting entire books to managing vast swaths of information in a single pass. Despite this progress...