Introducing HELMET: Holistically Evaluating Long-context Language Models

Hugging Face Blog · 2025-04-16

Related items

AI · Hugging Face Blog · 2022-10-24

Evaluating Language Model Bias with 🤗 Evaluate

AI · Google DeepMind · 2025-12-09

FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

AI · Hugging Face Blog · 2024-04-16

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

AI · Hugging Face Blog · 2022-10-03

Very Large Language Models and How to Evaluate Them

AI · Hugging Face Blog · 2025-07-04

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

AI · Hugging Face Blog · 2021-07-15

Deep Learning over the Internet: Training Language Models Collaboratively
