AI | Hugging Face Blog | 2025-07-04 | Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
AI | Hugging Face Blog | 2024-02-02 | NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates
AI | Google DeepMind | 2025-12-09 | FACTS Benchmark Suite: Systematically evaluating the factuality of large language models | Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.