AI | Google DeepMind | 2025-12-09 | FACTS Benchmark Suite: Systematically evaluating the factuality of large language models | Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
AI | Hugging Face Blog | 2024-04-16 | Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs
AI | Hugging Face Blog | 2025-07-04 | Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
AI | Hugging Face Blog | 2021-07-15 | Deep Learning over the Internet: Training Language Models Collaboratively