CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Hugging Face Blog · 2024-05-24

Related items

AI · Hugging Face Blog · 2025-07-04

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

AI · Hugging Face Blog · 2024-02-02

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

AI · Hugging Face Blog · 2025-06-06

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

AI · Hugging Face Blog · 2022-10-03

Very Large Language Models and How to Evaluate Them

AI · Google DeepMind · 2025-12-09

FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.

AI · Hugging Face Blog · 2025-02-04

π0 and π0-FAST: Vision-Language-Action Models for General Robot Control
