ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Hugging Face Blog · 2025-06-06

Open source

Related items

AI · Hugging Face Blog · 2024-05-24

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

AI · Hugging Face Blog · 2025-09-23

Smol2Operator: Post-Training GUI Agents for Computer Use

AI · Hugging Face Blog · 2025-07-17

Back to The Future: Evaluating AI Agents on Predicting Future Events

AI · Hugging Face Blog · 2026-03-24

A New Framework for Evaluating Voice Agents (EVA)

AI · Hugging Face Blog · 2022-06-28

Announcing Evaluation on the Hub

AI · arXiv cs.AI · 2026-05-26

Natural Language Query to Configuration for Retrieval Agents

Modern retrieval agents expose many configuration choices -- LLM, retriever, number of documents, number of hops, and synthesis strategy -- each shaping both answer quality and serving cost. Today, these pipelines are typically hand-tuned once per workload, leaving substantial per-query optimization untapped. We for...