AI
ScreenSuite - The most comprehensive evaluation suite for GUI Agents!
Hugging Face Blog · 2025-06-06
Related items
AI · Hugging Face Blog
CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models
AI · Hugging Face Blog
Smol2Operator: Post-Training GUI Agents for Computer Use
AI · Hugging Face Blog
Back to The Future: Evaluating AI Agents on Predicting Future Events
AI · Hugging Face Blog
A New Framework for Evaluating Voice Agents (EVA)
AI · Hugging Face Blog
Announcing Evaluation on the Hub
AI · arXiv cs.AI
Natural Language Query to Configuration for Retrieval Agents
Modern retrieval agents expose many configuration choices (LLM, retriever, number of documents, number of hops, and synthesis strategy), each shaping both answer quality and serving cost. Today, these pipelines are typically hand-tuned once per workload, leaving substantial per-query optimization untapped. We for...