AI

Artificial intelligence research, products, and policy.

AI | Hugging Face Blog

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

2026-05-27

AI | Hugging Face Blog

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

2026-05-27

AI | Hugging Face Blog

Reachy Mini goes fully local

2026-05-27

AI | arXiv cs.AI

Algorithmic Monocultures in Hiring

Many employers screen job applicants with algorithms built by the same few algorithm vendors. We hypothesize that algorithmic monoculture leads to the same individuals and members of the same racial groups facing rejection. We acquire and analyze a novel dataset of 3 million applicants submitting 4 million applicati...

2026-05-26

AI | arXiv cs.AI

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-...

2026-05-26

AI | arXiv cs.AI

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a pra...

2026-05-26

AI | arXiv cs.AI

Natural Language Query to Configuration for Retrieval Agents

Modern retrieval agents expose many configuration choices -- LLM, retriever, number of documents, number of hops, and synthesis strategy -- each shaping both answer quality and serving cost. Today, these pipelines are typically hand-tuned once per workload, leaving substantial per-query optimization untapped. We for...

2026-05-26

AI | arXiv cs.AI

GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing

Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration: (i) synthesizing new features from standards or research papers into production code; (ii) conformance and interoperability testing; (iii) hardening against field anomal...

2026-05-26

AI | arXiv cs.AI

MobileMoE: Scaling On-Device Mixture of Experts

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active paramete...

2026-05-26

AI | arXiv cs.AI

Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases

Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired beha...

2026-05-26

AI | arXiv cs.AI

Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering framework for LLM reinforcement lear...

2026-05-26

AI | arXiv cs.LG

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this...

2026-05-26

AI | arXiv cs.AI

When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection

Recent generative models have largely closed the gap on low-level artifacts - pixel fingerprints, frequency anomalies, upsampling traces - particularly in person-centric and partial-edit settings where the manipulated region is small and surrounded by photometrically authentic content. We introduce Social Gaze Consi...

2026-05-26

AI | arXiv cs.LG

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require extensively annotated datasets. In thi...

2026-05-26

AI | arXiv cs.AI

2-ASP(Q) programs with weak constraints: Complexity and efficient implementation

ASP(Q) extends Answer Set Programming (ASP) with Quantifiers over answer sets. In this paper we focus on the class of ASP(Q) programs with two quantifiers and weak constraints, denoted as 2-ASP(Q)^w. 2-ASP(Q)^w is a practically relevant fragment of ASP(Q) that is expressive enough to capture optimization problems up...

2026-05-26

AI | arXiv cs.LG

Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization

Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a general smoothing framework that combines flexible symmetric unimodal kernels with monot...

2026-05-26

AI | arXiv cs.LG

Greening AI Inference with Accuracy and Latency-aware User Incentives

The widespread use of AI services has raised concerns for its environmental sustainability, towards which recent studies have identified carbon emissions of AI inference as the major contributor. This paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quali...

2026-05-26

AI | arXiv cs.LG

Normal Guidance is what Attention Needs

We consider training classifiers for 3D medical images using only one binary label for the entire volume rather than a label for each 2D slice. In such weakly supervised settings, can we learn accurate classifiers for slice-level predictions? Attention-based multiple instance learning (MIL) can produce an attention...

2026-05-26

AI | Google AI

100 things we announced at I/O 2026

2026-05-20

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

Catch up on the Dialogues stage at Google I/O 2026.

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

We’re announcing new community investments in Missouri.