Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval
Hugging Face Blog · 2024-03-22