In this post, we walk through the new installation experience, demonstrate three deployment methods (console, CLI, and Terraform), and show how features like multi-instance-type deployment and native node affinity give you fine-grained control over inference scheduling.
Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod
AWS Architecture Blog · 2026-04-06