Explore AMD Solutions for ISVs
Learn how customers across industries benefit from the powerful collaboration between AMD and ISV partners.
Iterate.ai Case Study
Iterate.ai taps AMD Ryzen™ AI PRO processors to run private 32B-parameter LLMs with a 32K context window at ~60-80 tokens/sec, cutting cloud costs and risks.
Iterate.ai calls itself “the private AI company.” As enterprises adopt generative AI to boost productivity, Iterate.ai focuses on sophisticated workflows that run securely within organizational boundaries. Its Generate application enables large organizations to build personalized AI assistants and agentic workflows that efficiently analyze complex data scattered across critical business systems such as Jira, Slack, and Salesforce.
To deliver powerful local AI functionality while helping prevent data exposure, Iterate.ai collaborated closely with AMD to optimize Generate exclusively for AMD Ryzen™ AI PRO processors. This collaboration helps Generate use the AMD Ryzen AI PRO processor’s CPU, integrated GPU, and dedicated NPU in concert. By anchoring the solution to the user’s device, Iterate.ai provides 100% data retention and control, while converting variable cloud token expenses into a predictable, fixed cost for enterprises.
IT decision makers value both performance and security. The shift to AI has intensified security concerns, as organizations struggle to manage the risk of sensitive, proprietary information being exposed to public large language models (LLMs).
“The big pain point that Generate addresses is safety,” said Iterate.ai Special Project Engineer Karanbir Singh. “When companies use a public LLM, they don’t have control over the data that is retained. Iterate.ai runs the AI models right next to the application on your local device so that no data is pushed to the Internet.”
Compounding the challenge is cost. Iterate.ai Co-founder and CTO Brian Sathianathan says, “Recent industry studies indicate that there has been as much as a 100X increase in token usage in just the past year.” Cloud-based LLM inference is charged per token, creating high variability and cost uncertainty for enterprise budgets. Iterate.ai wanted to empower enterprises to move their AI workflows away from this usage-based model and establish full data sovereignty. The solution also needed to work offline, so users could continue using AI even when traveling or experiencing connectivity issues.
Iterate.ai chose AMD for its comprehensive technology stack, which provides the scale and flexibility needed for its enterprise product. “What’s compelling about AMD is that they have a laptop offering with potent processors that include CPU, GPU, and NPU capabilities. At the same time, AMD also offers powerful server-side solutions like the AMD Instinct™ MI300 Series GPUs,” says Sathianathan. “With AMD, there’s an end-to-end solution from the client side all the way to the server. AMD is at the right spot to enable the private AI revolution, making them an ideal partner for Iterate.ai.”
Iterate.ai integrated AMD Lemonade Server, an open-source tool that simplifies development for local LLMs. “Lemonade Server exposes everything as OpenAI compatible, the default industry standard. The work AMD put into Lemonade Server makes integration between our application and the LLM much easier,” said Sathianathan.
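To illustrate what an OpenAI-compatible integration can look like, here is a minimal sketch of a client request to a locally hosted endpoint such as the one Lemonade Server exposes. The base URL, port, and model identifier are placeholder assumptions, not Iterate.ai’s or AMD’s actual configuration; check the Lemonade Server documentation for the values that apply to your install.

```python
# Minimal sketch: querying a locally hosted LLM through an OpenAI-compatible
# endpoint. The base URL and model name below are assumptions, not the actual
# Generate or Lemonade Server configuration.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint; adjust for your install
    api_key="not-needed",                      # local servers typically ignore the API key
)

response = client.chat.completions.create(
    model="local-llm",  # hypothetical identifier; use whichever model your server has loaded
    messages=[
        {"role": "system", "content": "You are a private, on-device assistant."},
        {"role": "user", "content": "Summarize the action items from this meeting transcript."},
    ],
)
print(response.choices[0].message.content)
```

Because the interface mirrors the industry-standard OpenAI API, an application written against a cloud endpoint can often be redirected to a local server by changing little more than the base URL and model name.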
The streamlined approach accelerated Iterate.ai’s deployment timeline and reduced resource requirements. Sathianathan says, “It would have taken us at least twice the number of engineers had we had to do all the work ourselves. The combination of AMD software and hardware is very stable, stellar in fact, and allowed us to manage development with far fewer resources than typically required on the LLM side.”
The high-performance architecture of the AMD Ryzen AI PRO processor allowed Iterate.ai to design a local AI application with greater capacity than previously possible. Generate uses all three components of the AMD Ryzen AI PRO processor to achieve optimal performance: the CPU, GPU, and NPU. This parallel architecture is essential because Generate handles use cases ranging from simple document summarization to highly complex analysis of charts and graphs.
The coordinated hardware acceleration is vital for power efficiency and battery life. Sathianathan highlighted the benefits of leveraging the specialized NPU, saying, “The AMD Ryzen AI NPU is super-optimized to run the LLM for long periods, which is perfect for very complex tasks. On a PC you live in a memory- and resource-constrained environment. We take advantage of the different power profiles of NPU, GPU, and CPU. Being able to use all of the AMD Ryzen AI PRO processor’s silicon resources in parallel is ideal. That approach also helps avoid draining the user’s laptop battery during heavy LLM processing.”
Sathianathan notes that deploying on the AMD platform is essential to Iterate.ai’s ability to meet the demands of the modern enterprise. “AMD helps make it possible to leverage local AI with security, privacy, and cost benefits, creating certainty for businesses and organizations,” he says.
At the same time, such certainty is built on measurable, superior performance. Iterate.ai testing confirmed the AMD platform enabled exceptional model scaling. “In industry standards, 14 billion parameter models with a 16K context window are good benchmark models for an AI PC,” said Singh. “With AMD Ryzen AI processors, we can leverage models that are 32 billion parameter models, almost double the size, with a 32K context window.”
This scale advantage allows Iterate.ai to handle more demanding enterprise workloads locally on the PC. Testing metrics validate the high throughput of the proprietary Iterate.ai model running on the system: its Interplay RAG model achieved an average throughput of ~60-80 tokens/sec, with a peak of 92.89 tokens/sec during a Document Search query.
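Throughput figures like these are typically derived by timing a streamed response. The sketch below shows one rough way to approximate tokens per second against a local OpenAI-compatible endpoint; the endpoint, model name, and prompt are illustrative placeholders rather than Iterate.ai’s actual test harness.

```python
# Rough sketch: approximating generation throughput (tokens/sec) by timing a
# streamed response from a local OpenAI-compatible endpoint. All identifiers
# below are illustrative placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

start = time.perf_counter()
generated_chunks = 0
stream = client.chat.completions.create(
    model="local-llm",  # hypothetical identifier
    messages=[{"role": "user", "content": "Summarize the key findings of the attached report."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        generated_chunks += 1  # each streamed chunk roughly corresponds to one generated token

elapsed = time.perf_counter() - start
print(f"~{generated_chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```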
“Running Generate locally means it’s 100% fixed cost,” says Sathianathan. “You move a lot of variability costs because it’s a CapEx, one-time purchase, that you basically amortize over the life of the PC.”
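To make the amortization argument concrete, here is a back-of-envelope sketch comparing variable, per-token cloud pricing with a one-time device premium spread over an assumed service life. Every figure is a hypothetical placeholder chosen purely for illustration, not data from Iterate.ai or AMD.

```python
# Back-of-envelope sketch: variable cloud token spend vs. a fixed, one-time
# device cost amortized over the PC's service life. All numbers are
# hypothetical placeholders for illustration only.
cloud_price_per_million_tokens = 10.00   # hypothetical $ per 1M tokens
monthly_tokens = 20_000_000              # hypothetical monthly usage per user
service_life_months = 36                 # assumed PC service life

variable_cloud_cost = (
    cloud_price_per_million_tokens * (monthly_tokens / 1_000_000) * service_life_months
)
device_premium = 600.00                  # hypothetical incremental cost of an AI PC

print(f"Variable cloud spend over {service_life_months} months: ${variable_cloud_cost:,.0f}")
print(f"Fixed device premium, amortized: ${device_premium / service_life_months:,.2f}/month")
```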
By keeping the LLM and all of its memory entirely on the PC, Generate supports data sovereignty and helps organizations align with compliance standards such as HIPAA, GDPR, and SOC 2. This privacy-first approach is reinforced by AMD PRO Security, which provides multi-layered defenses such as AMD Memory Guard. AMD Memory Guard encrypts system memory in real time to help safeguard sensitive business data in the event of a lost or stolen PC.
Iterate.ai is already looking ahead to optimizing its application for the next generation of AMD processors. Singh highlighted the AMD roadmap and partnership. “In the next 12 to 18 months, the aim is to leverage even bigger LLMs with bigger context windows to satisfy more and more customer use cases,” he said. “I think AMD is doing great in their vision for the future, and we’re excited to be their partner so we can jump on new opportunities and build those use cases.”
Iterate.ai is an innovation-focused software company that builds enterprise-ready AI platforms and applications for large organizations. Headquartered in San Jose, California, Iterate.ai offers Interplay, a patented low-code AI platform, and private AI solutions such as Generate that help enterprises rapidly build, deploy, and scale secure generative AI experiences across industries including retail, financial services, and technology. For more information visit iterate.ai.
© 2026 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Instinct, Ryzen, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names contained herein are for identification purposes only and may be trademarks of their respective owners. Certain AMD technologies may require third-party enablement or activation. Supported features may vary by operating system. Please confirm with the system manufacturer for specific features. No technology or product can be completely secure.
All performance and cost savings claims are provided by Iterate.ai and have not been independently verified by AMD. Performance and cost benefits are impacted by a variety of variables. Results herein are specific to Iterate.ai and may not be typical. GD-181
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18.