AMD and Oracle Cloud Infrastructure Are Powering the Next Wave of AI Innovation

Oct 14, 2025

AMD Instinct™ MI350 series GPUs

Introduction

The AI era is accelerating, and organizations everywhere are seeking the infrastructure to train, fine-tune, and deploy increasingly complex models at scale. At AMD, our mission is to deliver the high-performance, energy-efficient, and open solutions that make this possible.

Oracle Cloud Infrastructure (OCI) has long been a leader in delivering powerful, flexible cloud platforms, and our collaboration has reached an exciting new milestone. As announced at our Advancing AI 2025 event, OCI will be the first hyperscaler to deploy the new AMD Instinct™ MI355X GPUs in its OCI Supercluster, built on AMD's open, rack-scale AI infrastructure, ushering in a new era of scale, performance, and efficiency for AI in the cloud.

AMD Instinct MI355X: Scaling AI to Zettascale with Oracle as a First-Mover

At Advancing AI 2025, we unveiled our open, rack-scale AI infrastructure: a fully integrated, standards-based architecture combining AMD Instinct GPUs, AMD EPYC™ CPUs, and AMD Pensando™ networking to deliver balanced, scalable performance for the most demanding AI workloads. Oracle Cloud Infrastructure is the first hyperscaler to adopt this architecture at scale, making it a leader in deploying the next generation of AI supercomputing.

This is more than just a GPU upgrade—it's a complete rethinking of how large-scale AI clusters are built and operated. By standardizing on AMD's open rack-scale design, OCI can deliver:

  • Massive scale-out capability: The upcoming zettascale AI Supercluster can scale up to 131,072 AMD Instinct MI355X GPUs, interconnected with ultra-low-latency RDMA networking for seamless scaling across tens of thousands of accelerators.
  • Unmatched performance: AMD Instinct MI355X-powered OCI shapes deliver up to 2.8× higher throughput than the previous generation for AI deployments.
  • Memory for the largest models: With 288 GB of HBM3E memory and 8 TB/s of bandwidth per GPU, customers can keep massive models entirely in memory, eliminating costly off-chip transfers and accelerating both training and inference.
  • Support for FP4 precision: The new 4-bit floating point format enables ultra-efficient inference for large language models, reducing compute and memory requirements with minimal accuracy loss (see the sketch after this list).
  • Density and efficiency: A liquid-cooled, 125 kW-per-rack design with 64 GPUs per rack enables maximum performance density, helping reduce data center footprint and power usage.
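To make the FP4 point concrete, here is a minimal NumPy sketch of block-scaled 4-bit float (E2M1) quantization. The block size and round-to-nearest scheme are illustrative assumptions for exposition, not the MI355X hardware data path; the takeaway is that each weight shrinks to 4 bits plus a small shared per-block scale.

```python
import numpy as np

# Non-negative magnitudes representable by an E2M1 (FP4) value:
# 1 sign bit, 2 exponent bits, 1 mantissa bit -> 8 magnitudes.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_block(x: np.ndarray, block_size: int = 32) -> np.ndarray:
    """Quantize then dequantize with one shared scale per block (illustrative)."""
    x = x.reshape(-1, block_size)
    # Map each block's largest magnitude onto FP4's maximum value (6.0).
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale[scale == 0] = 1.0  # guard all-zero blocks
    scaled = x / scale
    # Round every value to the nearest representable FP4 magnitude.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).ravel()

weights = np.random.randn(4096).astype(np.float32)
approx = quantize_fp4_block(weights)
print(f"mean abs quantization error: {np.abs(weights - approx).mean():.4f}")
```

In practice the packing and dequantization happen inside the framework and GPU kernels; the point is the roughly 4× reduction in weight memory traffic relative to FP16.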

For Oracle customers, this means the ability to run next-generation AI workloads, from trillion-parameter language models to multi-modal agentic systems, on infrastructure that is fast to deploy, easy to scale, and cost-efficient to operate. And because the AMD ROCm™ open software stack is at the core, customers can bring their existing AI code and frameworks to OCI without costly rewrites.
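One reason existing code moves over easily: PyTorch's ROCm builds expose AMD GPUs through the same torch.cuda device interface that CUDA builds use, so typical device-selection code runs unchanged. A minimal sketch (the model and tensor sizes are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# On a PyTorch ROCm build, torch.cuda.is_available() reports AMD GPUs,
# so this line needs no changes when moving from other accelerators.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"running on: {torch.cuda.get_device_name(0) if device.type == 'cuda' else 'cpu'}")

model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU()).to(device)
x = torch.randn(8, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([8, 4096])
```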

But GPUs are only one part of the equation. To fully unlock the potential of AMD Instinct MI355X at this scale, the surrounding infrastructure must be equally advanced.

High-Performance Networking with AMD Pensando™ Pollara NICs

Feeding and synchronizing tens of thousands of GPUs requires a network fabric that is as fast and intelligent as the compute itself. That’s why OCI is pairing AMD Instinct MI355X clusters with AMD Pensando Pollara AI NICs, making them the first to deploy this technology in a hyperscale AI backend network.

Pensando Pollara NICs bring advanced RDMA over Converged Ethernet (RoCE) functionality, programmable congestion control, and support for open industry standards from the Ultra Ethernet Consortium. This enables OCI to design innovative, ultra-low-latency network topologies that keep GPUs fully utilized, even under the heaviest AI training and inference loads.
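To illustrate what "programmable congestion control" means, here is a toy Python sketch of a DCQCN-style per-flow policy of the general kind RoCE fabrics use: rate drops multiplicatively on ECN congestion marks and recovers additively when the path is clean. The constants and structure are simplified assumptions, not AMD Pensando Pollara firmware.

```python
def update_rate(rate_gbps: float, ecn_marked: bool,
                line_rate_gbps: float = 400.0) -> float:
    """Adjust a flow's send rate from one round of ECN feedback."""
    if ecn_marked:
        return max(rate_gbps * 0.5, 1.0)  # multiplicative decrease on congestion
    # Additive increase back toward line rate while the path is clean.
    return min(rate_gbps + 5.0, line_rate_gbps)

rate = 400.0
for marked in [False, False, True, False, True, False, False]:
    rate = update_rate(rate, marked)
    print(f"ecn={marked!s:5} -> rate {rate:6.1f} Gb/s")
```

Running this policy on the NIC itself, rather than in host software, is what lets the fabric react to congestion at hardware timescales and keep GPUs from stalling.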

In large-scale AI systems, networking isn't just a connector; it's a performance multiplier. By integrating Pensando Pollara NICs into OCI's AMD Instinct MI355X superclusters, data can move at the speed of compute, helping ensure that the full capability of each GPU is realized without bottlenecks.

The Compute Backbone of Rack-Scale AI

At Advancing AI 2025, we emphasized that AI performance is not just about accelerators — it’s about the entire system working in harmony. In our open rack-scale AI architecture, AMD EPYC processors are the central orchestrators, managing data pipelines, coordinating GPU workloads, and driving the non-accelerated portions of AI training and inference.

In addition to OCI’s AMD Instinct MI355X deployment, Oracle is also announcing E6 compute instances based on 5th Gen AMD EPYC processors, which enable:

  • High core density and memory bandwidth to handle massive data preprocessing and model orchestration in parallel with GPU compute (sketched after this list).
  • Fast I/O and PCIe® Gen5 connectivity to keep GPUs and Pensando Pollara NICs fed with data at full speed.
  • Scalability from cloud to edge — from OCI’s public cloud E6 shapes to Compute Cloud@Customer X11 systems that bring the same EPYC CPU-powered architecture into secure, on-premises environments.
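A minimal PyTorch sketch of that division of labor: CPU worker processes run preprocessing while the GPU trains, and pinned memory speeds host-to-device copies over PCIe. The dataset, model, and worker count are placeholder assumptions, not an OCI-specific configuration.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

class SyntheticDataset(Dataset):
    """Stand-in dataset; __getitem__ runs in CPU worker processes."""
    def __len__(self):
        return 10_000
    def __getitem__(self, idx):
        x = torch.randn(4096)       # CPU-side preprocessing happens here
        return x, int(x.sum() > 0)  # toy label

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(4096, 2).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    # num_workers > 0 keeps preprocessing on CPU cores, overlapping it with
    # GPU compute; pin_memory accelerates transfers across PCIe.
    loader = DataLoader(SyntheticDataset(), batch_size=256,
                        num_workers=8, pin_memory=True)
    for x, y in loader:
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    main()
```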

The 5th Gen AMD EPYC family used in OCI’s latest compute offerings delivers up to 2× the performance of the prior generation at the same price point [1], but the real story is how that performance is applied: enabling balanced, high-throughput AI systems where CPUs and GPUs work in lockstep. Whether in a zettascale supercluster or an air-gapped private deployment, EPYC processors help ensure that every other component in the rack operates at peak efficiency.

Customer Momentum: Cohere Scales Secure Enterprise AI with AMD and Oracle

Cohere, a leading enterprise AI company, is harnessing the power of Oracle Cloud Infrastructure and AMD Instinct™ accelerators to deliver secure, efficient, and scalable AI solutions to global businesses. The company’s deployments on OCI leverage AMD Instinct™ MI355X GPUs to accelerate training and inference across large language and multimodal models, driving faster innovation and cost-efficient performance for enterprise applications.

“Cohere is bringing enterprises the secure and efficient AI they need to address their most critical everyday challenges. Collaborating with AMD and Oracle helps us drive efficiencies and develop innovative technology as we deliver for our customers. We’re excited about the newest technologies from our partners, such as AMD Instinct MI355X GPUs and OCI’s Zettascale infrastructure.”
— Autumn Moulder, VP Engineering, Cohere

Cohere’s work with AMD and Oracle underscores the momentum of enterprise customers adopting these technologies to power next-generation AI workloads, from secure, high-performance training environments to inference at global scale.

AMD and Oracle, Driving AI Innovation

Today at Oracle AI World, the industry can see firsthand how AMD and Oracle are working together to redefine what’s possible in AI infrastructure. With AMD Instinct MI355X GPUs, AMD Pensando Pollara networking, and AMD EPYC processors integrated into OCI’s open rack-scale architecture, we’re delivering a platform that is open, balanced, and built for the most demanding AI workloads on the planet.

This is not a vision for the future — it’s here now, available to Oracle customers starting today. Whether in the public cloud, at the edge, or in secure private environments, AMD and Oracle are enabling organizations to train faster, scale further, and innovate without limits.

Endnotes

  1. OCI launches high-performance E6 Standard compute instances powered by AMD: comparative workload performance per core cost analysis for E5 and E6 shapes. https://blogs.oracle.com/cloud-infrastructure/post/oci-launches-highperformance-e6-standard-compute-instances-powered-by-amd