Transforming AI Networks with AMD Pensando™ Pollara 400

Oct 10, 2024


The advent of generative AI and large language models (LLMs) has created unprecedented challenges for traditional Ethernet networks in AI clusters. These advanced AI/ML models demand intense communication capabilities, including tightly coupled parallel processing, rapid data transfers, and low-latency communication - requirements that conventional Ethernet, designed for general-purpose computing, has historically struggled to meet. Despite these challenges, Ethernet remains the preferred choice for network technology in AI clusters due to its widespread adoption and abundance of operational expertise. However, the limitations of traditional Ethernet in supporting specialized AI workloads have become increasingly apparent.

The AMD Pensando™ Pollara 400 emerges as a significant advancement in AI networking, specifically designed to address these issues. The Pollara 400 optimizes performance to meet the requirements of modern AI environments while allowing customers to leverage familiar Ethernet-based fabrics. It effectively bridges the gap between Ethernet's broad compatibility and the specialized demands of AI workloads, combining the best of both worlds. By addressing the specific communication needs of AI/ML models, the Pollara 400 enables organizations to fully harness the potential of their AI workloads without sacrificing the benefits of Ethernet infrastructure. This approach represents a crucial step forward in adapting networking technology to the evolving landscape of AI computing.

Agile and Efficient Distribution in an Open Environment

What is AMD Pensando™ Pollara 400? The Pollara 400 is a specialized network accelerator explicitly designed to optimize data transfer within back-end AI networks for GPU-to-GPU communication. It delivers a fully programmable 400 Gigabit per second (Gbps) RDMA Ethernet Network Interface Card (NIC), enhancing the efficiency of AI workloads. Traditional data center Ethernet, which typically focuses on providing services such as user access, segmentation, and multi-tenancy, often falls short of meeting the demanding requirements of modern AI workloads. These workloads demand high bandwidth, low latency, and efficient communication patterns not prioritized in conventional Ethernet designs. Addressing them requires a network that supports distributed computing over multiple GPU nodes with low jitter and RDMA. The Pollara 400 is designed to manage the unique communication patterns of AI workloads, offering high throughput across all available links along with congestion avoidance, reduced tail latency, scalable performance, and fast job completion times. Additionally, it provides an open environment that doesn't limit customers to specific vendors, giving them more flexibility.

Standout Capabilities

P4 Programmability:The P4 programmable architecture allows the Pollara 400 to be versatile, enabling it to introduce innovations today while remaining adaptable to evolving standards in the future, such as those set by the Ultra Ethernet Consortium (UEC). This programmability ensures that the AMD Pensando™ Pollara 400 can adapt to new protocols and requirements, future-proofing AI infrastructure investments. By leveraging P4, AMD enables customers to customize network behavior, implement bespoke RDMA transports, and optimize performance for specific AI workloads, all while maintaining compatibility with future industry standards.

Multipathing & Intelligent Packet Spraying: The Pollara 400 supports advanced adaptive packet spraying, which is crucial for managing AI models' high-bandwidth and low-latency requirements. This technology fully utilizes available bandwidth, particularly in CLOS fabric architectures, resulting in fast message completion times and lower tail latency. The Pollara 400 integrates seamlessly with AMD Instinct™ accelerator and AMD EPYC™ CPU infrastructure, providing reliable, high-speed connectivity for GPU-to-GPU RDMA communication. By intelligently spraying the packets of a queue pair (QP) across multiple paths, it minimizes the chance of creating hot spots and congestion in AI networks, ensuring optimal performance. The Pollara 400 allows customers to choose their preferred Ethernet switching vendor, whether a lossy or lossless implementation. Importantly, the Pollara 400 drastically reduces network configuration and operational complexity by eliminating the requirement for a lossless network. This flexibility and efficiency make the Pollara 400 a powerful solution for enhancing AI workload performance and network reliability.
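To make the idea concrete, here is a minimal sketch of spraying the packets of one QP across several equal-cost paths. The load-tracking heuristic and path count are illustrative assumptions, not AMD's actual hardware algorithm:

```python
class PacketSprayer:
    """Illustrative sketch: spray the packets of one queue pair (QP)
    across multiple equal-cost fabric paths, always preferring the
    least-loaded path so no single link becomes a hot spot."""

    def __init__(self, num_paths):
        # Per-path load estimate; a real NIC would feed this from telemetry.
        self.load = [0.0] * num_paths

    def pick_path(self):
        # Choose the currently least-loaded path for the next packet.
        return min(range(len(self.load)), key=lambda p: self.load[p])

    def spray(self, packets):
        assignments = []
        for seq, _payload in enumerate(packets):
            path = self.pick_path()
            assignments.append((seq, path))
            self.load[path] += 1.0   # naive proxy for in-flight bytes
        return assignments

sprayer = PacketSprayer(num_paths=4)
plan = sprayer.spray(["pkt"] * 8)
# Eight packets spread evenly: two per path, no hot spot.
```

The key point the sketch illustrates is that spraying happens per packet rather than per flow, which is why the receiver must tolerate out-of-order arrival, as described next.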

In-Order Message Delivery: The Pollara 400 offers advanced capabilities for handling out-of-order packet arrivals, a frequent occurrence with multipathing and packet-spraying techniques. The receiving Pollara 400 can efficiently process data packets that arrive in a different sequence than originally transmitted, placing them directly into GPU memory without delay. By managing this complexity at the NIC level, the system maintains high performance and data integrity without placing an additional burden on the GPU. This intelligent packet handling contributes to reduced latency and improved overall system efficiency.
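The mechanism behind direct placement can be sketched simply: if each packet carries the offset of its data within the destination buffer, the receiver can write it straight into memory in whatever order it arrives, with no reorder queue. The field names below are illustrative, not the wire format:

```python
def place_packets(packets, buffer_size):
    """Sketch of direct data placement: each packet carries its target
    offset, so the receiver writes it into (GPU) memory immediately,
    regardless of arrival order. No reorder buffer is needed."""
    memory = bytearray(buffer_size)
    for pkt in packets:
        off, data = pkt["offset"], pkt["data"]
        memory[off:off + len(data)] = data   # write at the packet's own offset
    return bytes(memory)

# Packets arriving out of order still land in the right place.
arrivals = [
    {"offset": 4, "data": b"WORL"},   # middle chunk arrives first
    {"offset": 0, "data": b"HELL"},
    {"offset": 8, "data": b"D!"},
]
result = place_packets(arrivals, 10)
```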

Fast Loss Recovery with Selective Retransmission: The Pollara 400 enhances network performance through in-order message delivery and selective acknowledgment (SACK) retransmission. Unlike RoCEv2's Go-back-N mechanism, which resends all packets from the point of failure, SACK allows the Pollara 400 to identify and retransmit only lost or corrupted packets. This targeted approach optimizes bandwidth utilization, reduces latency in packet-loss recovery, and minimizes redundant data transmission. By combining efficient in-order delivery with SACK retransmission, the AMD Pensando™ Pollara 400 enables smooth data flow and optimal resource utilization. These features result in faster job completion times, lower tail latencies, and more efficient bandwidth use, making it ideal for demanding AI networks and large-scale machine learning operations.
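The difference between the two recovery strategies is easy to see in a small sketch. With ten packets in flight and two dropped, Go-back-N resends everything from the first loss onward, while selective retransmission resends only the two missing packets:

```python
def go_back_n_retransmit(sent, lost):
    """Go-back-N (RoCEv2 style): resend every packet from the
    first loss onward, even ones that arrived successfully."""
    first = min(lost)
    return [s for s in sent if s >= first]

def sack_retransmit(sent, lost):
    """Selective retransmission: resend only the packets
    the receiver's SACK reported as missing."""
    return [s for s in sent if s in lost]

sent = list(range(10))   # sequence numbers 0..9 in flight
lost = {3, 7}            # two packets dropped in the fabric
# Go-back-N resends 7 packets (3..9); SACK resends just the 2 lost ones.
```

At 400 Gbps, with large in-flight windows, that gap in redundant retransmission translates directly into the bandwidth and tail-latency savings described above.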

Path-Aware Congestion Control: The Pollara 400 employs real-time telemetry and network-aware algorithms to effectively manage network congestion, including incast scenarios. Unlike RoCEv2, which relies on PFC and ECN in a lossless network, the AMD UEC-ready RDMA transport offers a more sophisticated approach:

  • Maintains per-path congestion status
  • Dynamically avoids congested paths using adaptive packet-spraying
  • Sustains near wire-rate performance during transient congestion
  • Optimizes packet flow across multiple paths without requiring PFC
  • Implements per-flow congestion control to prevent interference between data flows
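A minimal sketch of the per-path congestion state described above: each returning ACK carries a congestion signal (e.g. an ECN-style mark) that updates a per-path estimate, and paths whose estimate crosses a threshold drop out of the spraying candidate set. The EWMA weight and threshold are hypothetical values, not AMD's actual algorithm:

```python
class PathAwareCC:
    """Sketch of per-path congestion tracking, assuming each ACK
    returns a boolean congestion mark for the path it traveled.
    Threshold and EWMA weight are illustrative assumptions."""

    def __init__(self, num_paths, threshold=0.5, alpha=0.2):
        self.score = [0.0] * num_paths   # EWMA of congestion signals
        self.threshold = threshold
        self.alpha = alpha

    def on_ack(self, path, congestion_marked):
        # Fold the latest telemetry sample into the per-path estimate.
        signal = 1.0 if congestion_marked else 0.0
        self.score[path] = (1 - self.alpha) * self.score[path] + self.alpha * signal

    def usable_paths(self):
        # Only paths below the threshold remain spraying candidates.
        return [p for p, s in enumerate(self.score) if s < self.threshold]

cc = PathAwareCC(num_paths=4)
for _ in range(10):
    cc.on_ack(2, congestion_marked=True)   # path 2 keeps reporting congestion
# Path 2 falls out of the candidate set; traffic shifts to paths 0, 1, 3.
```

Because congested paths are avoided rather than paused, no PFC back-pressure is needed, which is what removes the congestion-spreading and head-of-line-blocking failure modes mentioned below.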

These features simplify configuration, reduce operational overhead, and avoid common issues like congestion spreading, deadlock, and head-of-line blocking. Path-aware congestion control enables deterministic performance across the network, crucial for large-scale AI operations. By intelligently handling congestion without requiring a fully lossless network, the AMD Pensando™ Pollara 400 reduces network complexity, streamlining deployment in AI-driven data centers.

Rapid Fault Detection in High-Performance AI Networks: High-performance networks are crucial for efficient data synchronization in AI GPU clusters. The AMD Pensando™ Pollara 400 employs sophisticated methods for rapid fault detection, essential for maintaining optimal performance. Standard protocols' timeout mechanisms are often too slow for AI applications, which require aggressive fault detection to reduce idle GPU time and increase the throughput of AI training and inference tasks, ultimately decreasing job completion time.

  • Sender-Based ACK Monitoring leverages the sender's ability to track acknowledgments (ACKs) across multiple network paths.
  • Receiver-Based Packet Monitoring takes the receiver's perspective, tracking packet reception on each distinct network path; a potential fault is flagged if packets stop arriving on a path for a specified duration.
  • Probe-Based Verification is employed when a fault is suspected (triggered by either of the above methods): a probe packet is transmitted on the suspected faulty path, and if no response arrives within a specified timeframe, the path is confirmed as failed. This additional step helps distinguish transient network issues from actual path failures.
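The three techniques above compose into a simple state machine per path: silence raises suspicion, suspicion triggers a probe, and an unanswered probe confirms the failure. A minimal sketch, with illustrative millisecond timeouts rather than the NIC's real values:

```python
class PathFaultDetector:
    """Sketch of the rapid fault-detection flow: a path that goes
    silent past ack_timeout gets a probe; a probe unanswered past
    probe_timeout confirms the path as failed. Times are in ms and
    purely illustrative."""

    def __init__(self, paths, ack_timeout=5.0, probe_timeout=2.0):
        self.last_ack = {p: 0.0 for p in paths}
        self.probe_sent = {}               # path -> time probe was sent
        self.failed = set()
        self.ack_timeout = ack_timeout
        self.probe_timeout = probe_timeout

    def on_ack(self, path, now):
        self.last_ack[path] = now
        self.probe_sent.pop(path, None)    # any reply clears suspicion

    def tick(self, now):
        for path, t in self.last_ack.items():
            if path in self.failed:
                continue
            if path in self.probe_sent:
                if now - self.probe_sent[path] > self.probe_timeout:
                    self.failed.add(path)  # probe unanswered: confirm failure
            elif now - t > self.ack_timeout:
                self.probe_sent[path] = now  # suspicion: send a probe

det = PathFaultDetector(paths=[0, 1])
det.on_ack(0, now=6.0)    # path 0 keeps acknowledging
det.tick(now=7.0)         # path 1 silent for > 5 ms: probe it
det.tick(now=10.0)        # no probe reply within 2 ms: path 1 failed
```

The probe step is what keeps the detector aggressive without being trigger-happy: a transient hiccup that still answers the probe never reaches the failed state.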

Rapid fault detection mechanisms offer significant advantages. By identifying issues in milliseconds, they enable near-instantaneous failover, minimizing GPU idle time. Swift detection and isolation of faulty paths optimize network resource allocation, ensuring uninterrupted AI workloads on healthy paths. This approach enhances overall AI performance, potentially reducing training times and improving inference accuracy.

Final Thoughts: The AMD Pensando™ Pollara 400 is more than just a network card; it's a foundational component of a robust AI infrastructure. It addresses the limitations of traditional RoCEv2 Ethernet networks with features like real-time telemetry, adaptive packet spraying with intelligent path-aware congestion control to alleviate incast scenarios, selective acknowledgment, and robust error detection. AI workloads require networks that support bursty data flows, minimal jitter, noise isolation, and high bandwidth to ensure optimal GPU performance. When paired with best-of-breed, standards-compliant Ethernet switches, the AMD Pensando™ Pollara 400 forms the backbone of a high-efficiency, low-latency AI cloud environment.

With its ability to deliver high throughput, low latency, and exceptional scalability, combined with the flexibility of P4 programmability, the AMD Pensando™ Pollara 400 is an essential tool in the arsenal of any AI cloud infrastructure. This programmable approach not only enhances the NIC's versatility but also allows for rapid deployment of new networking features, ensuring that AI infrastructures can evolve as quickly as the AI technologies they support.
