Reprogramming Discovery: How AMD Instinct™ GPUs Are Powering the Next Wave of AI-Driven Biology
May 19, 2025

Contributed by IPA Therapeutics
Part 1: Embeddings for NLP in Life Sciences
This article is the first in a three-part series benchmarking AMD Instinct™ MI300X GPUs against NVIDIA’s H100 GPUs across real-world AI workloads in drug discovery. The benchmarks were conducted by ImmunoPrecise Antibodies (IPA) and its AI subsidiary BioStrand, creators of the LENSai™ platform for AI-powered biologics discovery — in collaboration with Vultr, whose high-performance cloud infrastructure enabled rapid deployment and reproducibility across hardware configurations. Together, we evaluated how these GPUs perform under the practical demands of therapeutic development: from NLP-driven target discovery to generative protein design.
At the core of LENSai lies HYFT® technology — a biological fingerprinting system that encodes conserved sequence, structure, and function into a unified index. HYFTs were built to solve a fundamental limitation in AI: its lack of native understanding of biological systems. By embedding biological logic into the fabric of computation, HYFTs give AI models the context to reason through biology, not just compute it.
Over the following three articles, we’ll explore how the MI300X GPUs perform across the LENSai tech stack: NLP-driven literature mining, creation of protein embeddings for structure-function inference, and generative design through RFdiffusion.
Through real-world benchmarks in NLP, protein embeddings, and de novo protein design, we set out to evaluate raw performance, cost efficiency, and deployment viability for modern bioinformatics pipelines.
In this first installment, we focus on Natural Language Processing (NLP)—specifically, how large language models and Retrieval-Augmented Generation (RAG) accelerate early-stage therapeutic discovery by extracting actionable insights from scientific literature. The key takeaway? AMD GPUs are not only competitive in speed but also offer substantial cost advantages—a critical factor for life science organizations scaling AI-driven platforms.
NLP enhances therapeutic innovation by mining vast bodies of text effectively, unlocking latent insights from scientific literature, clinical reports, and molecular databases. NLP-driven large language models (LLMs) streamline the analysis and prediction work essential to drug discovery, aligning with the FDA’s shift toward computational models that emphasize safety, efficacy, and cost-efficiency.
Vector embeddings in RAG (Retrieval-Augmented Generation) systems enable knowledge-aware models to surface relevant insights based on semantics rather than phrasing. These embeddings aren’t limited to text; they support biological sequences and structures as well, enabling NLP to bridge silos in life sciences.
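To make the retrieval idea concrete, here is a minimal, self-contained sketch of embedding-based retrieval. It uses toy bag-of-words count vectors and cosine similarity as a stand-in for the learned embedding models a production RAG system would use; the corpus, function names, and scoring are illustrative only, not part of LENSai.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding: a bag-of-words term-count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "IL-6 signalling drives inflammation in rheumatoid arthritis",
    "GPU benchmarks compare throughput and cost",
    "antibody binding affinity depends on epitope structure",
]

def retrieve(query: str, docs: list[str]) -> str:
    # Rank documents by similarity to the query and return the best match.
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

print(retrieve("inflammation in arthritis", corpus))
```

A real pipeline would swap the count vectors for dense embeddings from a transformer encoder and the linear scan for an approximate nearest-neighbor index, but the retrieval logic is the same.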
LENSai builds on today’s vector search capabilities and takes them further, adding a powerful semantic layer that detects sub-sentence units and extracts subject–predicate–object triples to uncover meaningful biological relationships. By capturing how targets, pathways, and compounds interact at a mechanistic level, LENSai empowers researchers to identify therapeutic targets, map disease pathways, and anticipate drug behavior with greater clarity. This depth of insight, often buried in unstructured biomedical data, can be surfaced and acted on long before wet lab experiments begin, accelerating discovery while reducing cost and risk.
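The subject–predicate–object idea can be sketched with a toy rule-based extractor. This is our own illustrative stand-in for the richer semantic parsing described above: the relation vocabulary and regex pattern are assumptions for the example, not LENSai’s actual method.

```python
import re

# Hypothetical relation vocabulary for the sketch; a real system would use
# a full semantic parser rather than a fixed verb list.
RELATIONS = r"(inhibits|activates|binds|upregulates|downregulates)"

def extract_triples(sentence: str) -> list[tuple[str, str, str]]:
    # Match "<entity> <relation> <entity>" patterns and emit (S, P, O) triples.
    triples = []
    for m in re.finditer(rf"(\w[\w-]*)\s+{RELATIONS}\s+(\w[\w-]*)", sentence):
        triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

print(extract_triples("Imatinib inhibits BCR-ABL and aspirin inhibits COX-2"))
```

Triples like `("Imatinib", "inhibits", "BCR-ABL")` can then be loaded into a knowledge graph, which is what makes mechanistic queries across targets, pathways, and compounds possible.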
Infrastructure Context
We deployed both NVIDIA H100 and AMD Instinct MI300X GPUs in a flexible, cloud-native environment, ensuring reproducible benchmarks and fair comparisons across hardware generations.
| GPU Specification | AMD Instinct™ MI300X | NVIDIA H100 |
| --- | --- | --- |
| Memory Capacity | 192 GB | 80 GB |
| GPU Architecture | CDNA 3 | Hopper |
| Supported Precisions | FP64/FP32/FP16 | FP64/FP32/FP16 |
| Deployment Model | Cloud-native | Cloud-native |
NLP Benchmark Results
Our RAG systems use vector embeddings of literature to surface contextually relevant insights. In this workload, AMD demonstrated superior throughput and cost-efficiency:
| Metric | NVIDIA H100 | AMD Instinct™ MI300X |
| --- | --- | --- |
| Sequences/sec | 2741.21 | 3421.22 |
| Cost per 1M Samples | $2.40 | $1.46 |
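The cost-per-million-samples figures follow directly from throughput and hourly instance price. The sketch below shows the arithmetic; the hourly prices used are back-solved from the table to make the example reproduce its numbers, and are not published rates.

```python
def cost_per_million(price_per_hour: float, seq_per_sec: float) -> float:
    # Samples processed in one billed hour, then dollars per 1M samples.
    samples_per_hour = seq_per_sec * 3600
    return price_per_hour / samples_per_hour * 1_000_000

# Illustrative hourly prices chosen to reproduce the table's cost figures.
h100 = cost_per_million(23.68, 2741.21)
mi300x = cost_per_million(17.98, 3421.22)
print(round(h100, 2), round(mi300x, 2))  # 2.4 1.46
```

The takeaway is that higher throughput compounds with a lower hourly rate: even at equal pricing, the MI300X’s ~25% throughput advantage would already cut per-sample cost proportionally.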
The MI300X also exhibited enhanced stability under high concurrency workloads.
Technical Implementation: Seamless Transition to AMD GPUs
Transitioning NLP tasks to AMD GPUs via ROCm PyTorch Docker images is straightforward:

```dockerfile
FROM rocm/pytorch:rocm6.3.1_ubuntu22.04_py3.10_pytorch
```
No changes are required in the Python code: PyTorch’s device abstraction (`torch.device("cuda")`) works unchanged, because the ROCm build of PyTorch exposes AMD GPUs through the same device namespace.
These NLP benchmarks illustrate how AMD Instinct™ MI300X GPUs deliver both technical and economic value in one of the most fundamental layers of AI-assisted drug discovery.
In Part 2 of this series, we’ll move deeper into the biological stack, exploring how protein language models and biological embeddings reshape our understanding of sequences, mutations, and functional relevance in drug development.
IPA (ImmunoPrecise Antibodies, NASDAQ: IPA) is a biotherapeutic research company that brings industry-leading antibody discovery services and complex artificial intelligence technologies together to lead its pharmaceutical partners into the era of the antibody.
Vultr is led by veterans of the managed hosting business who have taken 20+ years of experience in complex hosting environments and made it their mission to simplify the cloud.
