AMD’s newly announced Instinct MI355X accelerators will be the computational backbone of upcoming Department of Energy (DOE) supercomputing systems aimed at advancing nuclear materials research and AI-assisted scientific modeling. Built in partnership with Hewlett Packard Enterprise (HPE), these systems are expected to push the boundaries of exascale computing and energy-efficient AI workloads.
DOE Taps AMD-HPE Stack for Next-Gen Scientific Computing
The US Department of Energy has selected AMD’s latest MI355X GPUs to power a new wave of high-performance computing (HPC) platforms developed by Hewlett Packard Enterprise. These systems will be deployed at national laboratories including Oak Ridge National Laboratory (ORNL) and Argonne National Laboratory (ANL), where they will support advanced research in nuclear materials science, fusion energy modeling, climate simulation, and AI-driven physics.
The announcement is part of a broader DOE initiative to modernize its HPC infrastructure beyond existing exascale systems like Frontier (ORNL) and Aurora (ANL). The new systems, whose names have not yet been disclosed, will serve as successors or complements to those machines and to smaller current platforms such as Polaris at ANL.
MI355X: AMD’s Answer to NVIDIA in Scientific AI Compute
Previewed by AMD in June 2024 as part of the Instinct MI350 series, the MI355X is based on the CDNA 4 architecture. It features updated Matrix Cores optimized for the large-scale matrix operations critical to scientific simulations and transformer-based AI models. With support for FP64 compute precision, a key requirement for physics-based modeling, the MI355X targets workloads that demand both traditional HPC performance and emerging generative AI capabilities.
- Architecture: CDNA 4
- Compute Precision: FP64/FP32/BF16/INT8
- Interconnect: Infinity Fabric with PCIe Gen5
- TDP: Not officially confirmed; expected to meet or exceed the MI300X’s 750W
- Memory: High Bandwidth Memory (HBM3E), with multi-TB/s aggregate bandwidth
The MI355X is designed to compete directly with NVIDIA’s H100/H200 Tensor Core GPUs in both performance-per-watt and memory bandwidth—two critical metrics in large-scale scientific computing environments where thermal envelopes are tightly constrained.
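The FP64 requirement mentioned above is easy to motivate with a toy example: in lower precision, a small perturbation on an order-one quantity, of the kind iterative physics solvers track routinely, can fall below the representable resolution and silently vanish. A minimal NumPy sketch (the numbers are illustrative only, not tied to any MI355X behavior):

```python
import numpy as np

# A tiny perturbation on an order-one quantity -- e.g. a small
# displacement added to a lattice coordinate -- is representable
# in FP64 but vanishes entirely in FP32.
base, delta = 1.0, 1e-8

f32 = np.float32(base) + np.float32(delta)
f64 = np.float64(base) + np.float64(delta)

print(f32 == np.float32(base))  # True: the perturbation is lost in FP32
print(f64 == np.float64(base))  # False: FP64 resolves it
```

The same effect compounds over millions of timesteps in explicit solvers, which is why FP64 remains the default for production physics codes even as AI workloads migrate to BF16 and below.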
HPE Cray Systems Remain Central to DOE Infrastructure
The new supercomputers will be built on HPE’s Cray EX platform—the same architecture that underpins Frontier. This system integrates Slingshot interconnects optimized for low-latency communication across tens of thousands of nodes. Combined with AMD CPUs—likely EPYC Genoa or Bergamo series—and the new MI355X GPUs, these machines are expected to deliver multi-exaflop performance with improved energy efficiency compared to current generation systems.
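For a rough sense of scale, a multi-exaflop FP64 machine built from discrete accelerators implies tens of thousands of GPUs. The per-GPU throughput below is an assumed placeholder for illustration, not a published MI355X figure:

```python
# Back-of-envelope sizing for a multi-exaflop FP64 system.
# per_gpu_fp64_tflops is a hypothetical figure; AMD's official
# MI355X FP64 numbers should be used for any real estimate.
per_gpu_fp64_tflops = 80.0      # assumption for illustration only
target_exaflops = 2.0           # 1 exaflop = 1e6 teraflops

gpus_needed = target_exaflops * 1e6 / per_gpu_fp64_tflops
print(f"{gpus_needed:,.0f} GPUs")  # prints "25,000 GPUs"
```

Real deployments land higher once interconnect overhead, resiliency, and sustained-versus-peak efficiency are factored in, which is part of why Slingshot’s scaling behavior matters as much as raw GPU throughput.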
This alignment continues HPE’s dominance in DOE contracts following its acquisition of Cray Inc. in 2019. The company has since delivered multiple record-setting supercomputers, including Frontier (1.1 exaflops sustained on the HPL benchmark), which remains one of the fastest publicly known machines in the world.
Nuclear Materials Science Among Key Use Cases
The primary mission sets for these new AMD-powered systems include simulation-heavy domains such as:
- Nuclear reactor material degradation modeling
- Fusion plasma containment simulations
- Molecular dynamics for radiation exposure studies
- AI-assisted discovery of novel alloys or composites
- Lattice QCD simulations relevant to particle physics
The use of high-precision FP64 compute alongside transformer-style neural networks allows researchers to fuse data-driven insights with first-principles physics models—a hybrid approach increasingly favored by national labs seeking faster iteration cycles without sacrificing accuracy.
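That hybrid approach can be caricatured in a few lines: fit a model parameter to noisy data while a physics-residual term penalizes violations of a governing equation. Everything below (the toy decay law dy/dt = -k·y, the synthetic data, the grid search) is invented for illustration and not drawn from any DOE code:

```python
import numpy as np

# Synthetic "measurements" of an exponential decay, standing in for
# experimental data (e.g. a material property relaxing over time).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 50)
k_true = 1.5
y_obs = np.exp(-k_true * t) + 0.01 * rng.standard_normal(t.size)

def hybrid_loss(k: float) -> float:
    """Data-misfit term plus a physics-residual term for dy/dt = -k*y."""
    y_model = np.exp(-k * t)
    data_term = np.mean((y_model - y_obs) ** 2)
    # Physics residual via finite differences: penalizes candidate
    # models that fail to satisfy the governing ODE.
    dydt = np.gradient(y_model, t)
    physics_term = np.mean((dydt + k * y_model) ** 2)
    return data_term + physics_term

# A coarse grid search stands in for the gradient-based optimizers
# real physics-informed networks use.
ks = np.linspace(0.5, 3.0, 251)
k_fit = ks[int(np.argmin([hybrid_loss(k) for k in ks]))]
```

In production, the physics term typically comes from a PDE residual evaluated by automatic differentiation, and the "model" is a neural network rather than a closed-form curve, but the structure of the objective is the same.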
A Strategic Win Against NVIDIA Amid Growing HPC Competition
This contract marks a significant strategic win for AMD over rival NVIDIA in the high-end HPC domain. While NVIDIA dominates commercial AI data centers with its CUDA ecosystem and libraries such as cuDNN and TensorRT, AMD is gaining ground in government-funded scientific computing, where the open-source ROCm stack is increasingly adopted, partly due to transparency concerns around proprietary platforms.
The DOE has historically favored open architectures that can be audited and customized by lab researchers—a factor that likely played into selecting AMD over NVIDIA or Intel Gaudi accelerators for this tranche of deployments.