AMD, Intel & NVIDIA Chips Power The NREL Kestrel Supercomputer: AMD EPYC Genoa, Intel Sapphire Rapids & NVIDIA H100
The US DOE’s National Renewable Energy Laboratory (NREL) announced its plans last year for Kestrel, a brand new supercomputer that will tap into the latest technologies from AMD, Intel & NVIDIA to replace its existing Eagle supercomputer. The supercomputer will be built by Hewlett Packard Enterprise (HPE), and during a recent conference, the first look at the system's specs was unveiled.

The Kestrel supercomputer will make use of standard and accelerated nodes. It will have a peak performance of 44 PetaFlops, a 5.5x increase over the existing Eagle system. For the standard nodes, Kestrel will utilize Intel’s latest Sapphire Rapids ‘Xeon Scalable’ CPUs. The particular config used by Kestrel is a 52 core and 104 thread SKU. There will be 2,304 standard nodes, each running 2 processors in a dual-socket configuration. That’s a total of 4,608 Intel Sapphire Rapids-SP CPUs with 239,616 cores and 479,232 threads. Data storage will consist of 75 Petabytes of Lustre storage, and each of the 2,304 nodes will be configured with 256 GB of DDR5 memory for a total of around 576 TB (2,304 x 256 GB) of system RAM.

Moving over to the accelerated nodes, Kestrel will deploy 132 of these configurations, each with four NVIDIA H100 GPU accelerators based on the Hopper GPU architecture and a dual-socket AMD EPYC Genoa CPU config. That’s 528 NVIDIA Hopper H100 GPUs and 264 AMD EPYC Genoa chips packed within these accelerated nodes. We don’t know the exact configurations of either the NVIDIA H100 GPUs or AMD EPYC Genoa CPUs used within Kestrel, but if we take the top configurations, we get up to 8,921,088 CUDA cores (H100 SXM5, 16,896 cores per GPU) and a total of 25,344 Zen 4 cores (EPYC Genoa, up to 96 cores per chip). The accelerated nodes will also have 42 TB of HBM3 memory and 20 TB of system memory. There are also 8 data analysis and visualization (DAV) nodes which include up to 16 NVIDIA A40 GPUs.

All of this will be connected using the HPE Slingshot interconnect in a Dragonfly topology. Some features of the HPE Slingshot interconnect include:
- Industry-leading performance and scalability
- 100GbE and 200GbE interfaces
- High radix, 64-port, 12.8 Tb/s bandwidth switch
- Scalability to >250,000 host ports with a maximum of 3 hops
- Innovative hardware congestion management, adaptive routing, and quality of service
- Ethernet standards and protocols, plus optimized HPC functionality
- Link level retry and low-latency forward error correction
- Standardized, open API management interfaces
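
For readers who want to check the math behind the node counts and the switch bandwidth quoted above, here is a minimal Python sketch. The node, socket, and port counts come from the figures in this article. The per-part numbers (16,896 CUDA cores and 80 GB of HBM3 per H100 SXM5, 96 cores for the top EPYC Genoa) are assumptions based on the top publicly listed SKUs, not confirmed Kestrel configurations, and the constant and function names are purely illustrative. The 25,344 Zen 4 core total only works out with 96 cores per chip.

```python
# Figures quoted in the article.
STANDARD_NODES = 2304
SOCKETS_PER_NODE = 2          # both node types are dual-socket
SPR_CORES_PER_CPU = 52
THREADS_PER_CORE = 2
DDR5_GB_PER_NODE = 256
ACCEL_NODES = 132
GPUS_PER_ACCEL_NODE = 4
SWITCH_PORTS = 64
PORT_GBPS = 200               # fastest Slingshot interface listed

# Assumptions: top SKUs, not confirmed for Kestrel.
H100_SXM5_CUDA_CORES = 16896
H100_HBM3_GB = 80
GENOA_MAX_CORES = 96


def kestrel_totals():
    """Recompute the system-wide totals from the per-node figures."""
    intel_cpus = STANDARD_NODES * SOCKETS_PER_NODE
    intel_cores = intel_cpus * SPR_CORES_PER_CPU
    gpus = ACCEL_NODES * GPUS_PER_ACCEL_NODE
    epyc_cpus = ACCEL_NODES * SOCKETS_PER_NODE
    return {
        "intel_cpus": intel_cpus,
        "intel_cores": intel_cores,
        "intel_threads": intel_cores * THREADS_PER_CORE,
        # Assumes 1 TB = 1,024 GB.
        "ddr5_tb": STANDARD_NODES * DDR5_GB_PER_NODE / 1024,
        "h100_gpus": gpus,
        "epyc_cpus": epyc_cpus,
        "cuda_cores": gpus * H100_SXM5_CUDA_CORES,
        "zen4_cores": epyc_cpus * GENOA_MAX_CORES,
        "hbm3_gb": gpus * H100_HBM3_GB,
        # Simple product of port count and per-port rate.
        "switch_tbps": SWITCH_PORTS * PORT_GBPS / 1000,
    }


if __name__ == "__main__":
    for name, value in kestrel_totals().items():
        print(f"{name:>14}: {value:,}")
```

Under those assumptions the totals line up with the article: 4,608 Intel CPUs, 239,616 cores, 479,232 threads, 528 GPUs, 264 EPYC chips, 8,921,088 CUDA cores, 25,344 Zen 4 cores, roughly 42 TB of HBM3, and 64 ports at 200 Gb/s giving the 12.8 Tb/s switch figure.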
The Kestrel supercomputer is rated to deliver 10.4 GFLOPS/watt, which is nowhere close to the recently announced Frontier supercomputer's efficiency of over 50 GFLOPS/watt, but Frontier is also a far more expensive system that delivers more than an Exaflop of performance. The Kestrel supercomputer is expected to be deployed by 2024.

News Source: Momomo_US