Computing Resource
ABCI consists of 120 Compute Nodes (A) with a total of 960 NVIDIA A100 GPU accelerators, 1,088 Compute Nodes (V) with a total of 4,352 NVIDIA V100 GPU accelerators, shared file systems and ABCI Cloud Storage that together provide 47PB of capacity, a high-speed InfiniBand network connecting the compute nodes and the storage systems, firewall equipment, and other components.
ABCI System Outline
Features
Compute Node (A)
- Compute Node (A) has eight NVIDIA A100 GPU accelerators, two 3rd-Generation Intel Xeon Scalable Processors (code-named Ice Lake), two NVMe SSDs, and four InfiniBand HDR ports (200Gbps each).
- The theoretical performance of Compute Node (A) is 2506 AI-TFLOPS for half precision (used for AI and machine learning) and 161 TFLOPS for double precision (used for scientific and technical computations).
- The total theoretical performance of Compute Nodes (A) is 300 AI-PFLOPS (half precision) and 19 PFLOPS (double precision).
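The aggregate figures above follow from multiplying the per-node figures by the node count. The following is a minimal sketch of that arithmetic, using only the numbers quoted in this section; the constant and function names are illustrative and not part of any ABCI tool.

```python
# Figures quoted in the text for Compute Node (A).
NODE_COUNT_A = 120
GPUS_PER_NODE_A = 8
HALF_TFLOPS_PER_NODE_A = 2506    # AI-TFLOPS, half precision
DOUBLE_TFLOPS_PER_NODE_A = 161   # TFLOPS, double precision


def aggregate_pflops(per_node_tflops, node_count):
    """Scale a per-node figure in TFLOPS to a system-wide figure in PFLOPS."""
    return per_node_tflops * node_count / 1000


total_gpus_a = NODE_COUNT_A * GPUS_PER_NODE_A
half_pflops_a = aggregate_pflops(HALF_TFLOPS_PER_NODE_A, NODE_COUNT_A)
double_pflops_a = aggregate_pflops(DOUBLE_TFLOPS_PER_NODE_A, NODE_COUNT_A)

print(total_gpus_a)      # 960
print(half_pflops_a)     # 300.72
print(double_pflops_a)   # 19.32
```

The results agree with the quoted totals of 960 GPUs, 300 AI-PFLOPS, and 19 PFLOPS once the fractional part is dropped.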
FUJITSU PRIMERGY GX2570 M6 (1 server in 4U)
| Item | Specification |
|---|---|
| CPU | Intel Xeon Platinum 8360Y Processor (54 MB Cache, 2.4 GHz, 36 Cores, 72 Threads) ×2 |
| GPU | NVIDIA A100 for NVLink (40GiB HBM2) ×8 |
| Memory | 512GiB DDR4 3200MHz RDIMM |
| Local Storage | 2.0TB NVMe SSD (Intel SSD DC P4510 U.2) ×2 |
| Interconnect | InfiniBand HDR (200Gbps) ×4 |
Compute Node (V)
- Compute Node (V) has four NVIDIA V100 GPU accelerators, two Intel Xeon Gold 6148 processors, one NVMe SSD, 384GiB of memory, and two InfiniBand EDR ports (100Gbps each).
- The theoretical performance of Compute Node (V) is 506 AI-TFLOPS for half precision (used for AI and machine learning) and 34.2 TFLOPS for double precision (used for scientific and technical computations).
- The total theoretical performance of Compute Nodes (V) is 550 AI-PFLOPS (half precision) and 37 PFLOPS (double precision).
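As a quick consistency check, the totals for Compute Nodes (V) can be reproduced from the per-node figures and the count of 1,088 nodes given in this section. This is a worked calculation only, with the quoted totals taken to be the products rounded to whole PFLOPS:

```latex
\begin{align*}
\text{GPUs} &: 1088 \times 4 = 4352 \\
\text{Half precision} &: 1088 \times 506\ \text{AI-TFLOPS} = 550{,}528\ \text{AI-TFLOPS} \approx 550\ \text{AI-PFLOPS} \\
\text{Double precision} &: 1088 \times 34.2\ \text{TFLOPS} = 37{,}209.6\ \text{TFLOPS} \approx 37\ \text{PFLOPS}
\end{align*}
```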
FUJITSU PRIMERGY CX2570 M4 (2 servers in 2U)
| Item | Specification |
|---|---|
| CPU | Intel Xeon Gold 6148 Processor (27.5 MB L3 Cache, 2.40 GHz, 20 Cores, 40 Threads) ×2 |
| GPU | NVIDIA Tesla V100 SXM2 (16GiB HBM2) ×4 |
| Memory | 384GiB DDR4 2666MHz RDIMM |
| Local Storage | 1.6TB NVMe SSD (Intel SSD DC P4600 U.2) ×1 |
| Interconnect | InfiniBand EDR (100Gbps) ×2 |
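To compare the two node types side by side, the specification tables can be reduced to a few per-node aggregates. The sketch below is illustrative only: the `NodeSpec` class and its field names are hypothetical, and the values are copied from the tables above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures taken from the specification tables."""
    name: str
    gpus: int
    gpu_mem_gib: int        # HBM2 capacity per GPU
    host_mem_gib: int
    ib_ports: int
    ib_gbps_per_port: int

    @property
    def total_gpu_mem_gib(self) -> int:
        return self.gpus * self.gpu_mem_gib

    @property
    def injection_gbps(self) -> int:
        # Sum of the nominal link rates of all InfiniBand ports on the node.
        return self.ib_ports * self.ib_gbps_per_port


NODE_A = NodeSpec("Compute Node (A)", 8, 40, 512, 4, 200)
NODE_V = NodeSpec("Compute Node (V)", 4, 16, 384, 2, 100)

for node in (NODE_A, NODE_V):
    print(f"{node.name}: {node.total_gpu_mem_gib} GiB GPU memory, "
          f"{node.injection_gbps} Gbps InfiniBand")
```

Under these figures, a Compute Node (A) carries five times the GPU memory (320 GiB versus 64 GiB) and four times the nominal InfiniBand bandwidth (800 Gbps versus 200 Gbps) of a Compute Node (V).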
Storage Systems
The ABCI system has five storage systems for storing the large amounts of data used by AI and Big Data applications. These systems provide the shared file systems and ABCI Cloud Storage. The shared file system is configured as a fast distributed file system using Lustre and has an effective capacity of approximately 34PB. ABCI Cloud Storage is an object storage service with an Amazon Simple Storage Service (Amazon S3) compatible interface; it has 17PB of physical capacity, of which 13PB is available as logical capacity.
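The capacity figures can be related to one another with a short calculation. The sketch below assumes that the 47PB total quoted at the top of this page is the sum of the shared file system's effective capacity and the logical capacity of ABCI Cloud Storage; that reading fits the numbers but is not stated explicitly in the text.

```python
# Capacities quoted in the text, in PB.
SHARED_FS_EFFECTIVE_PB = 34   # Lustre shared file system (approximate)
CLOUD_PHYSICAL_PB = 17        # ABCI Cloud Storage, raw capacity
CLOUD_LOGICAL_PB = 13         # ABCI Cloud Storage, usable capacity

# Assumption: the system-wide total counts usable capacity only.
total_usable_pb = SHARED_FS_EFFECTIVE_PB + CLOUD_LOGICAL_PB

# Share of the object store's raw capacity that is exposed to users.
cloud_usable_fraction = CLOUD_LOGICAL_PB / CLOUD_PHYSICAL_PB

print(total_usable_pb)                  # 47
print(f"{cloud_usable_fraction:.0%}")   # 76%
```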
High-Speed Interconnect
Compute Nodes (A), Compute Nodes (V), the shared file systems, and ABCI Cloud Storage are interconnected by a high-speed InfiniBand network. Compute Nodes (A) can communicate with each other at full-bisection bandwidth. Compute Nodes (V) can communicate at full-bisection bandwidth with the other nodes in the same rack.
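Full-bisection bandwidth means that when the nodes are split into two equal halves, every node in one half can send to the other half at its full link rate. The sketch below estimates the resulting aggregate for Compute Nodes (A) from the node count and port figures in this page. It is an idealized, one-direction upper bound that assumes all four HDR ports of each node carry traffic across the split; it is not a published ABCI figure, and the function name is hypothetical.

```python
def full_bisection_gbps(node_count, ports_per_node, gbps_per_port):
    """Idealized one-direction bandwidth across an even split of the nodes.

    Assumes every node in one half can drive all of its ports at the
    nominal link rate toward the other half.
    """
    injection_gbps = ports_per_node * gbps_per_port
    return (node_count // 2) * injection_gbps


# Compute Nodes (A): 120 nodes, four HDR ports at 200 Gbps each.
a_bisection_gbps = full_bisection_gbps(120, 4, 200)
print(a_bisection_gbps / 1000)   # 48.0 (Tbps)
```

The same function could be applied to a single rack of Compute Nodes (V) with two EDR ports at 100 Gbps each, but the number of nodes per rack is not given in this section.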
Interconnection Network
Because ABCI is connected to SINET6 (400Gbps), users can access ABCI over the internet. The connection is protected by firewalls (FortiGate 1500D), and two-stage authentication is used.