AMD EPYC 9004 – a small scale supercomputer?

For a long time the server and desktop space was dominated by Intel CPUs. AMD lagged severely behind the competition with its processors, but changed everything when it launched its ZEN architecture processors in February 2017. Suddenly AMD was competitive on the consumer side again and brought real competition back to the market. In the server space, however, Intel still reigned supreme, which was about to change.

AMD's ZEN architecture was designed to be scalable, and AMD scaled it a lot over time. While its EPYC server-grade processors started with up to 32 cores in 2017, AMD doubled this to up to 64 cores with ZEN2 in mid-2019, improved performance with ZEN3 in early 2021, and finally scaled up to 96 cores in November 2022 with ZEN4 (7)(8). With this massive scaling of CPU cores, AMD managed to outperform Intel chips and produce incredibly powerful single-system servers.

AMD's 9004 series EPYC server CPUs brought massive improvements over the previous generation, increasing the core count by up to 50%, from an already staggering 64 cores to a total of 96 cores per CPU (1). The frequency of all cores increased as well, and the ‘Infinity Fabric’, which handles communication between the cores, was also improved (1). In addition, the new series switched from PCIe 4.0 to PCIe 5.0 and from DDR4 to DDR5 memory, for a huge increase in I/O and memory read/write performance (1). And if this performance still isn't enough, EPYC allows two of the flagship CPUs to be linked together on a single motherboard, so that they act like a single 192-core CPU (1).

What would such a possible max spec system look like?

Two AMD EPYC 9654 CPUs provide 192 cores and 384 threads per system. Each CPU offers 128 PCIe gen 5 lanes plus 6 PCIe gen 3 lanes, of which either 48 or 64 gen 5 lanes per CPU are used to connect the two CPUs, leaving up to 160 gen 5 lanes and 12 gen 3 lanes for devices. Each CPU has 12 DDR5 memory controllers supporting 6 TB of memory, for a total of 12 TB of DDR5 memory in one system (1)(10).
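The totals in this configuration follow from simple per-socket arithmetic. The sketch below reproduces them, assuming that the lanes spent on the CPU-to-CPU link are subtracted from each CPU's 128 gen 5 lanes (this is a reading of the quoted figures, not something the sources spell out). All constant and function names are made up for illustration.

```python
# Per-socket figures as quoted in the text.
SOCKETS = 2
CORES_PER_CPU = 96
THREADS_PER_CORE = 2
GEN5_LANES_PER_CPU = 128
GEN3_LANES_PER_CPU = 6
MEMORY_TB_PER_CPU = 6


def system_totals(link_lanes_per_cpu):
    """Totals for a dual-socket system.

    link_lanes_per_cpu: gen 5 lanes each CPU gives up for the
    CPU-to-CPU link (48 or 64 in the text). Assumption: these lanes
    are simply subtracted from each CPU's gen 5 budget.
    """
    cores = SOCKETS * CORES_PER_CPU
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "gen5_lanes": SOCKETS * (GEN5_LANES_PER_CPU - link_lanes_per_cpu),
        "gen3_lanes": SOCKETS * GEN3_LANES_PER_CPU,
        "memory_tb": SOCKETS * MEMORY_TB_PER_CPU,
    }


if __name__ == "__main__":
    for link in (48, 64):
        print(link, system_totals(link))
```

With the narrower 48-lane link the system keeps 160 gen 5 lanes for devices; with the wider 64-lane link it keeps 128.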

With all of this, the system is theoretically able to deliver 10.752 teraflops of double-precision floating-point performance (5), 1.280 terabytes/s of PCIe 5.0 I/O and up to 921.6 gigabytes/s of DDR5 RAM bandwidth (10).
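To make these peak figures easier to check, here is a minimal sketch of one set of assumptions that reproduces them. None of the parameters below are stated in the text: 16 double-precision operations per core per cycle at a 3.5 GHz clock, roughly 4 gigabytes/s per PCIe 5.0 lane per direction with both directions counted, and 24 DDR5-4800 memory channels moving 8 bytes per transfer. The function names are hypothetical.

```python
def peak_tflops(cores, flops_per_cycle, clock_ghz):
    """Theoretical peak in teraflops: cores * FLOPs/cycle * GHz / 1000."""
    return cores * flops_per_cycle * clock_ghz / 1000.0


def pcie_gb_per_s(lanes, gb_per_lane_per_direction=4, directions=2):
    """Aggregate PCIe bandwidth in gigabytes/s (encoding overhead ignored)."""
    return lanes * gb_per_lane_per_direction * directions


def ddr_gb_per_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Aggregate memory bandwidth in gigabytes/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


if __name__ == "__main__":
    # Assumed inputs, chosen because they reproduce the quoted numbers.
    print(peak_tflops(cores=192, flops_per_cycle=16, clock_ghz=3.5))
    print(pcie_gb_per_s(lanes=160))
    print(ddr_gb_per_s(channels=24, mega_transfers_per_s=4800))
```

These are upper bounds on paper; sustained clocks and real workloads will land below them.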

But what exactly do these numbers mean? If we compare them to some supercomputers that once held world records, we can see that the fastest supercomputer of 2000 was ‘ASCI White’ at the Lawrence Livermore National Laboratory, which was able to calculate 7.2 teraflops (6). The maxed-out AMD EPYC system would be able to outperform this machine without using any GPUs for parallelized computing, each of which, in the case of an RTX 4090, would add about 1.14 to 1.29 teraflops of double-precision performance (9).

In 2002, ‘ASCI White’ was beaten by the ‘Earth Simulator’ at the JAMSTEC Earth Simulator Center in Japan (6), which managed to calculate 35.86 teraflops. Only this supercomputer, which was built with a considerably bigger budget than ‘ASCI White’, surpasses our theoretical system.

But what can such performance even be used for?

Scientists at CERN previously had to rely on huge data centers to store the collision data, which is generated and collected near instantly (3). Before the data is filtered, it arrives from the sensors at a rate of 40 terabytes per second and has to be stored somehow. To reduce this dependency on data centers, CERN upgraded the data processing for its detectors to servers running AMD EPYC 7742 processors with 64 cores and 128 PCIe 4.0 lanes (3). Each server powered four 200Gbit Mellanox networking cards, which together received 800Gbit of data per second. If these servers were upgraded to the new top-of-the-line EPYC systems, they would be able to handle 10 of these network cards per server for a staggering 2 terabit per second of I/O, cutting the number of servers needed by 3/5 (4). This would result in huge power and cost savings, while also reducing the complexity of the entire system and making it less error-prone. Considering that the previous upgrade to the EPYC 7742 systems already reduced the number of servers needed by 1/3, the two upgrades together would bring the number down to roughly a quarter of the original (2/3 × 2/5 = 4/15).
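The step from four to ten network cards can be sketched as follows. The sketch assumes that each 200Gbit card occupies 16 PCIe lanes, which is what turns the 160 usable gen 5 lanes of the dual-CPU system into 10 cards; that lane count per card is an assumption, not a figure from the sources. The names are hypothetical.

```python
from fractions import Fraction

CARD_GBIT = 200        # per-card throughput quoted in the text
LANES_PER_CARD = 16    # assumption: one x16 slot per network card


def cards_per_server(usable_lanes, lanes_per_card=LANES_PER_CARD):
    """How many network cards fit into the available PCIe lanes."""
    return usable_lanes // lanes_per_card


def server_gbit(cards, card_gbit=CARD_GBIT):
    """Aggregate network throughput of one server in Gbit/s."""
    return cards * card_gbit


old_gbit = server_gbit(4)                 # current EPYC 7742 servers
new_cards = cards_per_server(160)         # dual EPYC 9654 system
new_gbit = server_gbit(new_cards)

# Fraction of servers still needed for the same total data rate.
servers_remaining = Fraction(old_gbit, new_gbit)

if __name__ == "__main__":
    print(new_cards, new_gbit, servers_remaining, 1 - servers_remaining)
```

Under these assumptions each new server carries 2.5 times the traffic of an old one, so two servers do the work of five.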

If we consider these power and cost savings, we have to roughly estimate the power draw and parts cost of such a system. Since the main costs lie in the processors and the RAM, the other parts of the machine are only roughly estimated. An AMD EPYC 9654 has a list price of 11,805 USD (4). A 256GB DDR5 module has to be estimated at about 1,000 USD, since modules of this size are nearly impossible to find, but cost scales almost linearly with size (12). With all of this in mind, the cost is about 24 RAM sticks * 1,000 USD + 2 * 11,805 USD = 47,610 USD, or about 50,000 USD considering other parts.
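The estimate can be written out as a small calculation, which also makes it easy to vary the number of memory modules. Note that 24 modules of 256GB add up to 6 TB; reaching the full 12 TB mentioned earlier would take 48 modules, assuming two modules per memory channel are possible and the same rough price per module applies. Both of those are assumptions for illustration, as are the names used here.

```python
CPU_PRICE_USD = 11_805    # list price quoted in the text
DIMM_PRICE_USD = 1_000    # rough estimate from the text
DIMM_SIZE_GB = 256


def parts_cost_usd(cpus, dimms):
    """Cost of CPUs and memory only; other parts are not included."""
    return cpus * CPU_PRICE_USD + dimms * DIMM_PRICE_USD


def memory_tb(dimms):
    """Installed memory in TB (1 TB = 1024 GB)."""
    return dimms * DIMM_SIZE_GB / 1024


if __name__ == "__main__":
    for dimms in (24, 48):
        print(dimms, memory_tb(dimms), parts_cost_usd(cpus=2, dimms=dimms))
```

Even in the hypothetical 48-module case the parts cost stays well below 100,000 USD.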

Power usage can be estimated at up to 400W per CPU (11) and about 600W for cooling and RAM combined (11), for a grand total of about 1400W base power usage.

If we compare this to the previously mentioned supercomputers, we can see some huge differences. ‘ASCI White’ had a construction cost of about 110 million USD and a power usage of 6 MW (6). The ‘Earth Simulator’ had a construction cost of about 600 million USD and a power usage of about 12.8 MW (6). Therefore, it is fair to say that the scientific community can profit hugely from the cost reduction these systems offer, especially considering power usage.
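To put the differences into numbers, the following sketch divides the quoted cost and power figures of the two historical machines by the rough estimates for the dual-CPU system (about 50,000 USD and 1400W). The comparison is only as good as those estimates, which leave out storage, networking and facility overhead. The dictionary layout and function name are illustrative.

```python
EPYC_SYSTEM = {"cost_usd": 50_000, "power_w": 1_400}

# Cost and power figures as quoted in the text.
SUPERCOMPUTERS = {
    "ASCI White": {"cost_usd": 110_000_000, "power_w": 6_000_000},
    "Earth Simulator": {"cost_usd": 600_000_000, "power_w": 12_800_000},
}


def ratios(machine, baseline=EPYC_SYSTEM):
    """How many times more the machine cost and consumed than the baseline."""
    return {
        "cost": machine["cost_usd"] / baseline["cost_usd"],
        "power": machine["power_w"] / baseline["power_w"],
    }


if __name__ == "__main__":
    for name, machine in SUPERCOMPUTERS.items():
        r = ratios(machine)
        print(f"{name}: {r['cost']:.0f}x the cost, {r['power']:.0f}x the power")
```

On these figures both machines cost thousands of times more and drew thousands of times more power than the estimated EPYC system.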

But a comparison to supercomputers from 20 years ago isn't too fair either. Even with the upcoming ZEN4c architecture, which is announced to raise the core count to 128 cores (2), these systems can't hold up against modern supercomputers, especially not when those computers use these very processors to reach new heights of performance (5). The ‘Frontier’ supercomputer, finished in 2022, manages to calculate a staggering 1.102 exaflops, or about 100,000 times the performance of our dual-CPU system, while consuming 21 MW (6). To call such a system a true supercomputer is an overstatement, but considering the workloads it can handle, it truly isn't an ordinary server either, and it thus offers newfound possibilities for huge monolithic systems.
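The comparison with ‘Frontier’ can be checked with a few lines of arithmetic. The sketch uses the figures quoted above, plus the 10.752 teraflops and roughly 1400W estimated for the dual-CPU system. Keep in mind that it mixes a theoretical peak for the EPYC system with the figure quoted for ‘Frontier’, so the efficiency numbers are only a rough indication. The names are hypothetical.

```python
FRONTIER_TFLOPS = 1.102e6      # 1.102 exaflops expressed in teraflops
FRONTIER_POWER_W = 21e6        # 21 MW
EPYC_TFLOPS = 10.752           # theoretical peak from the text
EPYC_POWER_W = 1_400           # rough estimate from the text


def performance_ratio(a_tflops, b_tflops):
    """How many times faster system a is than system b."""
    return a_tflops / b_tflops


def gflops_per_watt(tflops, power_w):
    """Energy efficiency in gigaflops per watt."""
    return tflops * 1000.0 / power_w


if __name__ == "__main__":
    print(round(performance_ratio(FRONTIER_TFLOPS, EPYC_TFLOPS)))
    print(round(gflops_per_watt(FRONTIER_TFLOPS, FRONTIER_POWER_W), 2))
    print(round(gflops_per_watt(EPYC_TFLOPS, EPYC_POWER_W), 2))
```

On these numbers the modern supercomputer is not only about 100,000 times faster but also delivers several times more performance per watt.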