Leaked official AMD slides indicate that the AMD Radeon Instinct MI100, based on the 7nm CDNA architecture, is faster than Nvidia's top-of-the-range GPU, the A100 (Ampere), in FP32 computation.
The AMD Radeon Instinct MI100 is expected to arrive during the second half of this year with no fewer than 8192 Stream Processors and a TDP of 300W. That is not bad considering that a Radeon RX Vega 64 or Radeon Instinct MI60 has a TDP of 300W for 4096 Stream Processors. The improvement is attributed to a new, more efficient graphics architecture together with the 7nm manufacturing process.
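To put the Stream Processor comparison in numbers, here is a minimal sketch that divides the quoted Stream Processor counts by the quoted TDPs. The dictionary layout and the `sp_per_watt` helper are illustrative assumptions. The inputs are only the figures cited above, which for the MI100 are leaked rather than confirmed.

```python
# Stream Processor counts and TDPs as quoted in the article.
# The MI100 values come from the leak and are not confirmed specifications.
cards = {
    "Radeon Instinct MI100": {"stream_processors": 8192, "tdp_w": 300},
    "Radeon Instinct MI60": {"stream_processors": 4096, "tdp_w": 300},
}


def sp_per_watt(card):
    """Return Stream Processors per watt of TDP (hypothetical helper)."""
    return card["stream_processors"] / card["tdp_w"]


density = {name: sp_per_watt(spec) for name, spec in cards.items()}
ratio = density["Radeon Instinct MI100"] / density["Radeon Instinct MI60"]

for name, value in density.items():
    print(f"{name}: {value:.1f} Stream Processors per watt")
print(f"MI100 vs MI60: {ratio:.1f}x")
```

With the same 300W budget and twice the Stream Processors, the ratio comes out at 2x. This says nothing about clock speeds or real-world throughput, only about how many shader units fit in the same power envelope.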
The leak centers on a 1U rack server equipped with 2x AMD EPYC Rome (Zen 2) or Milan (Zen 3) CPUs and 4x AMD Radeon Instinct MI100 GPUs, which together deliver 136 TFLOPs of FP32 (SGEMM) performance, an average of 34 TFLOPs of FP32 per GPU. The slides also list 128 GB of HBM2E memory (32 GB per GPU) with a bandwidth of 4.9 TB/s (about 1.22 TB/s per GPU).
There is also a 3U rack server, which differs in supporting up to 8x AMD Radeon Instinct MI100 GPUs, reaching 272 TFLOPs of FP32 performance with 256 GB of HBM2E memory, a combined bandwidth of 9.8 TB/s, and a power consumption of about 3 kW.
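The per-GPU figures follow from dividing each server's totals by its GPU count. The sketch below repeats that arithmetic for both the 1U and 3U configurations. The `servers` dictionary and the `per_gpu` helper are illustrative names, and the inputs are the leaked aggregate numbers, not measured values.

```python
# Aggregate figures for the two leaked server configurations.
servers = {
    "1U": {"gpus": 4, "fp32_tflops": 136, "hbm2e_gb": 128, "bandwidth_tbs": 4.9},
    "3U": {"gpus": 8, "fp32_tflops": 272, "hbm2e_gb": 256, "bandwidth_tbs": 9.8},
}


def per_gpu(server):
    """Divide a server's aggregate figures evenly across its GPUs."""
    n = server["gpus"]
    return {
        "fp32_tflops": server["fp32_tflops"] / n,
        "hbm2e_gb": server["hbm2e_gb"] / n,
        "bandwidth_tbs": server["bandwidth_tbs"] / n,
    }


per_gpu_figures = {name: per_gpu(spec) for name, spec in servers.items()}

for name, figures in per_gpu_figures.items():
    print(
        f"{name}: {figures['fp32_tflops']:.0f} TFLOPs FP32, "
        f"{figures['hbm2e_gb']:.0f} GB HBM2E, "
        f"{figures['bandwidth_tbs']:.3f} TB/s per GPU"
    )
```

Both configurations work out to the same per-GPU numbers, which suggests the slides simply scale a single MI100's figures linearly with the number of cards.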
Finally, a comparison slide indicates that the AMD Radeon Instinct MI100 offers 2.4x the FP32 performance of the Nvidia A100 while costing 30 percent less. The Nvidia A100, for its part, offers 2.5x the FP64 performance at a 15% higher cost.
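One way to read these claims is as performance per unit of cost. The sketch below is only an interpretation: it assumes the quoted multipliers are raw performance ratios and that the percentages are price differences relative to the competing card, neither of which the leak spells out. The `perf_per_cost` helper is a hypothetical name.

```python
def perf_per_cost(perf_ratio, cost_ratio):
    """Relative performance per unit of cost versus the competing card.

    Both arguments are ratios against the competitor (1.0 means parity).
    """
    return perf_ratio / cost_ratio


# Assumption: MI100 has 2.4x the FP32 performance at 30% lower cost than the A100.
mi100_fp32_value = perf_per_cost(2.4, 1 - 0.30)

# Assumption: A100 has 2.5x the FP64 performance at 15% higher cost than the MI100.
a100_fp64_value = perf_per_cost(2.5, 1 + 0.15)

print(f"MI100 FP32 performance per cost vs A100: {mi100_fp32_value:.2f}x")
print(f"A100 FP64 performance per cost vs MI100: {a100_fp64_value:.2f}x")
```

Under these assumptions each card leads clearly in the precision it is quoted for. If the slide multipliers already include price, the division above would double count the cost difference.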
AMD Radeon Instinct Accelerators 2020
| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI100 |
|---|---|---|---|---|---|---|
| GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 | Arcturus |
| GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET | 7nm FinFET |
| GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1725 MHz | 1800 MHz | 1334 MHz? |
| FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.5 TFLOPs | 29.5 TFLOPs | ~50 TFLOPs |
| FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.3 TFLOPs | 14.7 TFLOPs | ~25 TFLOPs |
| FP64 Compute | 384 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.6 TFLOPs | 7.4 TFLOPs | ~12.5 TFLOPs |
| VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 |
| Memory Clock | 1750 MHz | 500 MHz | 945 MHz | 1000 MHz | 1000 MHz | TBD |
| Memory Bus | 256-bit | 4096-bit | 2048-bit | 4096-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s | TBD |
| Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length |
| Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling? |
~200W (Test Board)